carbonforge
Montreal, QC
Compiler-level energy optimization for AI inference
AI is becoming an always-on compute layer, and with it a massive new source of electricity demand. As inference scales to billions of queries per day, energy is emerging as the key constraint on deployment, and inference costs are rising fast.
CarbonForge tackles this energy bottleneck at the MLIR compiler layer. Its optimization engine plugs into existing inference workflows to reduce joules per token, with no model changes and no hardware swaps, while also improving latency.