OpenAI unveils Jalapeño, its first custom AI inference chip built with Broadcom
The ChatGPT maker is moving down the silicon stack with a purpose-built ASIC that it says beats the current state of the art on performance per watt.
What matters
- OpenAI announced Jalapeño, its first custom AI inference processor, developed in partnership with Broadcom and Celestica.
- The ASIC is purpose-built for LLM inference, and OpenAI says early testing shows performance per watt substantially better than the current state of the art.
- The chip went from design to production in nine months, a timeline OpenAI says was accelerated by its own models.
- OpenAI plans gigawatt-scale deployment with data center partners and describes Jalapeño as the first in a multi-generation compute platform.
- Broadcom handled chip implementation and networking; Celestica contributed board and rack system integration.
What happened
On June 24, 2026, OpenAI announced Jalapeño, its first custom AI inference processor, developed in partnership with Broadcom and Celestica. The chip is an ASIC (application-specific integrated circuit) designed from the ground up to run current and future large language models, unlike a general-purpose GPU.
According to OpenAI's announcement, early testing shows the first-generation accelerator delivers performance per watt "substantially better than current state-of-the-art." The company also said Jalapeño went from design to production in nine months, a timeline it claims was accelerated by using its own AI models in the development process.
Broadcom handled chip implementation, high-performance networking, and scalability, while Celestica contributed board and rack system integration. Broadcom CEO Hock Tan and President Charlie Kawwas delivered the first unit to OpenAI CEO Sam Altman and President Greg Brockman.
OpenAI described Jalapeño as the first accelerator in a multi-generation compute platform it is building with Broadcom, with plans to deploy it at gigawatt scale alongside data center partners.
Why it matters
Until now, OpenAI — like most frontier labs — has relied primarily on Nvidia GPUs to train and serve its models. Designing custom inference silicon puts OpenAI in the same vertical-integration camp as Google, which has built its own TPUs for roughly a decade, and signals that frontier labs increasingly see the silicon layer as a strategic lever rather than a commodity purchase.
Inference, the act of running a trained model to generate responses, accounts for much of the operating cost at companies like OpenAI. A chip tuned specifically for OpenAI's model architectures and serving systems could materially reduce the cost per query, improve latency, and give the company more control over its infrastructure roadmap.
The gigawatt-scale deployment ambition and multi-generation roadmap suggest this is not a one-off experiment but a long-term platform play. If the performance-per-watt claims hold, Jalapeño could reshape the economics of serving ChatGPT and OpenAI's API at a time when inference costs are a central concern for the entire industry.
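To make the performance-per-watt argument concrete, here is a minimal Python sketch of how an efficiency multiple would translate into energy cost per query and into throughput at a fixed power budget. Every input is a hypothetical placeholder chosen for illustration, not a figure from OpenAI or Broadcom, whose announcement includes no benchmarks. The sketch covers electricity only and ignores hardware, networking, cooling, and staffing costs.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_cost_per_query(joules_per_query: float, usd_per_kwh: float) -> float:
    """Electricity cost (USD) of one query, given its energy use and the power price."""
    return joules_per_query / JOULES_PER_KWH * usd_per_kwh


def queries_per_second_at_power(power_watts: float, joules_per_query: float) -> float:
    """Sustained queries per second that a fixed power budget can serve."""
    return power_watts / joules_per_query


# Hypothetical inputs, assumed only for illustration.
BASELINE_JOULES_PER_QUERY = 1000.0  # assumed energy per query on existing hardware
PERF_PER_WATT_GAIN = 2.0            # assumed efficiency multiple (not disclosed)
USD_PER_KWH = 0.08                  # assumed electricity price
POWER_BUDGET_WATTS = 1e9            # one gigawatt

# Performance per watt is queries per joule, so a gain of k divides
# the energy needed per query by k.
improved_joules_per_query = BASELINE_JOULES_PER_QUERY / PERF_PER_WATT_GAIN

baseline_cost = energy_cost_per_query(BASELINE_JOULES_PER_QUERY, USD_PER_KWH)
improved_cost = energy_cost_per_query(improved_joules_per_query, USD_PER_KWH)

baseline_qps = queries_per_second_at_power(POWER_BUDGET_WATTS, BASELINE_JOULES_PER_QUERY)
improved_qps = queries_per_second_at_power(POWER_BUDGET_WATTS, improved_joules_per_query)

print(f"Energy cost per query: {baseline_cost:.8f} USD -> {improved_cost:.8f} USD")
print(f"Queries per second at 1 GW: {baseline_qps:,.0f} -> {improved_qps:,.0f}")
```

Under these assumptions the relationship is linear: an efficiency gain of k cuts energy cost per query by a factor of k and raises the query volume a fixed power budget can serve by the same factor. That is why the gain matters more as deployments approach gigawatt scale.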
Public reaction
No Reddit or public discussion threads were captured in the available inputs at the time of writing, so community sentiment could not be assessed. Discussion is likely to focus on benchmark credibility, the nine-month development timeline, and competitive implications for Nvidia as more technical details (die size, process node, memory bandwidth, benchmark comparisons) become public.
What to watch
- Whether OpenAI discloses concrete benchmarks against Nvidia's current inference GPUs or other ASICs.
- How quickly Jalapeño is deployed at scale in data centers, and whether OpenAI's API users see measurable latency or pricing changes.
- The second-generation chip roadmap and whether OpenAI expands beyond inference into training silicon.
- Competitive responses from Nvidia and AMD, and from hyperscalers building their own accelerators.
Open questions
- How will Jalapeño's real-world inference performance compare to Nvidia's latest GPUs on OpenAI's own model workloads?
- Will the nine-month design-to-production timeline hold up to scrutiny, or does it mask earlier preparatory work?
- Does this signal a broader trend of frontier labs abandoning general-purpose GPUs entirely, or is inference-specific silicon a narrower opportunity?
What to do next
Developers
Monitor OpenAI's API changelog and developer documentation for any inference performance, latency, or pricing updates that may signal Jalapeño deployment.
Custom inference silicon can bring throughput or efficiency gains that surface as API-level changes (faster response times, higher rate limits, or lower costs) before they are publicly benchmarked.
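One way to notice such changes is to keep a latency baseline for your own workload and compare new samples against it. The Python sketch below is a minimal illustration under stated assumptions: `fake_request` is a hypothetical stand-in that only sleeps, and you would replace it with your own API call. Nothing here reflects an OpenAI tool or endpoint, and a latency shift alone cannot show which hardware served a request.

```python
import statistics
import time
from typing import Callable, Dict, List


def sample_latencies(call: Callable[[], object], runs: int) -> List[float]:
    """Time `call` `runs` times and return the elapsed seconds for each run."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Median and nearest-rank 95th-percentile latency for a batch of samples."""
    ordered = sorted(latencies)
    # Nearest-rank p95: ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return {"median": statistics.median(ordered), "p95": ordered[rank - 1]}


def relative_change(baseline: float, current: float) -> float:
    """Fractional change from baseline; negative means the current value is lower."""
    return (current - baseline) / baseline


def fake_request() -> None:
    """Hypothetical stand-in for a real API call; it only sleeps briefly."""
    time.sleep(0.001)


# Record a baseline, then compare a later batch of samples against it.
baseline = summarize(sample_latencies(fake_request, runs=5))
current = summarize(sample_latencies(fake_request, runs=5))
change = relative_change(baseline["median"], current["median"])
print(f"Median latency change versus baseline: {change:+.1%}")
```

In practice you would store the baseline over days or weeks, use far more samples, and keep the prompt, model, and region fixed so that a shift reflects the service rather than your own inputs.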
Founders
Reassess build-versus-buy assumptions around inference infrastructure, especially if your product depends heavily on OpenAI API costs.
If Jalapeño meaningfully reduces OpenAI's inference costs, those savings may eventually be passed to API customers, shifting unit economics for AI-native startups that rely on OpenAI's platform.
PMs
Track whether OpenAI introduces tiered inference options or performance guarantees that could map to Jalapeño-powered endpoints.
Custom silicon may enable differentiated service tiers, such as lower-latency or higher-throughput plans, that affect roadmaps for products built on OpenAI's API.
Investors
Evaluate the competitive implications for Nvidia and other GPU-dependent inference providers as frontier labs vertically integrate into silicon.
OpenAI joining Google and others in custom inference ASICs signals a structural trend that could compress demand growth for general-purpose AI GPUs in inference workloads, though training-side demand remains distinct.
Operators
Review current GPU-based inference capacity plans and model whether a shift toward ASIC-backed inference at major providers could alter cloud pricing or instance availability.
If hyperscalers and frontier labs increasingly run inference on custom silicon, both the secondary market for GPUs and the cloud availability of GPU instances may shift, affecting capacity planning and cost projections.
Testing notes
Caveats
- Jalapeño is an internal infrastructure component, not a publicly available product or API endpoint.
- No benchmarks, SDKs, or developer-facing tools have been announced, so direct testing is not possible at this time.
- Any observable impact would come indirectly through changes in OpenAI API performance, pricing, or availability.
- OpenAI's performance-per-watt claim is based on early testing and has not been independently verified.