AWS Unveils AI Factories at re:Invent 2025

At the AWS re:Invent 2025 conference in Las Vegas, Amazon Web Services made an announcement that could fundamentally reshape how enterprises deploy AI infrastructure. In a strategic partnership with Nvidia, AWS introduced “AI Factories,” a new offering that lets corporations and governments run AWS AI systems within their own data centers, marking a significant shift in cloud computing strategy.

The announcement came during the opening keynote on December 2nd, delivered by AWS CEO Matt Garman to an audience of over 50,000 attendees and millions watching online. The AI Factories represent AWS’s response to growing enterprise demand for hybrid AI infrastructure, particularly from organizations in heavily regulated industries like finance, healthcare, and government that face strict data sovereignty requirements.

What Are AI Factories?

AI Factories are turnkey AI infrastructure solutions that combine AWS’s cloud expertise with on-premises deployment. Unlike traditional cloud services that require data to leave the organization’s physical premises, AI Factories bring AWS’s powerful AI capabilities directly into customer data centers.

Each AI Factory includes:

  • Compute Infrastructure: Choice of Nvidia H100/H200 GPUs or Amazon’s proprietary Trainium3 AI chips
  • Software Stack: Pre-configured AWS AI services including SageMaker, Bedrock, and custom model training tools
  • Management Layer: Cloud-based management console for monitoring and scaling
  • Support Services: 24/7 AWS technical support and on-site deployment assistance
  • Networking: Secure connection to AWS cloud for hybrid workloads

The Nvidia Partnership

The collaboration with Nvidia is particularly significant. Jensen Huang, Nvidia’s CEO, joined Garman on stage to announce that Nvidia would not just provide GPUs but also contribute to the software layer with its CUDA and NIM (Nvidia Inference Microservices) technologies.

“This partnership brings together the best of both worlds,” Huang explained. “AWS’s unmatched experience in managing large-scale infrastructure combined with Nvidia’s leadership in AI acceleration creates an offering that no single company could deliver alone.”

Nvidia will offer three GPU configurations for AI Factories:

  • Entry Level: 8x H100 GPUs suitable for small to medium AI workloads
  • Standard: 32x H200 GPUs for enterprise-scale deployments
  • Premium: 128x H200 GPUs for massive AI training and inference
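AWS has not published memory totals for these tiers, but Nvidia’s documented per-card HBM capacities (80 GB for the H100, 141 GB for the H200) make the aggregate accelerator memory easy to sketch; the tier definitions below come from the announcement, the arithmetic is illustrative:

```python
# Aggregate GPU memory per AI Factory tier, using Nvidia's published
# per-card HBM capacities: H100 = 80 GB, H200 = 141 GB.
HBM_GB = {"H100": 80, "H200": 141}

# Tier definitions as announced: (GPU model, card count).
tiers = {
    "Entry Level": ("H100", 8),
    "Standard":    ("H200", 32),
    "Premium":     ("H200", 128),
}

for name, (gpu, count) in tiers.items():
    total_gb = HBM_GB[gpu] * count
    print(f"{name}: {count}x {gpu} -> {total_gb:,} GB HBM")
```

The jump from 640 GB at the entry tier to roughly 18 TB at the premium tier is what separates fine-tuning workloads from frontier-scale training.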

Amazon Trainium3: The Alternative

For organizations seeking cost optimization, AWS is offering its newest homegrown AI chip, Trainium3, as an alternative to Nvidia GPUs. According to AWS, Trainium3 delivers up to 4x better price-performance compared to Nvidia’s H100 chips for certain workloads, particularly large language model training.

Trainium3 specifications include:

  • 384 GB HBM3e memory per chip
  • 900 GB/s memory bandwidth
  • Optimized for transformer-based AI models
  • Custom instruction set for AI operations
  • 40% lower power consumption than comparable GPU solutions
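A back-of-envelope way to read the 384 GB figure is to ask how large a model’s weights could fit on a single chip at different precisions. The calculation below counts weights only; activations, optimizer state, and KV cache would reduce the practical limit substantially:

```python
# Rough capacity estimate: model parameters that fit in one Trainium3's
# stated 384 GB of HBM3e, by weight precision. Weights only -- real
# training needs headroom for activations and optimizer state.
MEMORY_GB = 384

for precision, bytes_per_param in [("fp32", 4), ("bf16", 2), ("int8", 1)]:
    params_billions = MEMORY_GB / bytes_per_param  # GB / (bytes/param) = 1e9 params
    print(f"{precision}: ~{params_billions:.0f}B parameters")
```

At bf16, roughly 192 billion parameters fit on a single chip, which explains AWS’s emphasis on large language model training.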

Dr. Nafea Bshara, Vice President of AWS Silicon, emphasized that Trainium3 is purpose-built for AI: “While GPUs are general-purpose accelerators, Trainium3 is designed from the ground up exclusively for AI workloads. This specialization allows us to achieve better efficiency and performance for specific use cases.”

Why On-Premises AI Now?

The move to offer on-premises AI infrastructure might seem counterintuitive for a cloud provider, but AWS executives argue it’s a natural evolution. Three key factors are driving demand:

1. Data Sovereignty and Compliance

Financial institutions, healthcare providers, and government agencies face strict regulations about where data can be processed. Germany’s BaFin financial regulator, for example, requires that certain sensitive financial data never leave German soil. AI Factories allow these organizations to leverage advanced AI while maintaining full data control.

2. Latency Requirements

Some AI applications, particularly in manufacturing, autonomous vehicles, and real-time trading, require microsecond-level latency that cloud connections cannot guarantee. On-premises AI eliminates the wide-area round trip to a remote cloud region entirely.

3. Cost at Scale

For organizations running massive, continuous AI workloads, on-premises infrastructure can be more cost-effective than paying for cloud compute by the hour. AWS estimates that organizations running AI workloads 24/7 could save 30-40% over three years with AI Factories compared to cloud-only solutions.
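AWS has not published the assumptions behind its 30-40% estimate. The sketch below shows the shape of the comparison using purely hypothetical prices (the $6/GPU-hour rate and $3.2M on-premises commitment are illustrative placeholders, not AWS pricing):

```python
# Hypothetical 3-year cost comparison: continuous (24/7) cloud GPU rental
# vs. a fixed-price on-premises AI Factory. All dollar figures are
# made-up placeholders, not AWS pricing.
HOURS_PER_YEAR = 8766  # average year, including leap years

def cloud_cost(gpus: int, rate_per_gpu_hour: float, years: int) -> float:
    """Total cost of renting GPUs by the hour, around the clock."""
    return gpus * rate_per_gpu_hour * HOURS_PER_YEAR * years

def savings_pct(on_prem_total: float, cloud_total: float) -> float:
    """Percentage saved by on-prem relative to the cloud total."""
    return 100 * (cloud_total - on_prem_total) / cloud_total

cloud = cloud_cost(gpus=32, rate_per_gpu_hour=6.0, years=3)  # ~$5.05M
on_prem = 3_200_000  # assumed all-in 3-year AI Factory commitment

print(f"Cloud (24/7, 3y): ${cloud:,.0f}")
print(f"On-prem:          ${on_prem:,.0f}")
print(f"Savings:          {savings_pct(on_prem, cloud):.0f}%")
```

With these placeholder numbers the savings land in the mid-30% range, consistent with AWS’s claimed band; the break-even obviously shifts with utilization, since idle on-premises hardware still costs the same.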

Industry Reaction

The announcement has been met with enthusiasm from enterprise customers. Deutsche Bank, one of the early adopters in the private beta program, shared its experience.

“AI Factories solved a critical problem for us,” said Deutsche Bank’s Chief Technology Officer, Bernd Leukert. “We can now run advanced AI models on customer financial data without it ever touching the public internet. This was simply impossible before.”

Analysts are also bullish. Gartner research director Chirag Dekate commented: “This is AWS acknowledging that the future of AI infrastructure is hybrid. Organizations want the innovation speed of the cloud with the control of on-premises infrastructure. AI Factories deliver both.”

Competitive Landscape

AWS isn’t alone in this space. Microsoft has offered Azure Stack for hybrid deployments since 2017, and Google Cloud has Anthos. However, neither competitor currently offers a turnkey AI-specific solution like AI Factories.

Microsoft responded to the announcement within hours, with Azure CTO Mark Russinovich posting on X: “Interesting move by AWS. Azure customers have been running on-prem AI on Azure Stack for years. Glad to see competition catching up.”

Google Cloud declined to comment officially but sources familiar with the company’s plans suggest a similar offering may be announced in early 2026.

Pricing and Availability

AWS has not yet disclosed detailed pricing for AI Factories, stating only that pricing will be “customized based on configuration and scale.” Industry analysts estimate that entry-level configurations will start around $2-3 million for a three-year commitment, with premium configurations potentially exceeding $50 million.

The service will launch in Q2 2026, with initial availability limited to customers who participated in the private beta. General availability is expected in Q4 2026. Organizations interested in AI Factories can apply through a special AWS portal opening in January 2026.

Technical Requirements

Organizations considering AI Factories will need to meet several infrastructure requirements:

  • Power: Minimum 500 kW dedicated power infrastructure (up to 5 MW for largest configurations)
  • Cooling: Advanced cooling systems capable of dissipating 200 kW per rack
  • Space: Climate-controlled data center space (minimum 500 square feet)
  • Connectivity: Minimum 10 Gbps dedicated connection to AWS region
  • Personnel: On-site technical staff for basic maintenance (AWS handles complex issues remotely)
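As a quick readiness exercise, these minimums can be checked mechanically against a candidate site. The thresholds below come from the list above; the site profile is a made-up example:

```python
# Minimal sketch: check a candidate data-center site against the
# minimum AI Factory requirements listed above. Thresholds come from
# the article; the example site profile is invented.
REQUIREMENTS = {
    "power_kw": 500,             # dedicated power infrastructure
    "cooling_kw_per_rack": 200,  # heat dissipation per rack
    "space_sqft": 500,           # climate-controlled floor space
    "uplink_gbps": 10,           # dedicated link to an AWS region
}

def unmet(site: dict) -> list[str]:
    """Return the names of requirements the site fails to meet."""
    return [k for k, minimum in REQUIREMENTS.items()
            if site.get(k, 0) < minimum]

site = {"power_kw": 750, "cooling_kw_per_rack": 150,
        "space_sqft": 1200, "uplink_gbps": 40}
print(unmet(site))  # this example site falls short on rack cooling
```

In practice, the 200 kW-per-rack cooling figure is the requirement most existing enterprise data centers will fail, since it generally implies liquid cooling rather than conventional air handling.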

Looking Forward

The AI Factories announcement signals a broader trend in enterprise computing: the recognition that no single deployment model – cloud, on-premises, or edge – is optimal for all workloads. As AI becomes more central to business operations, organizations need flexible infrastructure options that balance innovation, control, compliance, and cost.

AWS plans to continue expanding AI Factories capabilities, with roadmap items including:

  • Integration with AWS Outposts for smaller deployments
  • Support for specialized AI chips from other vendors
  • Pre-trained industry-specific models
  • Automated model training and deployment pipelines
  • Advanced security features including homomorphic encryption

For organizations serious about AI, AWS has thrown down the gauntlet. The question is no longer whether to adopt AI, but how to deploy it in a way that balances innovation with control. AI Factories represent AWS’s answer to that question.
