Most organisations use models that are 10× more expensive than necessary. This framework tells you exactly which model to use — and why.
The best model is the cheapest one that meets your accuracy requirement. Benchmark before committing.
Don't standardise on one model. Standardise on your orchestration layer and swap models freely.
At 10 users, model choice costs $20/month. At 1,000 users, it costs $2,000/month. Model the economics first.
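Both figures above imply roughly $2 per user per month, so the scaling is linear. A minimal sketch of that arithmetic follows, assuming spend depends only on user count. The function name and the default rate are illustrative, derived from the two data points above rather than from any provider's billing.

```python
def monthly_model_cost(users: int, cost_per_user: float = 2.0) -> float:
    """Estimated monthly model spend, assuming cost grows linearly with users.

    cost_per_user defaults to $2/month, the rate implied by $20 for 10 users.
    """
    if users < 0:
        raise ValueError("users must be non-negative")
    return users * cost_per_user


# Project spend at a few user counts before committing to a model.
for users in (10, 100, 1_000):
    print(f"{users:>5} users: ${monthly_model_cost(users):,.0f}/month")
```

Replacing the default rate with a per-user figure measured from your own token usage makes the projection specific to a candidate model.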
Three questions that determine your model selection.
| If your priority is... | And volume is... | Use this model tier |
|---|---|---|
| Accuracy critical + quality matters | Any | → GPT-4o or Claude Sonnet |
| High volume + accuracy ≥ 90% verified | High | → GPT-4o-mini or Claude Haiku |
| Ultra-long context (>100K tokens) | Any | → Gemini 1.5 Pro (1M context) |
| Data must not leave Australia | Any | → Azure OpenAI (AUS region) or AWS Bedrock (Sydney) |
| Maximum cost efficiency | High | → Gemini 1.5 Flash or Claude Haiku |
| On-premise / no external API | Any | → Llama 3.1 70B (self-hosted) |
| European data residency | Any | → Mistral Large (Azure EU) |
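The decision table above can be read as an ordered set of rules. The sketch below encodes it that way. The `Requirements` fields, the rule ordering (hard constraints before cost preferences), and the fallback to the frontier tier are assumptions made for illustration. The table itself does not say which row wins when several apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirements:
    on_premise: bool = False                    # no external API allowed
    residency: Optional[str] = None             # "AU", "EU", or None
    context_tokens: int = 0                     # largest prompt you expect
    high_volume: bool = False
    cost_first: bool = False                    # maximum cost efficiency
    verified_accuracy: Optional[float] = None   # measured on your own test set, 0 to 1


def recommend(req: Requirements) -> str:
    """Return the table's recommendation for the first rule that matches."""
    # Hard constraints first: deployment, residency, context length.
    if req.on_premise:
        return "Llama 3.1 70B (self-hosted)"
    if req.residency == "AU":
        return "Azure OpenAI (AUS region) or AWS Bedrock (Sydney)"
    if req.residency == "EU":
        return "Mistral Large (Azure EU)"
    if req.context_tokens > 100_000:
        return "Gemini 1.5 Pro (1M context)"
    # Cost and volume preferences next.
    if req.high_volume and req.cost_first:
        return "Gemini 1.5 Flash or Claude Haiku"
    if req.high_volume and (req.verified_accuracy or 0.0) >= 0.90:
        return "GPT-4o-mini or Claude Haiku"
    # Assumed fallback: stay on the frontier tier until a benchmark says otherwise.
    return "GPT-4o or Claude Sonnet"


print(recommend(Requirements(high_volume=True, verified_accuracy=0.93)))
```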
All prices per 1 million tokens. Updated Q1 2025.
| Model | Provider | Input/1M | Output/1M | Speed | Context | Privacy | Best for |
|---|---|---|---|---|---|---|---|
| GPT-4o (Frontier) | OpenAI | $3.88 | $15.50 | Medium | 128K | Azure | Complex reasoning, multimodal, long documents |
| GPT-4o-mini (Efficient ✓) | OpenAI | $0.23 | $0.93 | Fast | 128K | Azure | High-volume classification, extraction, triage |
| Claude 3.5 Sonnet (Frontier) | Anthropic | $4.65 | $23.25 | Medium | 200K | AWS Bedrock | Nuanced reasoning, long documents, report generation |
| Claude 3 Haiku (Efficient ✓) | Anthropic | $0.39 | $1.94 | Very Fast | 200K | AWS Bedrock | High-volume simple tasks, summarisation at scale |
| Gemini 1.5 Pro (Frontier) | Google | $1.94 | $7.75 | Medium | 1M | Vertex AI | Ultra-long context, multimodal, document processing |
| Gemini 1.5 Flash (Efficient ✓) | Google | $0.12 | $0.47 | Very Fast | 1M | Vertex AI | Speed-optimised, high-volume, cost-sensitive tasks |
| Llama 3.1 70B (Open Source) | Meta (OSS) | ~$500–5K/mo infra | (infra cost only) | Variable | 128K | Full control | On-premise, data sovereignty, no external API |
| Mistral Large (Frontier) | Mistral | $3.10 | $9.30 | Medium | 128K | Azure | European data residency requirements |
Actual cost calculations for common business AI use cases.
Classify 800 support emails/day into 12 categories
Classification is a task that smaller models handle well. Run 500 test examples first. If accuracy is ≥ 90%, deploy the cheaper model. At this volume, that saves more than $200 a year with negligible quality loss.
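The email-classification case above can be costed and gated roughly as follows. This is a sketch under stated assumptions: the prices are the GPT-4o and GPT-4o-mini rows from the comparison table, while the per-email token counts are hypothetical placeholders that should be replaced with measurements from real traffic. The `predict` argument stands in for whatever function calls your model.

```python
# Prices per 1M tokens (input, output), copied from the comparison table.
PRICES = {
    "GPT-4o": (3.88, 15.50),
    "GPT-4o-mini": (0.23, 0.93),
}

EMAILS_PER_DAY = 800
# Hypothetical token counts per email; measure your own traffic.
INPUT_TOKENS = 400   # email body plus the 12-category prompt
OUTPUT_TOKENS = 10   # a single category label


def annual_cost(model: str) -> float:
    """Yearly spend for one model at the assumed volume and token counts."""
    input_price, output_price = PRICES[model]
    per_email = (INPUT_TOKENS * input_price + OUTPUT_TOKENS * output_price) / 1_000_000
    return per_email * EMAILS_PER_DAY * 365


def accuracy(predict, examples) -> float:
    """Fraction of (text, label) pairs that predict() labels correctly."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


def choose_model(cheap_accuracy: float, threshold: float = 0.90) -> str:
    """Deploy the cheaper model only if it clears the accuracy bar."""
    return "GPT-4o-mini" if cheap_accuracy >= threshold else "GPT-4o"


savings = annual_cost("GPT-4o") - annual_cost("GPT-4o-mini")
print(f"Estimated annual savings: ${savings:,.2f}")
```

With these assumed token counts the gap comes out above $200 a year. The exact figure moves with email length, so the token constants are the first thing to replace.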
The highest-ROI architecture uses different models for different steps.
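One way to picture this is a routing table that maps each pipeline step to a model, which also keeps models swappable behind a single orchestration layer. The sketch below is illustrative only: the step names, the model assignments, and the stub clients are hypothetical, and a real deployment would wrap actual provider SDK calls behind the same `Model` signature.

```python
from typing import Callable, Dict

# A model is anything that maps a prompt to a completion. Real provider
# clients would be wrapped to fit this signature.
Model = Callable[[str], str]


def make_stub(name: str) -> Model:
    """Stand-in for a real API client; it only tags the prompt with the model name."""
    return lambda prompt: f"[{name}] {prompt}"


# Hypothetical step-to-model assignment: cheap models for simple steps,
# a frontier model only where reasoning quality matters.
ROUTES: Dict[str, Model] = {
    "classify": make_stub("gpt-4o-mini"),
    "extract": make_stub("claude-3-haiku"),
    "draft_reply": make_stub("gpt-4o"),
}


def run_step(step: str, prompt: str) -> str:
    """Dispatch one pipeline step to whichever model is configured for it."""
    return ROUTES[step](prompt)


# Swapping a model is a one-line config change; callers are untouched.
ROUTES["extract"] = make_stub("gemini-1.5-flash")
print(run_step("extract", "Pull the order number from this email."))
```

Because callers only know step names, a benchmark result that favours a different model changes one entry in `ROUTES` and nothing else.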
This model selection framework is covered in depth in Module 1 (foundations) and Module 7 (tools and infrastructure) of the Le On AI curriculum.