Model Selection Framework

Choose the Right AI Model

Most organisations use models that are 10× more expensive than necessary. This framework tells you exactly which model to use — and why.

⚡ From Module 1 & Module 7 — Le On AI Curriculum
Right-size first

The best model is the cheapest one that meets your accuracy requirement. Benchmark before committing.

Standardise orchestration

Don't standardise on one model. Standardise on your orchestration layer and swap models freely.

Volume changes everything

A model choice that costs $20/month at 10 users costs $2,000/month at 1,000 users. Model the economics first.

The Decision Framework

Three questions that determine your model selection.

1. What is your input type?

  • Text only: Emails, documents, tickets, reports
  • Structured data: Tables, databases, CSV, JSON
  • Images / visual docs: PDFs with images, photos, forms
  • Audio: Calls, meetings, voice messages
  • Multiple input types: Combination of the above

2. What is the task type?

  • Summarise: Condense long content into shorter form
  • Classify: Categorise into predefined groups
  • Generate: Create new content from a brief or data
  • Extract: Pull specific fields from unstructured text
  • Search / Q&A: Find answers across documents
  • Reason: Complex multi-step analysis or decisions

3. What are your output requirements?

  • Accuracy critical: Errors have significant consequences
  • Speed critical: Real-time response needed
  • Cost critical: High volume, budget constrained
  • Privacy critical: Data must not leave your region
  • Format controlled: Specific JSON or structured output needed

Output Requirements → Model Direction

If your priority is... | And volume is... | Use this model tier
Accuracy critical + quality matters | Any | GPT-4o or Claude Sonnet
High volume + accuracy ≥ 90% verified | High | GPT-4o-mini or Claude Haiku
Ultra-long context (>100K tokens) | Any | Gemini 1.5 Pro (1M context)
Data must not leave Australia | Any | Azure OpenAI (AUS region) or AWS Bedrock (Sydney)
Maximum cost efficiency | High | Gemini 1.5 Flash or Claude Haiku
On-premise / no external API | Any | Llama 3.1 70B (self-hosted)
European data residency | Any | Mistral Large (Azure EU)
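
The table above can be read as an ordered list of rules. The following is a minimal Python sketch of that reading. The `Requirements` record and the `recommend_tier` function are hypothetical names introduced here for illustration, and the rule order (hosting and residency constraints checked before cost and volume) is an assumption rather than something the framework prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirements:
    """Hypothetical record of the answers that drive the table above."""
    on_premise: bool = False             # no external API allowed
    residency: Optional[str] = None      # "AU", "EU", or None
    context_tokens: int = 0              # largest prompt you expect to send
    high_volume: bool = False
    cost_critical: bool = False
    accuracy_verified_90: bool = False   # small model benchmarked at >= 90%


def recommend_tier(req: Requirements) -> str:
    """Return the first matching table row; hard constraints are checked first."""
    if req.on_premise:
        return "Llama 3.1 70B (self-hosted)"
    if req.residency == "AU":
        return "Azure OpenAI (AUS region) or AWS Bedrock (Sydney)"
    if req.residency == "EU":
        return "Mistral Large (Azure EU)"
    if req.context_tokens > 100_000:
        return "Gemini 1.5 Pro (1M context)"
    if req.high_volume and req.cost_critical:
        return "Gemini 1.5 Flash or Claude Haiku"
    if req.high_volume and req.accuracy_verified_90:
        return "GPT-4o-mini or Claude Haiku"
    # Default row: accuracy critical, any volume.
    return "GPT-4o or Claude Sonnet"


print(recommend_tier(Requirements(high_volume=True, accuracy_verified_90=True)))
```

Putting the hard constraints first means a residency or hosting requirement can never be overridden by a cheaper option further down the list.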

Model Comparison

All prices per 1 million tokens. Updated Q1 2025.

Model | Tier | Provider | Input/1M | Output/1M | Speed | Context | Privacy | Best for
GPT-4o | Frontier | OpenAI | $3.88 | $15.50 | Medium | 128K | Azure | Complex reasoning, multimodal, long documents
GPT-4o-mini | Efficient ✓ | OpenAI | $0.23 | $0.93 | Fast | 128K | Azure | High-volume classification, extraction, triage
Claude 3.5 Sonnet | Frontier | Anthropic | $4.65 | $23.25 | Medium | 200K | AWS Bedrock | Nuanced reasoning, long documents, report generation
Claude 3 Haiku | Efficient ✓ | Anthropic | $0.39 | $1.94 | Very Fast | 200K | AWS Bedrock | High-volume simple tasks, summarisation at scale
Gemini 1.5 Pro | Frontier | Google | $1.94 | $7.75 | Medium | 1M | Vertex AI | Ultra-long context, multimodal, document processing
Gemini 1.5 Flash | Efficient ✓ | Google | $0.12 | $0.47 | Very Fast | 1M | Vertex AI | Speed-optimised, high-volume, cost-sensitive tasks
Llama 3.1 70B | Open Source | Meta (OSS) | ~$500–5K/mo infra | n/a | Variable | 128K | Full control | On-premise, data sovereignty, no external API
Mistral Large | Frontier | Mistral | $3.10 | $9.30 | Medium | 128K | Azure | European data residency requirements

Prices in AUD (USD list price × 1.55). Verify on the provider pricing pages before committing: OpenAI · Anthropic · Google

Real-World Cost Examples

Actual cost calculations for common business AI use cases.

Customer Support Triage

Classify 800 support emails/day into 12 categories

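Input size: 300 tokens/email average
Volume: 800 calls/day

Cost Comparison

Model | Daily | Monthly | Annual
GPT-4o | $0.60 | $18 | $219
GPT-4o-mini | $0.04 | $1.10 | $13
Claude Haiku | $0.06 | $1.80 | $22

Note: these figures match input-token cost only (800 × 300 = 240,000 tokens/day) at USD list prices, i.e. the AUD table prices divided by 1.55. Output tokens are not included.
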
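
The comparison can be reproduced with a few lines of arithmetic. The sketch below is one way to do it, under stated assumptions: only input tokens are counted, a month is 30 days, and USD prices are recovered by dividing the AUD table prices by 1.55. The function name `daily_input_cost_usd` is hypothetical. The results land close to the quoted figures, within rounding.

```python
AUD_PER_USD = 1.55  # conversion factor quoted under the comparison table

# Input price per 1M tokens in AUD, copied from the comparison table.
INPUT_PRICE_AUD = {
    "GPT-4o": 3.88,
    "GPT-4o-mini": 0.23,
    "Claude Haiku": 0.39,
}


def daily_input_cost_usd(model: str, calls_per_day: int, tokens_per_call: int) -> float:
    """Input-token cost per day in USD. Output tokens are ignored (assumption)."""
    price_usd_per_million = INPUT_PRICE_AUD[model] / AUD_PER_USD
    tokens_per_day = calls_per_day * tokens_per_call
    return tokens_per_day / 1_000_000 * price_usd_per_million


for model in INPUT_PRICE_AUD:
    daily = daily_input_cost_usd(model, calls_per_day=800, tokens_per_call=300)
    # Monthly assumes 30 days, annual assumes 365 days.
    print(f"{model}: daily ${daily:.2f}, monthly ${daily * 30:.2f}, annual ${daily * 365:.0f}")
```

For tasks with longer outputs than a category label, add an output-token term using the Output/1M column before comparing models.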
Recommendation
GPT-4o-mini or Claude Haiku

Classification is a proven task for smaller models. Run 500 test examples first. If accuracy ≥ 90%, deploy the cheaper model. Save $200+/year with negligible quality loss.

Outcome
94% cost reduction vs GPT-4o
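
The "run test examples, deploy if accuracy ≥ 90%" step can be sketched as a small benchmark gate. Everything named here (`accuracy`, `passes_gate`, `keyword_classifier`, the four sample emails) is a hypothetical illustration; in practice the classifier would wrap a call to the candidate model and the sample would be your 500 labelled examples.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected label)


def accuracy(classify: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Fraction of labelled examples the classifier gets right."""
    if not examples:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for text, label in examples if classify(text) == label)
    return correct / len(examples)


def passes_gate(classify: Callable[[str], str],
                examples: Sequence[Example],
                threshold: float = 0.90) -> bool:
    """True if the classifier meets the accuracy requirement on the test set."""
    return accuracy(classify, examples) >= threshold


# Stand-in classifier for illustration only; replace with a real model call.
def keyword_classifier(text: str) -> str:
    return "billing" if "invoice" in text.lower() else "other"


sample = [
    ("Where is my invoice?", "billing"),
    ("Invoice total looks wrong", "billing"),
    ("App crashes on login", "other"),
    ("Please resend the invoice", "billing"),
]
print(passes_gate(keyword_classifier, sample))  # prints True
```

Running the same labelled set against each candidate model gives a like-for-like comparison, so the cheaper model is only chosen when it clears the threshold on your own data.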

When to Mix Models

The highest-ROI architecture uses different models for different steps.

Use frontier models when

  • Output quality directly affects customers
  • Task involves complex multi-step reasoning
  • Errors have significant cost consequences
  • Volume is low (< 1,000 calls/day)
  • Context window > 16K tokens needed

Use efficient models when

  • Task is classification or extraction
  • Volume is high (> 10,000 calls/day)
  • Accuracy ≥ 90% verified on your task
  • Cost is a material concern at scale
  • Speed matters more than nuance

Mix models when

  • Small model does first-pass triage
  • Low confidence → routes to frontier
  • Frontier model for complex steps only
  • Embedding model for search, LLM for answers
  • Different tasks in one pipeline
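
The Mixing Pattern: Example

Step 1: Small model classifies every request; its answer is kept for about 99% of volume (very cheap)
Step 2: Below 70% confidence → route to frontier model (about 1% of volume)
Result: Roughly 93% lower cost than sending everything to the frontier model, using the GPT-4o-mini and GPT-4o input prices from the comparison table ((0.23 + 0.01 × 3.88) / 3.88 ≈ 0.07), with frontier accuracy on edge cases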
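
The mixing pattern can be sketched as a confidence-threshold router plus a blended-cost calculation. This is a sketch under assumptions: the small model is taken to return a `(label, confidence)` pair, and how that confidence is obtained (for example from token log-probabilities or a self-reported score) is left open. `route`, `blended_cost_ratio` and the two stub models are hypothetical names, and the prices are the GPT-4o-mini and GPT-4o input prices from the comparison table.

```python
from typing import Callable, Tuple

CONFIDENCE_THRESHOLD = 0.70  # from Step 2 of the mixing pattern


def route(text: str,
          small_model: Callable[[str], Tuple[str, float]],
          frontier_model: Callable[[str], str],
          threshold: float = CONFIDENCE_THRESHOLD) -> Tuple[str, str]:
    """Return (label, tier). Escalate when the small model is not confident."""
    label, confidence = small_model(text)
    if confidence >= threshold:
        return label, "small"
    return frontier_model(text), "frontier"


def blended_cost_ratio(small_price: float, frontier_price: float,
                       escalation_rate: float) -> float:
    """Mixed-pipeline cost relative to an all-frontier pipeline.

    Every call pays the small-model price; escalated calls also pay
    the frontier price.
    """
    return (small_price + escalation_rate * frontier_price) / frontier_price


# Stub models for illustration only; replace with real model calls.
def stub_small(text: str) -> Tuple[str, float]:
    return ("billing", 0.95) if "invoice" in text.lower() else ("other", 0.40)


def stub_frontier(text: str) -> str:
    return "technical"


print(route("Invoice is missing", stub_small, stub_frontier))      # ('billing', 'small')
print(route("Something odd happened", stub_small, stub_frontier))  # ('technical', 'frontier')
print(round(1 - blended_cost_ratio(0.23, 3.88, 0.01), 3))          # 0.931
```

The saving depends on both the price gap and the escalation rate, so it is worth measuring the real escalation rate on your own traffic before quoting a figure.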

Enterprise Model Strategy

❌ Wrong approach
"We will standardise on GPT-4o for all AI use cases."
  • → Over-engineered for simple tasks
  • → Over-budget at scale
  • → Vendor lock-in
  • → Cannot optimise costs later
✓ Right approach
"We standardise on our orchestration layer and abstract model selection."
  • → Switch models without changing code
  • → A/B test models in production
  • → Apply cost governance centrally
  • → Compliance and monitoring in one place

Enterprise Model Governance Checklist

  • Maintain a Model Registry: approved models by use case type
  • Set cost thresholds: alert if per-use-case token cost exceeds budget
  • Quarterly model review: are newer, cheaper models now viable?
  • Model retirement policy: deprecate when better options are available
  • Data residency map: document where each model processes data
  • Model abstraction layer: code never calls GPT-4o directly
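
A model registry combined with an abstraction layer can be as small as a lookup from use case to backend. The sketch below is illustrative only: `ModelRegistry`, `register`, `complete` and the two stub backends are hypothetical names, and a real backend would wrap whichever provider SDK or orchestration tool you use.

```python
from typing import Callable, Dict

# A backend is any function that takes a prompt and returns text.
Backend = Callable[[str], str]


class ModelRegistry:
    """Hypothetical registry mapping use-case names to approved backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, use_case: str, backend: Backend) -> None:
        """Approve (or replace) the backend for a use case."""
        self._backends[use_case] = backend

    def complete(self, use_case: str, prompt: str) -> str:
        """Application code calls this; it never names a model directly."""
        if use_case not in self._backends:
            raise KeyError(f"no approved model for use case: {use_case}")
        return self._backends[use_case](prompt)


# Stub backends for illustration only; replace with real provider calls.
def efficient_stub(prompt: str) -> str:
    return f"[efficient] {prompt}"


def frontier_stub(prompt: str) -> str:
    return f"[frontier] {prompt}"


registry = ModelRegistry()
registry.register("triage", efficient_stub)
print(registry.complete("triage", "Classify this ticket"))

# Swapping the model is a registry change; call sites stay the same.
registry.register("triage", frontier_stub)
print(registry.complete("triage", "Classify this ticket"))
```

Because every call goes through `complete`, cost thresholds, logging and residency checks can be added in one place rather than at each call site.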

Learn to implement this in your organisation

This model selection framework is covered in depth in Module 1 (foundations) and Module 7 (tools and infrastructure) of the Le On AI curriculum.