Model Selection Framework

Choose the Right AI Model

Most organisations use models that are 10× more expensive than necessary. This framework tells you exactly which model to use — and why.

⚡ From Module 1 & Module 7 — Le On AI Curriculum

Right-size first

The best model is the cheapest one that meets your accuracy requirement. Benchmark before committing.

Standardise orchestration

Don't standardise on one model. Standardise on your orchestration layer and swap models freely.

Volume changes everything

At 10 users, model choice costs $20/month. At 1,000 users, it costs $2,000/month. Model the economics first.

The Decision Framework

Three questions that determine your model selection.

What is your input type?

Text only

Emails, documents, tickets, reports

Structured data

Tables, databases, CSV, JSON

Images / visual docs

PDFs with images, photos, forms

Audio

Calls, meetings, voice messages

Multiple input types

Combination of the above

What is the task type?

Summarise

Condense long content into shorter form

Classify

Categorise into predefined groups

Generate

Create new content from a brief or data

Extract

Pull specific fields from unstructured text

Search / Q&A

Find answers across documents

Reason

Complex multi-step analysis or decisions

What are your output requirements?

Accuracy critical

Errors have significant consequences

Speed critical

Real-time response needed

Cost critical

High volume, budget constrained

Privacy critical

Data must not leave your region

Format controlled

Specific JSON or structured output needed

Output Requirements → Model Direction

If your priority is...	And volume is...	Use this model tier
Accuracy critical + quality matters	Any	→ GPT-4o or Claude Sonnet
High volume + accuracy ≥ 90% verified	High	→ GPT-4o-mini or Claude Haiku
Ultra-long context (>100K tokens)	Any	→ Gemini 1.5 Pro (1M context)
Data must not leave Australia	Any	→ Azure OpenAI (AUS region) or AWS Bedrock (Sydney)
Maximum cost efficiency	High	→ Gemini 1.5 Flash or Claude Haiku
On-premise / no external API	Any	→ Llama 3.1 70B (self-hosted)
European data residency	Any	→ Mistral Large (Azure EU)
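
The table above can be read as an ordered set of rules. Below is a minimal sketch of that reading, assuming hard constraints (hosting, residency, context length) are checked before quality and cost preferences. The `Requirements` fields, the rule order, and the frontier-tier fallback are illustrative assumptions, not part of the framework itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirements:
    """Hypothetical container for the answers to the three framework questions."""
    accuracy_critical: bool = False
    high_volume: bool = False
    max_cost_efficiency: bool = False
    verified_accuracy: Optional[float] = None  # measured on your own test set, 0..1
    context_tokens: int = 0
    residency: Optional[str] = None  # "AU", "EU", or None
    on_premise: bool = False


def recommend_tier(req: Requirements) -> str:
    # Hard constraints first: they exclude options regardless of quality or cost.
    if req.on_premise:
        return "Llama 3.1 70B (self-hosted)"
    if req.residency == "AU":
        return "Azure OpenAI (AUS region) or AWS Bedrock (Sydney)"
    if req.residency == "EU":
        return "Mistral Large (Azure EU)"
    if req.context_tokens > 100_000:
        return "Gemini 1.5 Pro (1M context)"
    # Then quality and cost preferences.
    if req.accuracy_critical:
        return "GPT-4o or Claude Sonnet"
    if req.high_volume and req.max_cost_efficiency:
        return "Gemini 1.5 Flash or Claude Haiku"
    if (req.high_volume and req.verified_accuracy is not None
            and req.verified_accuracy >= 0.90):
        return "GPT-4o-mini or Claude Haiku"
    # Assumed fallback: stay on the frontier tier until a benchmark says otherwise.
    return "GPT-4o or Claude Sonnet"


print(recommend_tier(Requirements(high_volume=True, verified_accuracy=0.93)))
```

Keeping the rules in one function makes the precedence explicit, which matters when a use case matches more than one row.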

Model Comparison

All prices per 1 million tokens. Updated Q1 2025.

Model	Tier	Provider	Input/1M	Output/1M	Speed	Context	Privacy	Best for
GPT-4o	Frontier	OpenAI	$3.88	$15.50	Medium	128K	Azure	Complex reasoning, multimodal, long documents
GPT-4o-mini	Efficient ✓	OpenAI	$0.23	$0.93	Fast	128K	Azure	High-volume classification, extraction, triage
Claude 3.5 Sonnet	Frontier	Anthropic	$4.65	$23.25	Medium	200K	AWS Bedrock	Nuanced reasoning, long documents, report generation
Claude 3 Haiku	Efficient ✓	Anthropic	$0.39	$1.94	Very Fast	200K	AWS Bedrock	High-volume simple tasks, summarisation at scale
Gemini 1.5 Pro	Frontier	Google	$1.94	$7.75	Medium	1M	Vertex AI	Ultra-long context, multimodal, document processing
Gemini 1.5 Flash	Efficient ✓	Google	$0.12	$0.47	Very Fast	1M	Vertex AI	Speed-optimised, high-volume, cost-sensitive tasks
Llama 3.1 70B	Open Source	Meta (OSS)	~$500–5K/mo infra	n/a	Variable	128K	Full control	On-premise, data sovereignty, no external API
Mistral Large	Frontier	Mistral	$3.10	$9.30	Medium	128K	Azure	European data residency requirements

Prices in AUD, converted from USD at 1.55×. Verify at the provider pricing pages (OpenAI, Anthropic, Google) before committing.
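
Because the table is a fixed-rate conversion, it is worth being able to redo it when the exchange rate or the list price moves. The sketch below assumes half-up rounding to two decimals. The USD inputs in the usage lines are assumptions inferred by dividing the table's AUD figures by 1.55, and `usd_to_aud` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

AUD_PER_USD = Decimal("1.55")  # conversion factor stated in the table footnote


def usd_to_aud(usd_price: str, fx_rate: Decimal = AUD_PER_USD) -> Decimal:
    """Convert a USD price per 1M tokens to AUD, rounded half-up to cents."""
    aud = Decimal(usd_price) * fx_rate
    return aud.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Assumed USD list prices (inferred from the AUD table, verify before use).
print(usd_to_aud("2.50"))   # GPT-4o input
print(usd_to_aud("10.00"))  # GPT-4o output
print(usd_to_aud("0.15"))   # GPT-4o-mini input
```

Using `Decimal` rather than floats avoids rounding surprises on values such as 3.875 that sit exactly on a half-cent boundary.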

Real-World Cost Examples

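Cost calculations for common business AI use cases. The dollar figures in the example below correspond to input tokens only at USD list prices (the AUD table prices divided by 1.55); output tokens are not included.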

Customer Support Triage

Classify 800 support emails/day into 12 categories

Input size

300 tokens/email average

Volume

800 calls/day

Cost Comparison

Model	Daily	Monthly	Annual
GPT-4o	$0.60	$18	$219
GPT-4o-mini	$0.04	$1.10	$13
Claude Haiku	$0.06	$1.80	$22
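
The figures above can be reproduced with a few lines of arithmetic. This sketch assumes input tokens only, a 30-day month, and a 365-day year. The USD prices per 1M tokens are an inference (the AUD table prices divided by 1.55), not values stated in the source, and `input_cost` is a hypothetical helper.

```python
from typing import Dict

# Assumed USD input prices per 1M tokens (AUD table price / 1.55).
USD_INPUT_PRICE = {
    "GPT-4o": 2.50,
    "GPT-4o-mini": 0.15,
    "Claude Haiku": 0.25,
}


def input_cost(calls_per_day: int, tokens_per_call: int,
               price_per_million: float) -> Dict[str, float]:
    """Input-token cost only; output tokens are deliberately ignored here."""
    daily = calls_per_day * tokens_per_call / 1_000_000 * price_per_million
    return {"daily": daily, "monthly": daily * 30, "annual": daily * 365}


for model, price in USD_INPUT_PRICE.items():
    cost = input_cost(calls_per_day=800, tokens_per_call=300,
                      price_per_million=price)
    print(f"{model}: {cost['daily']:.2f}/day, "
          f"{cost['monthly']:.2f}/month, {cost['annual']:.2f}/year")
```

A classification call returns only a short label, so leaving out output tokens changes little here. For generation-heavy tasks, add an output term at the output price before comparing models.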

Recommendation

GPT-4o-mini or Claude Haiku

Classification is a proven task for smaller models. Run 500 test examples first. If accuracy ≥ 90%, deploy the cheaper model. Save $200+/year with negligible quality loss.

Outcome

94% cost reduction vs GPT-4o
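
The "run 500 test examples, deploy if accuracy ≥ 90%" rule can be sketched as a small benchmark harness. Everything here is illustrative: a classifier is assumed to be any function from text to a label (in practice a wrapper around a model call), and `cheapest_passing` is a hypothetical helper that tries candidates in price order.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

Classifier = Callable[[str], str]          # text -> predicted label
Candidate = Tuple[str, float, Classifier]  # (name, price per 1M input tokens, classifier)


def accuracy(classify: Classifier, examples: Sequence[Tuple[str, str]]) -> float:
    """Share of labelled (text, label) examples the classifier gets right."""
    if not examples:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for text, label in examples if classify(text) == label)
    return correct / len(examples)


def cheapest_passing(candidates: Iterable[Candidate],
                     examples: Sequence[Tuple[str, str]],
                     threshold: float = 0.90) -> Optional[str]:
    """Return the cheapest candidate meeting the threshold, or None."""
    for name, _price, classify in sorted(candidates, key=lambda c: c[1]):
        if accuracy(classify, examples) >= threshold:
            return name
    return None
```

Checking candidates from cheapest to most expensive means the benchmark stops as soon as a model passes, so the expensive models are only called when the cheaper ones fail.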

When to Mix Models

The highest-ROI architecture uses different models for different steps.

Use frontier models when

→ Output quality directly affects customers
→ Task involves complex multi-step reasoning
→ Errors have significant cost consequences
→ Volume is low (< 1,000 calls/day)
→ Context window > 16K tokens needed

Use efficient models when

→ Task is classification or extraction
→ Volume is high (> 10,000 calls/day)
→ Accuracy ≥ 90% verified on your task
→ Cost is a material concern at scale
→ Speed matters more than nuance

Mix models when

→ Small model does first-pass triage
→ Low confidence → routes to frontier
→ Frontier model for complex steps only
→ Embedding model for search, LLM for answers
→ Different tasks in one pipeline

The Mixing Pattern — Example

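Step 1: Small model classifies every request and resolves about 99% of volume (very cheap)
Step 2: Below 70% confidence → route to frontier model (about 1% of volume)
Result: roughly 93% cost reduction versus frontier-only at the GPT-4o and GPT-4o-mini prices above (every request still pays for the small model), with frontier accuracy on edge cases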
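
A minimal sketch of the two-step pattern follows. It assumes the small model returns a label together with a confidence score between 0 and 1. How that score is obtained (token log-probabilities, a self-reported score, a calibrated classifier head) is left open, and the function names are hypothetical. `blended_cost_ratio` shows the cost side: every request pays for the small model, and escalated requests pay for the frontier model as well.

```python
from typing import Callable, Tuple

SmallModel = Callable[[str], Tuple[str, float]]  # text -> (label, confidence 0..1)
FrontierModel = Callable[[str], str]             # text -> label


def route(text: str, small: SmallModel, frontier: FrontierModel,
          min_confidence: float = 0.70) -> Tuple[str, str]:
    """Return (label, which model produced it)."""
    label, confidence = small(text)
    if confidence >= min_confidence:
        return label, "small"
    # Low confidence: escalate the same input to the frontier model.
    return frontier(text), "frontier"


def blended_cost_ratio(small_price: float, frontier_price: float,
                       escalation_rate: float) -> float:
    """Cost of the mixed pipeline as a fraction of frontier-only cost."""
    return (small_price + escalation_rate * frontier_price) / frontier_price


# Table input prices (AUD per 1M tokens) with 1% of requests escalated.
ratio = blended_cost_ratio(small_price=0.23, frontier_price=3.88,
                           escalation_rate=0.01)
print(f"mixed pipeline costs {ratio:.1%} of frontier-only")
```

At those prices the mixed pipeline comes to about 7% of the frontier-only cost, so the saving is capped by the price ratio between the two tiers rather than by the escalation rate alone.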

Enterprise Model Strategy

❌ Wrong approach

"We will standardise on GPT-4o for all AI use cases."

→ Over-engineered for simple tasks
→ Over-budget at scale
→ Vendor lock-in
→ Cannot optimise costs later

✓ Right approach

"We standardise on our orchestration layer and abstract model selection."

→ Switch models without changing code
→ A/B test models in production
→ Apply cost governance centrally
→ Compliance and monitoring in one place
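
One way to picture "standardise on the orchestration layer" is a thin router that application code calls by use case, never by model name. The sketch below is an assumption-laden illustration: `ModelRouter` and its methods are hypothetical, and a backend is modelled as any function from prompt to completion (in practice a wrapper around a provider SDK).

```python
from typing import Callable, Dict

Backend = Callable[[str], str]  # prompt -> completion


class ModelRouter:
    """Maps use cases to models so callers never name a model directly."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}
        self._routes: Dict[str, str] = {}

    def register_backend(self, model_name: str, backend: Backend) -> None:
        self._backends[model_name] = backend

    def assign(self, use_case: str, model_name: str) -> None:
        if model_name not in self._backends:
            raise KeyError(f"unknown model: {model_name}")
        self._routes[use_case] = model_name

    def complete(self, use_case: str, prompt: str) -> str:
        # Single choke point: logging, cost tracking and policy checks fit here.
        return self._backends[self._routes[use_case]](prompt)


router = ModelRouter()
router.register_backend("frontier", lambda p: "frontier:" + p)  # stub backends
router.register_backend("efficient", lambda p: "efficient:" + p)
router.assign("ticket-triage", "frontier")
router.assign("ticket-triage", "efficient")  # swap models, callers unchanged
print(router.complete("ticket-triage", "Where is my order?"))
```

Because the mapping lives in one place, an A/B test or a cost-driven model swap becomes a configuration change rather than a code change.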

Enterprise Model Governance Checklist

□ Maintain a Model Registry: approved models by use case type

□ Set cost thresholds: alert if per-use-case token cost exceeds budget

□ Quarterly model review: are newer, cheaper models now viable?

□ Model retirement policy: deprecate when better options available

□ Data residency map: document where each model processes data

□ Model abstraction layer: code never calls GPT-4o directly
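
Two of the checklist items, the Model Registry and the cost thresholds, lend themselves to an automated check. The following is a rough sketch under stated assumptions: `UseCasePolicy` and `check_usage` are hypothetical names, the budget is a monthly figure in the same currency as the price, and alerts are returned as plain strings rather than sent anywhere.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UseCasePolicy:
    """Hypothetical registry entry: approved models and budget per use case."""
    use_case: str
    approved_models: Tuple[str, ...]
    monthly_budget: float


def check_usage(policy: UseCasePolicy, model: str, tokens_used: int,
                price_per_million: float) -> List[str]:
    """Return a list of alerts; an empty list means the usage is within policy."""
    alerts: List[str] = []
    if model not in policy.approved_models:
        alerts.append(f"{model} is not approved for {policy.use_case}")
    cost = tokens_used / 1_000_000 * price_per_million
    if cost > policy.monthly_budget:
        alerts.append(
            f"{policy.use_case} cost {cost:.2f} exceeds budget "
            f"{policy.monthly_budget:.2f}"
        )
    return alerts


triage = UseCasePolicy("support-triage", ("GPT-4o-mini", "Claude 3 Haiku"), 5.00)
# 7.2M input tokens/month corresponds to 800 emails/day at 300 tokens over 30 days.
print(check_usage(triage, "GPT-4o", 7_200_000, 3.88))
print(check_usage(triage, "GPT-4o-mini", 7_200_000, 0.23))
```

Running a check like this inside the abstraction layer keeps governance in one place instead of in each application.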


Learn to implement this in your organisation

This model selection framework is covered in depth in Module 1 (foundations) and Module 7 (tools and infrastructure) of the Le On AI curriculum.
