AI Models. Llama (Meta)
Open source / self-host / multimodal

Llama (Meta)

Llama 4 Scout 10M ctx, Maverick 1M, MoE 17B active

Llama 4 Scout/Maverick (04/05/2025): open-weight MoE family (17B active) on 'Community License' (not OSI): >700M MAU clause, AUP, restrictions on using outputs to improve other LLMs. Scout offers context up to 10M tokens; Maverick — 1M. Natively multimodal (text+image). Meta refused to sign the EU GPAI Code of Practice — relevant for EU compliance.

Verified: 2026-05-22

Purchase decision (when to choose / when to avoid)

Choose if...

  • You want open-weight and control (self-host) + wide tool ecosystem.
  • You have high volume and want to optimize inference cost (quantizations, vLLM, etc.).
  • You're building on-prem / edge solutions and have competence to maintain models.

Avoid if...

  • You don't accept Community License / AUP restrictions — check details before deployment.
  • You want the 'simplest possible' deployment without MLOps — SaaS API will be faster.

Cost in practice (scenarios)

Pilot

Hosted at a provider is fastest; self-host requires preparation.

  • small team
  • quality testing
Scale

Self-host often wins on cost at steady volume, but maintenance is added.

  • GPU, monitoring, guardrails
These are estimates/scenarios (not an invoice). Actual cost depends on context length, number of users, limits and retention policies.

Deployment / data / enterprise

Deployment channels

  • Self-host (vLLM/llama.cpp/Ollama)
  • Hosting providers (e.g. Bedrock / Together / Groq — depending on offering)
  • Integrations in custom applications

Data policy

Training on data
Self-host: on your side.
Retention
Self-host: on your side; hosted: depends on provider.
Data residency
Depends on hosting location.
Key are license terms (Community License) and AUP.

Enterprise readiness

Admin
Self-host: on your side; hosted: depends on provider.
SSO/SCIM
Depends on the platform you deploy on.
Audit
Depends on platform.
DPA
Depends on provider/agreement.
Certifications
Depends on provider/agreement.
Best when you have MLOps and want to control cost at scale.

Best use cases

  • companies wanting to host models on-prem/cloud/edge (privacy, long-term costs)
  • fine-tuning and applications with low latency; extremely long context (Scout 10M)
  • deployments where license compliance and AUP are acceptable ('Built with Llama' attribution required).

Strengths

  • Huge ecosystem (HF, Llama.cpp, vLLM, Ollama, quantizations); Scout 10M — record open context.
  • Llama 4 MoE: Scout (10M ctx, 17B/16E), Maverick (1M ctx, 17B/128E), natively multimodal.
  • Full weight control — audits, modifications; competitive pricing at hosting providers.

Weaknesses / risks

  • Community License is not open-source in the OSI sense; license compliance and AUP are on the deployer.
  • Meta criticized for 'open source' claim; GPAI Code refusal — interpretive risk for EU integrators.

Current models (examples)

  • Llama 4 Scout — MoE 17B active, 10M context, multimodal; Maverick — 1M ctx, 17B/128E.
  • Llama 3.1 (8B/70B/405B); Llama 3 — dense, on-prem/cloud/edge.

Alternatives (if this model doesn't fit)