Our Large Language Model as a Service (LLMaaS) offering gives you access to cutting-edge language models, served from SecNumCloud-qualified infrastructure that is HDS-certified for healthcare data hosting: sovereign, and computed entirely in France. Benefit from high performance and optimal security for your AI applications. Your data remains strictly confidential: it is neither used for training nor stored after processing.

Simple, transparent pricing
1.8 €
per million input tokens
8 €
per million output tokens
8 €
per million reasoning tokens
0.01 €
per minute of transcribed audio *
Computed on infrastructure based in France, SecNumCloud-qualified and HDS-certified.
Note on the "Reasoning" price: this rate applies specifically to models classified as "reasoners" or "hybrids" (models with the "Reasoning" capability activated), when reasoning is active and only to the tokens generated by that activity.
* each minute started is billed in full
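To make the pricing concrete, here is a minimal cost-estimation sketch in Python. The rates are those listed above; the request figures (token counts, audio duration) are hypothetical inputs chosen for illustration, and the audio rounding follows the "each minute started" rule.

```python
import math

# Published rates in euros, as listed above.
PRICE_INPUT = 1.8 / 1_000_000      # per input token
PRICE_OUTPUT = 8 / 1_000_000       # per output token
PRICE_REASONING = 8 / 1_000_000    # per reasoning token (reasoner/hybrid models only)
PRICE_AUDIO_MINUTE = 0.01          # per transcribed minute; each started minute counts

def estimate_cost(input_tokens: int, output_tokens: int,
                  reasoning_tokens: int = 0, audio_seconds: float = 0.0) -> float:
    """Estimate the cost in euros of a single request."""
    cost = input_tokens * PRICE_INPUT + output_tokens * PRICE_OUTPUT
    cost += reasoning_tokens * PRICE_REASONING
    cost += math.ceil(audio_seconds / 60) * PRICE_AUDIO_MINUTE  # round up started minutes
    return cost

# Hypothetical request: 12k input tokens, 2k output, 3k reasoning, 90 s of audio.
print(f"{estimate_cost(12_000, 2_000, 3_000, 90):.4f} €")  # -> 0.0816 €
```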

Large models

Our large models offer state-of-the-art performance for the most demanding tasks. They are particularly well-suited to applications requiring a deep understanding of language, complex reasoning or the processing of long documents.

18 tokens/second

glm-4.7:358b

Versatile high-performance model designed by Zhipu AI, excellent for logical reasoning, multilingual comprehension and complex tasks.
Deployed with a context of 120,000 tokens. Ideal for in-depth analysis of long documents and intelligent assistants.
86 tokens/second

qwen3-omni:30b

Qwen3-Omni 30B is a native omnimodal model, capable of understanding text, image, video and audio in a single stream.
It supports multimodal inputs (Audio/Video) and offers advanced reasoning capabilities. Note: Audio output via API is not yet enabled.
104 tokens/second

gpt-oss:120b

OpenAI's state-of-the-art open-weight language model, offering solid performance with a flexible Apache 2.0 licence.
A Mixture-of-Experts (MoE) model with 120 billion parameters and around 5.1 billion active parameters. It offers a configurable reasoning effort and full access to the chain of thought.
29 tokens/second

llama3.3:70b

State-of-the-art multilingual model developed by Meta, designed to excel at natural dialogue, complex reasoning and nuanced understanding of instructions.
Combining remarkable efficiency with reduced computational resources, this model offers extensive multilingual capabilities covering 8 major languages (English, French, German, Spanish, Italian, Portuguese, Hindi and Thai). Its contextual window of 132,000 tokens enables in-depth analysis of complex documents and long conversations, while maintaining exceptional overall consistency. Optimised to minimise bias and problematic responses.
21 tokens/second

gemma3:27b

Google's revolutionary model offers an optimum balance between power and efficiency, with an exceptional performance/cost ratio for demanding professional applications.
With unrivalled hardware efficiency, this model incorporates native multimodal capabilities and excels in multilingual performance in over 140 languages. Its impressive contextual window of 120,000 tokens makes it the ideal choice for analysing very large documents, document research and any application requiring understanding of extended contexts. Its optimised architecture allows flexible deployment without compromising the quality of results.
104 tokens/second

qwen3-coder:30b

MoE model optimised for software engineering tasks with a very long context.
Advanced agentic capabilities for software engineering tasks, native support for a 250K token context, pre-trained on 7.5T tokens with a high code ratio, and optimised with reinforcement learning to improve code execution success rates.
104 tokens/second

qwen3-2507:30b-a3b

Enhanced version of Qwen3-30B's non-thinking mode, with improved general capabilities, knowledge coverage and user alignment.
Significant improvements in following instructions, reasoning, reading comprehension, mathematics, coding and tool use. Native context of 250k tokens.
148 tokens/second

qwen3-next:80b

Qwen3-Next 80B model, optimised for long contexts and reasoning, served via vLLM (A100).
A3B-Instruct variant configured with a context of up to 262k tokens, support for function calling, guided decoding (xgrammar) and speculative decoding (qwen3_next_mtp).
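Because this model is served via vLLM, structured output can typically be requested through vLLM's guided-decoding extension to the OpenAI-compatible API. The sketch below is illustrative only: the base URL is a placeholder, and the `guided_json` field is a vLLM-specific extension, not a documented guarantee of this service's interface.

```python
from openai import OpenAI

# Hypothetical endpoint and key; replace with the values from your account.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# JSON schema the output must conform to (xgrammar-backed guided decoding in vLLM).
schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "year": {"type": "integer"}},
    "required": ["title", "year"],
}

resp = client.chat.completions.create(
    model="qwen3-next:80b",
    messages=[{"role": "user", "content": "Extract the title and year: 'Dune (1965)'"}],
    extra_body={"guided_json": schema},  # vLLM extension, not standard OpenAI
)
print(resp.choices[0].message.content)  # e.g. {"title": "Dune", "year": 1965}
```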
43 tokens/second

qwen3-vl:30b

State-of-the-art multimodal model (Qwen3-VL) offering exceptional visual understanding and accurate temporal reasoning.
This Vision-Language model incorporates major innovations (DeepStack, MRoPE) for detailed analysis of images and videos. It excels at complex OCR, object detection, graph analysis, and spatio-temporal reasoning. Its architecture enables native understanding of video content and accurate structured extraction (JSON).
17 tokens/second

qwen3-vl:32b

High-performance variant of Qwen3-VL, optimised for the most demanding vision tasks.
Offers the same advanced capabilities as the 30B (DeepStack, MRoPE) with increased modelling capacity. Particularly effective for tasks requiring high visual analysis accuracy and deep contextual understanding. Supports text-timestamp alignment for video.
37 tokens/second

olmo3:7b

Reference fully open model, offering total transparency (data, code, weights) and remarkable efficiency.
OLMo 3-7B is a dense model optimised for efficiency (requiring 2.5 times fewer resources than Llama 3.1 8B for comparable performance). It excels particularly in mathematics and programming. With its 65k token window, it is ideal for tasks requiring full auditability.
19 tokens/second

olmo3:32b

The first fully open reasoning model at this scale, rivalling the best proprietary models.
OLMo 3-32B uses advanced architecture (GQA) to offer exceptional reasoning capabilities. It excels on complex benchmarks (MATH, HumanEvalPlus) and is capable of exposing its thought process (Think variant). It is the preferred choice for critical tasks requiring high performance and total transparency.
58 tokens/second

qwen3-2507:235b

Massive MoE model with 235 billion parameters, with only 22 billion active, offering cutting-edge performance.
Ultra-sparse Mixture-of-Experts architecture with 512 experts. Combines the power of a very large model with the efficiency of a smaller model. Excels at mathematics, coding, and logical reasoning.
31 tokens/second

qwen3-vl:235b

The most powerful multimodal model in the catalogue, combining cutting-edge visual understanding with exceptional reasoning capabilities.
This Vision-Language model excels at in-depth analysis of complex documents, multilingual OCR and reasoning about dense visual and textual content. It is designed for critical tasks requiring maximum accuracy and extensive contextual understanding.
31 tokens/second

ministral-3:14b

The most powerful member of the Ministral family, designed for complex tasks on local infrastructure.
Deployed with an extended context of 250k tokens. Excels at complex reasoning and coding while remaining efficient.
68.2 tokens/second

qwen3:14b

Balanced Qwen3 14B model, offering solid overall performance with good inference speed.
Excellent size/performance ratio. Capable of good-level reasoning and coding.
20 tokens/second

cogito:32b

Advanced version of the Cogito model, offering considerably enhanced reasoning and analysis capabilities, designed for the most demanding applications in terms of analytical artificial intelligence.
Designed to excel at complex tasks requiring superior depth of analysis, this model stands out for its ability to break down multidimensional problems and provide structured, well-argued answers. It incorporates advanced logic checking mechanisms to minimise hallucinations.
89 tokens/second

nemotron-3-nano:30b

NVIDIA model optimised for complex reasoning and the use of tools, deployed with an extended context.
Uses Nano V3 architecture. Excels at function calling, structured reasoning and analysis of long contexts.
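As an illustration of the function-calling capability highlighted here, the sketch below declares a tool in the standard OpenAI style. The endpoint URL and the `get_weather` tool are hypothetical, and this assumes the service exposes an OpenAI-compatible chat API.

```python
from openai import OpenAI

# Hypothetical endpoint; adapt to your account settings.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# Declare a tool the model may call (standard OpenAI-style schema).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="nemotron-3-nano:30b",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# If the model decides to call the tool, the call appears here instead of text.
print(resp.choices[0].message.tool_calls)
```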

Specialised models

Our specialised models are optimised for specific tasks such as code generation, image analysis or structured data processing. They offer an excellent performance/cost ratio for targeted use cases.

50 tokens/second

ministral-3:3b

Mistral AI's cutting-edge compact model, designed for efficiency in local and edge deployments.
Despite its small size, this model offers surprising performance for conversational tasks and simple reasoning. Ideal for mobile devices.
55 tokens/second

ministral-3:8b

Mid-sized model in the Ministral family, offering an optimal balance between performance and resources.
Version 8B is more robust, capable of handling longer contexts and more complex reasoning, while remaining very fast.
53 tokens/second

gemma3:1b

Gemma 3 micro-model, ultra-fast and efficient.
Perfect for simple tasks, rapid classification or execution on highly constrained devices.
48.0 tokens/second

gemma3:4b

Compact Gemma 3 4B model, offering an excellent performance/size ratio.
Capable of decent reasoning and good language comprehension. A good candidate for more advanced local assistants.

qwen3-embedding:0.6b

Ultra-light Qwen3 embedding model, optimised for speed and efficiency on resource-limited infrastructures.
Offers an excellent compromise between semantic performance and speed of execution.
196.3 tokens/second

granite-embedding:278m

Ultra-compact IBM Granite embedding model, designed for maximum efficiency.
Ideal for semantic search tasks requiring minimal latency.

qwen3-embedding:4b

High-performance Qwen3-4B embedding model, offering deep semantic understanding and an extended context window.
Deployed with a context of 40,000 tokens for processing large documents.
171 tokens/second

bge-m3:567m

State-of-the-art multilingual embedding model (BGE-M3), offering exceptional semantic search capabilities in over 100 languages.
Deployed with a context of 8192 tokens. Supports dense, sparse and multi-vector search methods.
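For embedding models such as BGE-M3, a typical workflow retrieves dense vectors and compares them with cosine similarity. The sketch below assumes an OpenAI-compatible embeddings endpoint with a placeholder base URL; it shows dense retrieval only (sparse and multi-vector modes require model-specific tooling).

```python
from openai import OpenAI

# Hypothetical endpoint; adapt to your account settings.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

resp = client.embeddings.create(
    model="bge-m3:567m",
    input=["Sovereign cloud hosting", "Hébergement cloud souverain"],
)
a, b = (d.embedding for d in resp.data)

# Cosine similarity between the two (cross-lingual) sentences.
dot = sum(x * y for x, y in zip(a, b))
norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
print(f"cosine similarity: {dot / norm:.3f}")
```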
175 tokens/second

embeddinggemma:300m

Google's state-of-the-art embedding model, optimised for its size, ideal for search and semantic retrieval tasks.
Built on Gemma 3, this model produces vector representations of text for classification, clustering and similarity search. Trained on over 100 languages, its small size makes it perfect for resource-constrained environments.
9 tokens/second

gpt-oss:20b

OpenAI's open-weight language model, optimised for efficiency and deployment on consumer hardware.
A Mixture-of-Experts (MoE) model with 21 billion parameters and 3.6 billion active parameters. It offers configurable reasoning effort and agent capabilities.
52 tokens/second

qwen3-2507-think:4b

Qwen3-4B model optimised for reasoning, with improved performance on logic, maths, science and code tasks, and extended context to 250K tokens.
This 'Thinking' version has an increased thought length, making it ideal for highly complex reasoning tasks. It also offers general improvements in following instructions, using tools and generating text.
30 tokens/second

qwen3-2507:4b

Updated version of Qwen3-4B's non-thinking mode, with significant improvements in overall capabilities, extended knowledge coverage and better alignment with user preferences.
Significant improvements in following instructions, logical reasoning, reading comprehension, mathematics, coding and tool use. Native context of 250k tokens.
31 tokens/second

rnj-1:8b

Model 8B "Open Weight" specialising in coding, mathematics and science (STEM).
RNJ-1 is a dense model with 8.3B parameters trained on 8.4T tokens. It uses global attention and YaRN to provide a context of 32k tokens. It excels at code generation (83.5% HumanEval+) and mathematical reasoning, often outperforming much larger models.
64 tokens/second

qwen3-vl:2b

Ultra-compact multimodal Qwen3-VL model, bringing advanced vision capabilities to edge devices.
Despite its small size, this model incorporates Qwen3-VL technologies (MRoPE, DeepStack) to deliver impressive image and video analysis. Ideal for mobile or embedded applications requiring OCR, object detection or rapid visual understanding.
57 tokens/second

qwen3-vl:4b

Balanced Qwen3-VL multimodal model, offering robust vision performance with a small footprint.
Excellent compromise between performance and resources. Capable of analysing complex documents, graphics and videos with high accuracy. Supports structured extraction and visual reasoning.
46 tokens/second

qwen3:0.6b

Ultra-light Qwen3 model with 0.6 billion parameters, offering exceptional inference speed for fast, simple tasks.
Ideal for deployment on lightweight servers or as the first level of processing for complex workflows. Configured with a context of 40,000 tokens.
44 tokens/second

qwen3-vl:8b

Qwen3-VL multimodal model (8B), offering advanced vision performance with a reasonable footprint.
Version 8B of the Qwen3-VL model. Excellent compromise between performance and resources. Capable of analysing complex documents, graphics and video with high accuracy.
44 tokens/second

devstral:24b

Devstral 24B is an agentic LLM specialising in software engineering, co-developed by Mistral AI and All Hands AI.
Devstral excels at using tools to explore code bases, modify multiple files and drive engineering agents. Based on Mistral Small 3, it offers advanced reasoning and coding capabilities. Configured with Mistral-specific optimisers (tokenizer, parser).
23 tokens/second

devstral-small-2:24b

Second iteration of Devstral (Small 2), a cutting-edge agentic model for software engineering, deployed on Mac Studio with a massive context.
Optimised for exploring codebases, multi-file editing and tool use. Delivers code performance close to that of 100B+ models (SWE-bench Verified: 68%). Natively supports vision. Deployed with an extended context of 380k tokens to handle entire projects.
33 tokens/second

granite4-small-h:32b

IBM's MoE (Mixture-of-Experts) model, designed as a "workhorse" for everyday business tasks, with excellent efficiency for long contexts.
This hybrid model (Transformer + Mamba-2) with 32 billion parameters (9B active) is optimised for enterprise workflows such as multi-tool agents and customer support automation. Its innovative architecture reduces RAM usage by more than 70% for long contexts and multiple batches.
58 tokens/second

granite4-tiny-h:7b

IBM's ultra-efficient hybrid MoE model, designed for low latency, edge and local applications, and as a building block for agentic workflows.
This 7 billion parameter (1B active) model combines Transformer and Mamba-2 layers for maximum efficiency. It reduces RAM usage by over 70% for long contexts, making it ideal for resource-constrained devices and fast tasks such as function calling.
79 tokens/second

deepseek-ocr

DeepSeek's specialist OCR model, designed for high-precision text extraction with formatting preservation.
Two-stage OCR system (visual encoder + MoE 3B decoder) optimised for converting documents into structured Markdown (tables, formulas). Requires specific pre-processing (Logits Processor) for optimum performance.
22 tokens/second

medgemma:27b

MedGemma is one of Google's most powerful open models for understanding medical text and images, based on Gemma 3.
MedGemma is suitable for tasks such as generating medical imaging reports or answering natural language questions about medical images. It can be adapted for use cases requiring medical knowledge, such as patient interviewing, triage, clinical decision support and summarisation. Although its baseline performance is solid, MedGemma is not yet clinical-grade and will likely require further fine-tuning. Based on the natively multimodal Gemma 3 architecture, this 27B model incorporates a SigLIP image encoder pre-trained on medical data. It supports a context of 128k tokens and is served in FP16 for maximum precision.
27 tokens/second

mistral-small3.2:24b

Minor update to Mistral Small 3.1, improving instruction following and function-calling robustness, and reducing repetition errors.
This version 3.2 retains the strengths of its predecessor while making targeted improvements. It is better able to follow precise instructions, produces fewer infinite generations or repetitive responses, and its function calling template is more robust. In other respects, its performance is equivalent to or slightly better than version 3.1.

Model comparison

This comparison table will help you choose the model best suited to your needs, based on various criteria such as context size, performance and specific use cases.

Comparative table of the characteristics and performance of the various AI models available, grouped by category (large-scale models and specialist models).
Model Publisher Parameters Context (tokens) Vision Agent Reasoning Security Quick * Energy efficiency *
Large models
glm-4.7:358b Zhipu AI 358B 120000
qwen3-omni:30b Qwen Team 30B 32768
gpt-oss:120b OpenAI 120B 120000
llama3.3:70b Meta 70B 132000
gemma3:27b Google 27B 120000
qwen3-coder:30b Qwen Team 30B 250000
qwen3-2507:30b-a3b Qwen Team 30B 250000
qwen3-next:80b Qwen Team 80B 262144
qwen3-vl:30b Qwen Team 30B 250000
qwen3-vl:32b Qwen Team 32B 250000
olmo3:7b AllenAI 7B 65536
olmo3:32b AllenAI 32B 65536
qwen3-2507:235b Qwen Team 235B 130000
qwen3-vl:235b Qwen Team 235B 200000
ministral-3:14b Mistral AI 14B 250000
qwen3:14b Qwen Team 14B 131072
cogito:32b Deep Cogito 32B 32000
nemotron-3-nano:30b NVIDIA 30B 250000
Specialised models
ministral-3:3b Mistral AI 3B 250000
ministral-3:8b Mistral AI 8B 250000
gemma3:1b Google 1B 120000
gemma3:4b Google 4B 120000
qwen3-embedding:0.6b Qwen Team 0.6B 32768
granite-embedding:278m IBM 278M 8192
qwen3-embedding:4b Qwen Team 4B 40000
bge-m3:567m BAAI 567M 8192
embeddinggemma:300m Google 300M 2048
gpt-oss:20b OpenAI 20B 120000
qwen3-2507-think:4b Qwen Team 4B 250000
qwen3-2507:4b Qwen Team 4B 250000
rnj-1:8b Essential AI 8B 32000
qwen3-vl:2b Qwen Team 2B 250000
qwen3-vl:4b Qwen Team 4B 250000
qwen3:0.6b Qwen Team 0.6B 40000
qwen3-vl:8b Qwen Team 8B 250000
devstral:24b Mistral AI & All Hands AI 24B 120000
devstral-small-2:24b Mistral AI & All Hands AI 24B 380000
granite4-small-h:32b IBM 32B (9B active) 128000
granite4-tiny-h:7b IBM 7B (1B active) 128000
deepseek-ocr DeepSeek AI 3B 8192
medgemma:27b Google 27B 128000
mistral-small3.2:24b Mistral AI 24B 128000
Legend and explanation
✓ Functionality or capability supported by the model
✗ Functionality or capability not supported by the model
* Energy efficiency: indicates particularly low energy consumption (< 2.0 kWh/Mtoken)
* Quick: model capable of generating more than 50 tokens per second
Note on performance measures
The speed values (tokens/s) represent performance targets under real-world conditions. Energy consumption (kWh/Mtoken) is calculated by dividing the estimated power draw of the inference server (in watts) by the measured speed of the model (in tokens per second), then converting to kilowatt-hours per million tokens (a net division by 3.6). This method offers a practical comparison of the energy efficiency of different models and should be read as a relative indicator rather than an absolute measure of power consumption.
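As a worked example of this formula (the server power figure below is an assumed, illustrative value, not a published one):

```python
# kWh per million tokens: watts / (tokens/s) gives joules per token;
# x 1,000,000 tokens, then / 3,600,000 J per kWh -> a net division by 3.6.
def kwh_per_mtoken(server_watts: float, tokens_per_second: float) -> float:
    return server_watts / tokens_per_second / 3.6

# Hypothetical 500 W server running a model at 104 tokens/s:
print(f"{kwh_per_mtoken(500, 104):.2f} kWh/Mtoken")  # -> 1.34, under the 2.0 threshold
```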

Recommended use cases

Here are some common use cases and the most suitable models for each. These recommendations are based on the specific performance and capabilities of each model.

Multilingual dialogue

Chatbots and assistants capable of communicating in several languages, with automatic detection, context maintenance throughout the conversation and understanding of linguistic specificities.
Recommended models
  • Llama 3.3
  • Mistral Small 3.2
  • Qwen 3
  • OpenAI gpt-oss
  • Granite 4

Analysis of long documents

Processing of large documents (>100 pages): maintaining context throughout the text, extracting key information, generating relevant summaries and answering questions about specific parts of the content.
Recommended models
  • Gemma 3
  • Qwen3 Next
  • Qwen 3
  • Granite 4

Programming and development

Generating and optimising code in multiple languages, debugging, refactoring, developing complete functionalities, understanding complex algorithmic implementations and creating unit tests
Recommended models
  • DeepCoder
  • Qwen3 Coder
  • Granite 4
  • Devstral

Visual analysis

Direct processing of images and visual documents without OCR pre-processing, interpretation of technical diagrams, graphs, tables, drawings and photos with generation of detailed textual explanations of the visual content
Recommended models
  • DeepSeek-OCR
  • Mistral Small 3.2
  • Gemma 3
  • Qwen 3 VL

Safety and compliance

Applications requiring specific security capabilities: filtering of sensitive content, traceability of reasoning, GDPR/HDS compliance checks, risk minimisation, vulnerability analysis and compliance with sectoral regulations.
Recommended models
  • Granite Guardian
  • Granite 4
  • Devstral
  • Mistral Small 3.2
  • Magistral Small

Light and on-board deployments

Applications requiring a minimal resource footprint, deployment on capacity-constrained devices, real-time inference on standard CPUs and integration into embedded or IoT systems
Recommended models
  • Gemma 3n
  • Granite 4 tiny
  • Qwen 3 VL (2B)