LLM Model Reference

Review parameter size, context window, recommended GPU, and deployment characteristics.

Filter by series, family, variant, and parameter scale for fast model comparison before deployment.

VRAM is an engineering estimate. A ⚠ marker means the listed estimate is below the weight-only theoretical floor and should be rechecked.
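
The weight-only floor mentioned above can be sketched as parameter count times bytes per parameter. This is a minimal illustration, not the page's own formula: the function names are hypothetical, and it assumes 0.5, 1, and 2 bytes per parameter for INT4, INT8, and FP16, with 1 GB taken as 10^9 bytes. For MoE models it assumes the total parameter count applies, since all expert weights must be resident in memory.

```python
# Assumed bytes per parameter for each precision (illustrative values).
BYTES_PER_PARAM = {"INT4": 0.5, "INT8": 1.0, "FP16": 2.0}


def weight_only_floor_gb(params_billion: float, precision: str) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead, so a real
    deployment needs more than this.
    """
    return params_billion * BYTES_PER_PARAM[precision]


def needs_recheck(listed_gb: float, params_billion: float, precision: str) -> bool:
    """True when a listed estimate falls below the weight-only floor."""
    return listed_gb < weight_only_floor_gb(params_billion, precision)


# Qwen3.6-35B-A3B is listed at 18 GB for INT4; the floor for 35B is 17.5 GB.
print(weight_only_floor_gb(35, "INT4"))   # 17.5
print(needs_recheck(18, 35, "INT4"))      # False
# A hypothetical listing of 15 GB for the same model would be flagged.
print(needs_recheck(15, 35, "INT4"))      # True
```

Because the floor excludes KV cache and activations, an estimate that only just clears it may still be too low for long context windows.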

| Model (org) | Tags | Date | Series | Family | Variant | Parameters | Context | VRAM INT4 (GB) | VRAM INT8 (GB) | VRAM FP16 (GB) | Recommended GPU | Minimum GPU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MiMo-V2.5-Pro (XiaomiMiMo) | Tool Calling | 2026-04-27 | xiaomi | MiMo-V2.5-Pro | Core / Mainline | 1020B / A42B | 1024K | 857.6 | 1434.4 | 2716.3 | 20x NVIDIA H200 141GB | 7x NVIDIA H200 141GB |
| DeepSeek-V4-Pro (deepseek-ai) | Tool Calling | 2026-04-22 | deepseek | DeepSeek-V4-Pro | Core / Mainline | 1600B / A49B | 1024K | 1256.6 | 2160 | 4167.5 | 30x NVIDIA H200 141GB | 9x NVIDIA H200 141GB |
| | Tool Calling | 2026-04-22 | deepseek | DeepSeek-V4-Flash | Core / Mainline | 284B / A13B | 1024K | 200.4 | 361.1 | 718.3 | 6x NVIDIA H200 141GB | 2x NVIDIA H200 141GB |
| | Multimodal | 2026-04-21 | qwen | Qwen3.6-27B | Core / Mainline | 27B | 256K | 24 | 38 | 72 | A100 80GB / H100 80GB | L40S 48GB / A100 80GB |
| | Multimodal | 2026-04-15 | qwen | Qwen3.6-35B-A3B | Core / Mainline | 35B MoE / 3B active | 256K | 18 | 32 | 78 | A100 80GB / H100 80GB | L40S 48GB / A100 80GB |
| MiniMax-M2.7 (MiniMaxAI) | Tool Calling, JSON | 2026-04-09 | minimax | MiniMax-M2.7 | Core / Mainline | 229B MoE / 10B active | 200K | 311.2 | 440.4 | 728.3 | 6x NVIDIA H200 141GB | 3x NVIDIA H200 141GB |
| GLM-5.1 (zai-org) | Tool Calling | 2026-04-03 | zai | GLM-5.1 | Core / Mainline | 754B MoE / 40B active | 198K | 563.9 | 990.8 | 1939.6 | 25x H100 80GB | 8x H100 80GB |
| | Multimodal | 2026-02-28 | qwen | Qwen3.5-2B | Core / Mainline | 2B | 256K | 1.6 | 3 | 6 | RTX 5090 32GB / L20 48GB | RTX 4090 24GB |
| | Multimodal | 2026-02-28 | qwen | Qwen3.5-0.8B | Core / Mainline | 0.8B | 256K | 0.6 | 1.2 | 2.4 | RTX 5090 32GB / L20 48GB | RTX 4090 24GB |
| | Multimodal | 2026-02-27 | qwen | Qwen3.5-9B | Core / Mainline | 9B | 128K | 6.5 | 12.4 | 24.8 | RTX 5090 32GB / L20 48GB | RTX 4090 24GB |
| | Multimodal | 2026-02-27 | qwen | Qwen3.5-4B | Core / Mainline | 4B | 128K | 2.9 | 5.5 | 11 | RTX 5090 32GB / L20 48GB | RTX 4090 24GB |
| | Multimodal | 2026-02-24 | qwen | Qwen3.5-122B-A10B | Core / Mainline | 122B-A10B | 256K | 95.3 | 183 | 366 | 3x NVIDIA H200 141GB | NVIDIA H200 141GB |
| | Multimodal | 2026-02-24 | qwen | Qwen3.5-35B-A3B | Core / Mainline | 35B-A3B | 256K | 27.3 | 52.5 | 105 | H100 80GB / H200 141GB | L40S 48GB / A100 80GB |
| | Multimodal | 2026-02-24 | qwen | Qwen3.5-27B | Core / Mainline | 27B | 256K | 21.1 | 40.5 | 81 | A100 80GB / H100 80GB | L40S 48GB |
| | Multimodal | 2026-02-16 | qwen | Qwen3.5-397B-A17B | Core / Mainline | 397B-A17B | 256K | 310.2 | 595.5 | 1191 | 9x NVIDIA H200 141GB | 3x NVIDIA H200 141GB |
| | | 2026-02-13 | qwen | WebWorld-32B | Core / Mainline | 32B | 40K | 33.2 | 52.6 | 95.8 | NVIDIA H200 141GB | NVIDIA RTX A6000 48GB / NVIDIA A40 48GB |
| | | 2026-02-13 | qwen | WebWorld-14B | Core / Mainline | 14B | 40K | 16.9 | 25.2 | 44.1 | NVIDIA RTX A6000 48GB / NVIDIA A40 48GB | NVIDIA RTX A6000 48GB / NVIDIA TITAN RTX 24GB |
| | | 2026-02-13 | qwen | WebWorld-8B | Core / Mainline | 8B | 40K | 12.4 | 16.7 | 27.4 | NVIDIA RTX A6000 48GB / NVIDIA A40 48GB | NVIDIA RTX A6000 48GB / GeForce RTX 4060 Ti 16GB |
| MiniMax-M2.5 (MiniMaxAI) | Tool Calling, JSON | 2026-02-12 | minimax | MiniMax-M2.5 | Core / Mainline | 229B MoE / 10B active | 192K | 311.2 | 440.4 | 728.3 | 6x NVIDIA H200 141GB | 3x NVIDIA H200 141GB |
| | Tool Calling, JSON | 2026-02-11 | openbmb | MiniCPM-SALA | Core / Mainline | 9B | 512K | 12 | 18 | 30 | L40S 48GB / A100 80GB | RTX 4090 24GB / L40S 48GB |
| GLM-5 (zai-org) | Tool Calling | 2026-02-11 | zai | GLM-5 | Core / Mainline | 744B / A40B | 198K | 557.4 | 979 | 1915.8 | 14x NVIDIA H200 141GB | 4x NVIDIA H200 141GB |
| Step-3.5-Flash (stepfun-ai) | Tool Calling, JSON | 2026-02-01 | stepfun | Step-3.5-Flash | Core / Mainline | 196B MoE / 11B active | 256K | 288.6 | 399.4 | 646.3 | 5x NVIDIA H200 141GB | 3x NVIDIA H200 141GB |
| | Tool Calling, JSON | 2026-01-30 | qwen | Qwen3-Coder-Next | Core / Mainline | 80B-A3B | 256K | 62.5 | 120 | 240 | - | - |
| | Tool Calling, JSON | 2026-01-19 | zai | GLM-4.7-Flash | Core / Mainline | 30B-A3B | 198K | 24.3 | 40.4 | 76.1 | L40S 48GB / H100 80GB | RTX 5090 32GB / L40S 48GB |
| GLM-4.7 (zai-org) | Tool Calling, JSON | 2025-12-22 | zai | GLM-4.7 | Core / Mainline | 355B MoE / 32B active | 198K | 404.8 | 611.1 | 1069.8 | 8x NVIDIA H200 141GB | 3x NVIDIA H200 141GB |
| MiniMax-M2.1 (MiniMaxAI) | Tool Calling | 2025-12-20 | minimax | MiniMax-M2.1 | Core / Mainline | 229B MoE | 192K | 171.9 | 305.1 | 600.9 | 5x NVIDIA H200 141GB | 2x NVIDIA H200 141GB |
| MiMo-V2-Flash (XiaomiMiMo) | Tool Calling, JSON | 2025-12-16 | xiaomi | MiMo-V2-Flash | Core / Mainline | 309B / A15B | 256K | 366.3 | 541.3 | 930.1 | 7x NVIDIA H200 141GB | 3x NVIDIA H200 141GB |
| DeepSeek-V3.2 (deepseek-ai) | Tool Calling, JSON | 2025-12-01 | deepseek | DeepSeek-V3.2 | Core / Mainline | 671B MoE / 37B active | 160K | 617.3 | 997.6 | 1842.8 | 14x NVIDIA H200 141GB | 5x NVIDIA H200 141GB |
| | Tool Calling, JSON | 2025-11-28 | deepseek | DeepSeek-V3.2-Speciale | Core / Mainline | 671B MoE / 37B active | 160K | 617.3 | 997.6 | 1842.8 | 14x NVIDIA H200 141GB | 5x NVIDIA H200 141GB |
| | Tool Calling, JSON | 2025-11-04 | moonshot | Kimi-K2 | Thinking | 1100B MoE / 32B active | 256K | 911.4 | 1532.3 | 2912.2 | 21x NVIDIA H200 141GB | 7x NVIDIA H200 141GB |
| | Tool Calling, JSON | 2025-10-30 | moonshot | Kimi-Linear-48B-A3B | Instruct | 48B MoE / 3B active | 1024K | 119.6 | 146.6 | 206.5 | 2x NVIDIA H200 141GB | NVIDIA H200 141GB |
| MiniMax-M2 (MiniMaxAI) | Tool Calling, JSON | 2025-10-22 | minimax | MiniMax-M2 | Core / Mainline | 229B MoE / 10B active | 192K | 311.2 | 440.4 | 728.3 | 6x NVIDIA H200 141GB | 3x NVIDIA H200 141GB |
| GLM-4.6 (zai-org) | Tool Calling, JSON | 2025-09-29 | zai | GLM-4.6 | Core / Mainline | 355B MoE / 32B active | 198K | 258.9 | 459.9 | 906.7 | 12x H100 80GB | 4x H100 80GB |
| | Tool Calling, JSON | 2025-09-29 | deepseek | DeepSeek-V3.2-Exp | Core / Mainline | 671B / A37B | 160K | 522 | 902.3 | 1747.5 | 13x NVIDIA H200 141GB | 4x NVIDIA H200 141GB |
| | | 2025-09-09 | qwen | Qwen3-Next-80B-A3B | Thinking | 80B-A3B | 256K | 62.5 | 120 | 240 | - | - |
| | Tool Calling, JSON | 2025-09-09 | qwen | Qwen3-Next-80B-A3B | Instruct | 80B-A3B | 256K | 62.5 | 120 | 240 | - | - |
| | Tool Calling, JSON | 2025-09-03 | moonshot | Kimi-K2 | Instruct | 1000B MoE / 32B active | 256K | 842.7 | 1407.5 | 2662.4 | 19x NVIDIA H200 141GB | 6x NVIDIA H200 141GB |
| DeepSeek-V3.1 (deepseek-ai) | Tool Calling, JSON | 2025-08-21 | deepseek | DeepSeek-V3.1 | Core / Mainline | 671B MoE / 37B active | 125K | 617.3 | 997.6 | 1842.8 | 14x NVIDIA H200 141GB | 5x NVIDIA H200 141GB |
| | Tool Calling, JSON | 2025-08-05 | qwen | Qwen3-4B | Thinking | 4B | 256K | 3.1 | 6 | 12 | - | - |
| | Tool Calling, JSON | 2025-08-05 | qwen | Qwen3-4B | Instruct | 4B | 256K | 3.1 | 6 | 12 | - | - |
| | Tool Calling, JSON | 2025-07-31 | qwen | Qwen3-Coder-30B-A3B | Instruct | 30B-A3B | 256K | 23.4 | 45 | 90 | - | - |
| | Tool Calling, JSON | 2025-07-29 | qwen | Qwen3-30B-A3B | Thinking | 30B-A3B | 256K | 23.4 | 45 | 90 | - | - |
| | Tool Calling, JSON | 2025-07-28 | qwen | Qwen3-30B-A3B | Instruct | 30B-A3B | 256K | 23.4 | 45 | 90 | - | - |
| | Tool Calling, JSON | 2025-07-25 | qwen | Qwen3-235B-A22B | Thinking | 235B-A22B | 256K | 183.6 | 352.5 | 705 | - | - |
| | Tool Calling, JSON | 2025-07-22 | qwen | Qwen3-Coder-480B-A35B | Instruct | 480B-A35B | 256K | 375 | 720 | 1440 | - | - |
| | Tool Calling, JSON | 2025-07-21 | qwen | Qwen3-235B-A22B | Instruct | 235B-A22B | 256K | 183.6 | 352.5 | 705 | - | - |
| | Tool Calling | 2025-07-20 | zai | GLM-4.5-Air | Core / Mainline | 106B MoE / 12B active | 128K | 84.4 | 146.5 | 285.7 | 4x H100 80GB | 2x H100 80GB |
| GLM-4.5 (zai-org) | Tool Calling | 2025-07-20 | zai | GLM-4.5 | Core / Mainline | 355B MoE / 32B active | 128K | 258.9 | 459.9 | 906.7 | 12x H100 80GB | 4x H100 80GB |
| | Tool Calling, JSON | 2025-07-11 | moonshot | Kimi-K2 | Instruct | 1000B / A32B | 128K | 747.4 | 1312.2 | 2567.2 | 19x NVIDIA H200 141GB | 6x NVIDIA H200 141GB |
| | Tool Calling, JSON | 2025-07-01 | minimax | MiniMax-M1-80k-hf | Core / Mainline | 456B MoE / 45.9B active | 1,000,000 | 470.6 | 730.9 | 1309.4 | 10x NVIDIA H200 141GB | 4x NVIDIA H200 141GB |
| | Tool Calling, JSON | 2025-06-03 | minimax | MiniMax-Text-01-hf | Core / Mainline | 456B MoE / 45.9B active | 1,000,000 | 470.6 | 730.9 | 1309.4 | 10x NVIDIA H200 141GB | 4x NVIDIA H200 141GB |
| MiMo-7B-RL-0530 (XiaomiMiMo) | Tool Calling, JSON | 2025-05-30 | xiaomi | MiMo-7B | RL | 8B | 64K | 6 | 10 | 16 | RTX 4090 24GB / L40S 48GB | RTX 4060 Ti 16GB / RTX 4090 24GB |
| | Tool Calling | 2025-05-29 | deepseek | DeepSeek-R1-0528-Qwen3-8B | Core / Mainline | 8B | 128K | 5 | 9.6 | 19.2 | - | - |
| DeepSeek-R1-0528 (deepseek-ai) | Tool Calling | 2025-05-28 | deepseek | DeepSeek-R1 | Core / Mainline | 685B / A37B | 160K | 531.6 | 919.8 | 1782.4 | 13x NVIDIA H200 141GB | 4x NVIDIA H200 141GB |
| | Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-235B-A22B | Core / Mainline | 235B-A22B | 40K | 146.9 | 282 | 564 | - | - |
| | Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-32B | Core / Mainline | 32B | 40K | 20 | 38.4 | 76.8 | - | - |
| | Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-30B-A3B | Core / Mainline | 30B-A3B | 40K | 18.8 | 36 | 72 | - | - |
| | Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-14B | Core / Mainline | 14B | 40K | 8.8 | 16.8 | 33.6 | L40S 48GB / A100 80GB | RTX 4090 24GB |
| | Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-8B | Core / Mainline | 8B | 40K | 5 | 9.6 | 19.2 | - | - |
| | Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-4B | Core / Mainline | 4B | 40K | 2.5 | 4.8 | 9.6 | - | - |
| | Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-1.7B | Core / Mainline | 1.7B | 40K | 1.1 | 2 | 4.1 | - | - |
| | Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-0.6B | Core / Mainline | 0.6B | 40K | 0.4 | 0.7 | 1.4 | - | - |
| | Tool Calling, JSON | 2025-04-13 | zai | GLM-Z1-Rumination-32B | Core / Mainline | 32B | 128K | 32.8 | 52.2 | 95.4 | 2x H100 80GB | L40S 48GB / A6000 48GB |
| | Tool Calling, JSON | 2025-04-08 | zai | GLM-Z1-32B | Core / Mainline | 32B | 32K | 26.8 | 46.2 | 89.3 | 2x H100 80GB | L40S 48GB / A6000 48GB |
| | Tool Calling, JSON | 2025-04-08 | zai | GLM-Z1-9B | Core / Mainline | 9B | 32K | 9.7 | 14.5 | 26.6 | L40S 48GB / A6000 48GB | RTX 3060 12GB / RTX 4090 24GB |
| | Tool Calling, JSON | 2025-04-07 | zai | GLM-4-32B | Core / Mainline | 32B | 32K | 24 | 39 | 76 | 2x L40S 48GB / A100 80GB | RTX 4090 24GB / L40S 48GB |
| | Tool Calling, JSON | 2025-04-07 | zai | GLM-4-9B | Core / Mainline | 9B | 32K | 9 | 12 | 22 | RTX 4090 24GB / L40S 48GB | RTX 4090 24GB / L4 24GB |
| DeepSeek-V3-0324 (deepseek-ai) | Tool Calling, JSON | 2025-03-24 | deepseek | DeepSeek-V3 | Core / Mainline | 684.53B | 160K | 798.5 | 1405 | 2752.9 | 35x H100 80GB | 10x H100 80GB |
| | Tool Calling | 2025-02-22 | moonshot | Moonlight-16B-A3B | Instruct | 16B | 8K | 14.6 | 23.8 | 44.7 | L40S 48GB / A6000 48GB | RTX 4090 24GB / L4 24GB |
| | Tool Calling | 2025-02-22 | moonshot | Moonlight-16B-A3B | Core / Mainline | 16B | 8K | 14.6 | 23.8 | 44.7 | L40S 48GB / A6000 48GB | RTX 4090 24GB / L4 24GB |
| DeepSeek-R1 (deepseek-ai) | | 2025-01-20 | deepseek | DeepSeek-R1 | Core / Mainline | 685B / A37B | 160K | 531.6 | 919.8 | 1782.4 | 13x NVIDIA H200 141GB | 4x NVIDIA H200 141GB |
| | Tool Calling, JSON | 2025-01-20 | deepseek | DeepSeek-R1-Distill-Qwen-32B | Core / Mainline | 32B | 128K | 30 | 44 | 82 | A100 80GB / H100 80GB | L40S 48GB / A100 80GB |
| | Tool Calling, JSON | 2025-01-20 | deepseek | DeepSeek-R1-Distill-Qwen-14B | Core / Mainline | 14B | 128K | 14 | 24 | 38 | L40S 48GB / A100 80GB | RTX 4090 24GB / L40S 48GB |
| | Tool Calling, JSON | 2025-01-20 | deepseek | DeepSeek-R1-Distill-Llama-70B | Core / Mainline | 70B | 128K | 52 | 92 | 170 | 2x A100 80GB / 2x H100 80GB | A100 80GB / H100 80GB |
| | Tool Calling, JSON | 2025-01-12 | minimax | MiniMax-Text-01 | Core / Mainline | 456B MoE / 45.9B active | 10000K | 470.6 | 730.9 | 1309.4 | 10x NVIDIA H200 141GB | 4x NVIDIA H200 141GB |
| DeepSeek-V3 (deepseek-ai) | JSON | 2024-12-25 | deepseek | DeepSeek-V3 | Core / Mainline | 671B / A37B | 160K | 617.3 | 997.6 | 1842.8 | 14x NVIDIA H200 141GB | 5x NVIDIA H200 141GB |
| | JSON | 2024-09-16 | qwen | Qwen2.5-7B | Instruct | 7B | 128K | 8 | 12 | 20 | RTX 4090 24GB / L40S 48GB | RTX 3060 12GB / L4 24GB |
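
The page does not state how GPU counts are derived. In spot checks of the multi-GPU rows above, the minimum count matches the INT4 estimate divided by per-GPU memory and rounded up, and the recommended count matches the same calculation on the FP16 estimate. The sketch below illustrates that reading as an assumption only; `gpus_needed` is a hypothetical helper, and it ignores headroom, interconnect, and parallelism constraints that a real deployment has to account for.

```python
import math


def gpus_needed(vram_estimate_gb: float, gpu_memory_gb: float) -> int:
    """Smallest number of identical GPUs whose combined memory covers the estimate."""
    return math.ceil(vram_estimate_gb / gpu_memory_gb)


# MiMo-V2.5-Pro on H200 141GB: INT4 857.6 GB, FP16 2716.3 GB.
print(gpus_needed(857.6, 141))    # 7, matches the listed minimum
print(gpus_needed(2716.3, 141))   # 20, matches the listed recommendation

# GLM-5.1 on H100 80GB: INT4 563.9 GB, FP16 1939.6 GB.
print(gpus_needed(563.9, 80))     # 8
print(gpus_needed(1939.6, 80))    # 25
```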