LLM Model Reference
Review parameter size, context window, recommended GPU, and deployment characteristics.
Compare models by series, family, variant, and parameter scale before deployment.
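The same comparison can be scripted over rows copied from the table. Here is a minimal Python sketch. It assumes a hypothetical `ModelEntry` record that holds a few of the table's columns, with total parameters already converted to billions. It also uses an illustrative `filter_models` helper. Neither is part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class ModelEntry:
    # Hypothetical record mirroring a few of the table's columns.
    name: str
    series: str
    family: str
    variant: str
    total_params_b: float  # total parameters, in billions


def filter_models(entries, series=None, variant=None, min_b=0.0, max_b=float("inf")):
    """Keep entries that match series/variant (when given) and fall in a total-parameter range."""
    return [
        e for e in entries
        if (series is None or e.series == series)
        and (variant is None or e.variant == variant)
        and min_b <= e.total_params_b <= max_b
    ]


# A few rows transcribed from the table.
catalog = [
    ModelEntry("Qwen3.6-27B", "qwen", "Qwen3.6-27B", "Core / Mainline", 27),
    ModelEntry("Qwen3.5-397B-A17B", "qwen", "Qwen3.5-397B-A17B", "Core / Mainline", 397),
    ModelEntry("Kimi-K2-Thinking", "moonshot", "Kimi-K2", "Thinking", 1100),
    ModelEntry("GLM-4.7-Flash", "zai", "GLM-4.7-Flash", "Core / Mainline", 30),
]

small_qwen = filter_models(catalog, series="qwen", max_b=40)
print([e.name for e in small_qwen])  # ['Qwen3.6-27B']
```

If you leave a filter argument as `None`, that criterion is skipped. The same helper therefore covers series-only, variant-only, and size-only comparisons.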
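VRAM figures are engineering estimates in GB, listed separately for INT4, INT8, and FP16 weights. A ⚠ marker means the listed estimate is below the theoretical weight-only floor and should be rechecked. That floor is the parameter count × bytes per parameter: 0.5 for INT4, 1 for INT8, 2 for FP16.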
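Below is a sketch of that floor check in Python. It assumes 1 GB = 10^9 bytes and counts weights only, with no quantization scales, KV cache, or activations. The helper names are illustrative. Under these assumptions, the Qwen3.6-35B-A3B INT8 figure of 32 GB falls below its 35 GB floor. The MiMo-V2.5-Pro INT4 figure of 857.6 GB sits well above its 510 GB floor.

```python
# Bytes per parameter for each precision listed in the VRAM column.
BYTES_PER_PARAM = {"INT4": 0.5, "INT8": 1.0, "FP16": 2.0}


def weight_floor_gb(params_billion: float, precision: str) -> float:
    """Weight-only lower bound in GB, assuming 1 GB = 1e9 bytes.

    (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billion * bytes_per_param.
    """
    return params_billion * BYTES_PER_PARAM[precision]


def needs_recheck(params_billion: float, precision: str, listed_gb: float) -> bool:
    """True when the listed estimate is below the weight-only floor (the ⚠ case)."""
    return listed_gb < weight_floor_gb(params_billion, precision)


# Qwen3.6-35B-A3B: listed INT8 estimate is 32 GB; floor is 35B x 1 byte = 35 GB.
print(needs_recheck(35, "INT8", 32))        # True
# MiMo-V2.5-Pro: listed INT4 estimate is 857.6 GB; floor is 1020B x 0.5 byte = 510 GB.
print(needs_recheck(1020, "INT4", 857.6))   # False
```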
| Model · org · capabilities | Released | Series | Family | Variant | Parameters (total / active) | Context | VRAM estimate (GB) | Recommended GPU (minimum) |
|---|---|---|---|---|---|---|---|---|
| MiMo-V2.5-Pro · XiaomiMiMo · Tool Calling | 2026-04-27 | xiaomi | MiMo-V2.5-Pro | Core / Mainline | 1020B / 42B active | 1024K | INT4 857.6 / INT8 1434.4 / FP16 2716.3 | 20x NVIDIA H200 141GB (min: 7x NVIDIA H200 141GB) |
| DeepSeek-V4-Pro · deepseek-ai · Tool Calling | 2026-04-22 | deepseek | DeepSeek-V4-Pro | Core / Mainline | 1600B / 49B active | 1024K | INT4 1256.6 / INT8 2160 / FP16 4167.5 | 30x NVIDIA H200 141GB (min: 9x NVIDIA H200 141GB) |
| DeepSeek-V4-Flash · deepseek-ai · Tool Calling | 2026-04-22 | deepseek | DeepSeek-V4-Flash | Core / Mainline | 284B / 13B active | 1024K | INT4 200.4 / INT8 361.1 / FP16 718.3 | 6x NVIDIA H200 141GB (min: 2x NVIDIA H200 141GB) |
| Qwen3.6-27B · Qwen · Multimodal | 2026-04-21 | qwen | Qwen3.6-27B | Core / Mainline | 27B | 256K | INT4 24 / INT8 38 / FP16 72 | A100 80GB / H100 80GB (min: L40S 48GB / A100 80GB) |
| Qwen3.6-35B-A3B · Qwen · Multimodal | 2026-04-15 | qwen | Qwen3.6-35B-A3B | Core / Mainline | 35B / 3B active | 256K | INT4 18 / INT8 32 / FP16 78 | A100 80GB / H100 80GB (min: L40S 48GB / A100 80GB) |
| MiniMax-M2.7 · MiniMaxAI · Tool Calling, JSON | 2026-04-09 | minimax | MiniMax-M2.7 | Core / Mainline | 229B / 10B active | 200K | INT4 311.2 / INT8 440.4 / FP16 728.3 | 6x NVIDIA H200 141GB (min: 3x NVIDIA H200 141GB) |
| GLM-5.1 · zai-org · Tool Calling | 2026-04-03 | zai | GLM-5.1 | Core / Mainline | 754B / 40B active | 198K | INT4 563.9 / INT8 990.8 / FP16 1939.6 | 25x H100 80GB (min: 8x H100 80GB) |
| Qwen3.5-2B · Qwen · Multimodal | 2026-02-28 | qwen | Qwen3.5-2B | Core / Mainline | 2B | 256K | INT4 1.6 / INT8 3 / FP16 6 | RTX 5090 32GB / L20 48GB (min: RTX 4090 24GB) |
| Qwen3.5-0.8B · Qwen · Multimodal | 2026-02-28 | qwen | Qwen3.5-0.8B | Core / Mainline | 0.8B | 256K | INT4 0.6 / INT8 1.2 / FP16 2.4 | RTX 5090 32GB / L20 48GB (min: RTX 4090 24GB) |
| Qwen3.5-9B · Qwen · Multimodal | 2026-02-27 | qwen | Qwen3.5-9B | Core / Mainline | 9B | 128K | INT4 6.5 / INT8 12.4 / FP16 24.8 | RTX 5090 32GB / L20 48GB (min: RTX 4090 24GB) |
| Qwen3.5-4B · Qwen · Multimodal | 2026-02-27 | qwen | Qwen3.5-4B | Core / Mainline | 4B | 128K | INT4 2.9 / INT8 5.5 / FP16 11 | RTX 5090 32GB / L20 48GB (min: RTX 4090 24GB) |
| Qwen3.5-122B-A10B · Qwen · Multimodal | 2026-02-24 | qwen | Qwen3.5-122B-A10B | Core / Mainline | 122B / 10B active | 256K | INT4 95.3 / INT8 183 / FP16 366 | 3x NVIDIA H200 141GB (min: NVIDIA H200 141GB) |
| Qwen3.5-35B-A3B · Qwen · Multimodal | 2026-02-24 | qwen | Qwen3.5-35B-A3B | Core / Mainline | 35B / 3B active | 256K | INT4 27.3 / INT8 52.5 / FP16 105 | H100 80GB / H200 141GB (min: L40S 48GB / A100 80GB) |
| Qwen3.5-27B · Qwen · Multimodal | 2026-02-24 | qwen | Qwen3.5-27B | Core / Mainline | 27B | 256K | INT4 21.1 / INT8 40.5 / FP16 81 | A100 80GB / H100 80GB (min: L40S 48GB) |
| Qwen3.5-397B-A17B · Qwen · Multimodal | 2026-02-16 | qwen | Qwen3.5-397B-A17B | Core / Mainline | 397B / 17B active | 256K | INT4 310.2 / INT8 595.5 / FP16 1191 | 9x NVIDIA H200 141GB (min: 3x NVIDIA H200 141GB) |
| WebWorld-32B · Qwen | 2026-02-13 | qwen | WebWorld-32B | Core / Mainline | 32B | 40K | INT4 33.2 / INT8 52.6 / FP16 95.8 | NVIDIA H200 141GB (min: NVIDIA RTX A6000 48GB / NVIDIA A40 48GB) |
| WebWorld-14B · Qwen | 2026-02-13 | qwen | WebWorld-14B | Core / Mainline | 14B | 40K | INT4 16.9 / INT8 25.2 / FP16 44.1 | NVIDIA RTX A6000 48GB / NVIDIA A40 48GB (min: NVIDIA RTX A6000 48GB / NVIDIA TITAN RTX 24GB) |
| WebWorld-8B · Qwen | 2026-02-13 | qwen | WebWorld-8B | Core / Mainline | 8B | 40K | INT4 12.4 / INT8 16.7 / FP16 27.4 | NVIDIA RTX A6000 48GB / NVIDIA A40 48GB (min: NVIDIA RTX A6000 48GB / GeForce RTX 4060 Ti 16GB) |
| MiniMax-M2.5 · MiniMaxAI · Tool Calling, JSON | 2026-02-12 | minimax | MiniMax-M2.5 | Core / Mainline | 229B / 10B active | 192K | INT4 311.2 / INT8 440.4 / FP16 728.3 | 6x NVIDIA H200 141GB (min: 3x NVIDIA H200 141GB) |
| MiniCPM-SALA · OpenBMB · Tool Calling, JSON | 2026-02-11 | openbmb | MiniCPM-SALA | Core / Mainline | 9B | 512K | INT4 12 / INT8 18 / FP16 30 | L40S 48GB / A100 80GB (min: RTX 4090 24GB / L40S 48GB) |
| GLM-5 · zai-org · Tool Calling | 2026-02-11 | zai | GLM-5 | Core / Mainline | 744B / 40B active | 198K | INT4 557.4 / INT8 979 / FP16 1915.8 | 14x NVIDIA H200 141GB (min: 4x NVIDIA H200 141GB) |
| Step-3.5-Flash · stepfun-ai · Tool Calling, JSON | 2026-02-01 | stepfun | Step-3.5-Flash | Core / Mainline | 196B / 11B active | 256K | INT4 288.6 / INT8 399.4 / FP16 646.3 | 5x NVIDIA H200 141GB (min: 3x NVIDIA H200 141GB) |
| Qwen3-Coder-Next · Qwen · Tool Calling, JSON | 2026-01-30 | qwen | Qwen3-Coder-Next | Core / Mainline | 80B / 3B active | 256K | INT4 62.5 / INT8 120 / FP16 240 | - |
| GLM-4.7-Flash · zai-org · Tool Calling, JSON | 2026-01-19 | zai | GLM-4.7-Flash | Core / Mainline | 30B / 3B active | 198K | INT4 24.3 / INT8 40.4 / FP16 76.1 | L40S 48GB / H100 80GB (min: RTX 5090 32GB / L40S 48GB) |
| GLM-4.7 · zai-org · Tool Calling, JSON | 2025-12-22 | zai | GLM-4.7 | Core / Mainline | 355B / 32B active | 198K | INT4 404.8 / INT8 611.1 / FP16 1069.8 | 8x NVIDIA H200 141GB (min: 3x NVIDIA H200 141GB) |
| MiniMax-M2.1 · MiniMaxAI · Tool Calling | 2025-12-20 | minimax | MiniMax-M2.1 | Core / Mainline | 229B (MoE) | 192K | INT4 171.9 / INT8 305.1 / FP16 600.9 | 5x NVIDIA H200 141GB (min: 2x NVIDIA H200 141GB) |
| MiMo-V2-Flash · XiaomiMiMo · Tool Calling, JSON | 2025-12-16 | xiaomi | MiMo-V2-Flash | Core / Mainline | 309B / 15B active | 256K | INT4 366.3 / INT8 541.3 / FP16 930.1 | 7x NVIDIA H200 141GB (min: 3x NVIDIA H200 141GB) |
| DeepSeek-V3.2 · deepseek-ai · Tool Calling, JSON | 2025-12-01 | deepseek | DeepSeek-V3.2 | Core / Mainline | 671B / 37B active | 160K | INT4 617.3 / INT8 997.6 / FP16 1842.8 | 14x NVIDIA H200 141GB (min: 5x NVIDIA H200 141GB) |
| DeepSeek-V3.2-Speciale · deepseek-ai · Tool Calling, JSON | 2025-11-28 | deepseek | DeepSeek-V3.2-Speciale | Core / Mainline | 671B / 37B active | 160K | INT4 617.3 / INT8 997.6 / FP16 1842.8 | 14x NVIDIA H200 141GB (min: 5x NVIDIA H200 141GB) |
| Kimi-K2-Thinking · moonshotai · Tool Calling, JSON | 2025-11-04 | moonshot | Kimi-K2 | Thinking | 1100B / 32B active | 256K | INT4 911.4 / INT8 1532.3 / FP16 2912.2 | 21x NVIDIA H200 141GB (min: 7x NVIDIA H200 141GB) |
| Kimi-Linear-48B-A3B-Instruct · moonshotai · Tool Calling, JSON | 2025-10-30 | moonshot | Kimi-Linear-48B-A3B | Instruct | 48B / 3B active | 1024K | INT4 119.6 / INT8 146.6 / FP16 206.5 | 2x NVIDIA H200 141GB (min: NVIDIA H200 141GB) |
| MiniMax-M2 · MiniMaxAI · Tool Calling, JSON | 2025-10-22 | minimax | MiniMax-M2 | Core / Mainline | 229B / 10B active | 192K | INT4 311.2 / INT8 440.4 / FP16 728.3 | 6x NVIDIA H200 141GB (min: 3x NVIDIA H200 141GB) |
| GLM-4.6 · zai-org · Tool Calling, JSON | 2025-09-29 | zai | GLM-4.6 | Core / Mainline | 355B / 32B active | 198K | INT4 258.9 / INT8 459.9 / FP16 906.7 | 12x H100 80GB (min: 4x H100 80GB) |
| DeepSeek-V3.2-Exp · deepseek-ai · Tool Calling, JSON | 2025-09-29 | deepseek | DeepSeek-V3.2-Exp | Core / Mainline | 671B / 37B active | 160K | INT4 522 / INT8 902.3 / FP16 1747.5 | 13x NVIDIA H200 141GB (min: 4x NVIDIA H200 141GB) |
| - | 2025-09-09 | qwen | Qwen3-Next-80B-A3B | Thinking | 80B / 3B active | 256K | INT4 62.5 / INT8 120 / FP16 240 | - |
| - · Tool Calling, JSON | 2025-09-09 | qwen | Qwen3-Next-80B-A3B | Instruct | 80B / 3B active | 256K | INT4 62.5 / INT8 120 / FP16 240 | - |
| Kimi-K2-Instruct-0905 · moonshotai · Tool Calling, JSON | 2025-09-03 | moonshot | Kimi-K2 | Instruct | 1000B / 32B active | 256K | INT4 842.7 / INT8 1407.5 / FP16 2662.4 | 19x NVIDIA H200 141GB (min: 6x NVIDIA H200 141GB) |
| DeepSeek-V3.1 · deepseek-ai · Tool Calling, JSON | 2025-08-21 | deepseek | DeepSeek-V3.1 | Core / Mainline | 671B / 37B active | 125K | INT4 617.3 / INT8 997.6 / FP16 1842.8 | 14x NVIDIA H200 141GB (min: 5x NVIDIA H200 141GB) |
| - · Tool Calling, JSON | 2025-08-05 | qwen | Qwen3-4B | Thinking | 4B | 256K | INT4 3.1 / INT8 6 / FP16 12 | - |
| - · Tool Calling, JSON | 2025-08-05 | qwen | Qwen3-4B | Instruct | 4B | 256K | INT4 3.1 / INT8 6 / FP16 12 | - |
| - · Tool Calling, JSON | 2025-07-31 | qwen | Qwen3-Coder-30B-A3B | Instruct | 30B / 3B active | 256K | INT4 23.4 / INT8 45 / FP16 90 | - |
| - · Tool Calling, JSON | 2025-07-29 | qwen | Qwen3-30B-A3B | Thinking | 30B / 3B active | 256K | INT4 23.4 / INT8 45 / FP16 90 | - |
| - · Tool Calling, JSON | 2025-07-28 | qwen | Qwen3-30B-A3B | Instruct | 30B / 3B active | 256K | INT4 23.4 / INT8 45 / FP16 90 | - |
| - · Tool Calling, JSON | 2025-07-25 | qwen | Qwen3-235B-A22B | Thinking | 235B / 22B active | 256K | INT4 183.6 / INT8 352.5 / FP16 705 | - |
| - · Tool Calling, JSON | 2025-07-22 | qwen | Qwen3-Coder-480B-A35B | Instruct | 480B / 35B active | 256K | INT4 375 / INT8 720 / FP16 1440 | - |
| - · Tool Calling, JSON | 2025-07-21 | qwen | Qwen3-235B-A22B | Instruct | 235B / 22B active | 256K | INT4 183.6 / INT8 352.5 / FP16 705 | - |
| GLM-4.5-Air · zai-org · Tool Calling | 2025-07-20 | zai | GLM-4.5-Air | Core / Mainline | 106B / 12B active | 128K | INT4 84.4 / INT8 146.5 / FP16 285.7 | 4x H100 80GB (min: 2x H100 80GB) |
| GLM-4.5 · zai-org · Tool Calling | 2025-07-20 | zai | GLM-4.5 | Core / Mainline | 355B / 32B active | 128K | INT4 258.9 / INT8 459.9 / FP16 906.7 | 12x H100 80GB (min: 4x H100 80GB) |
| Kimi-K2-Instruct · moonshotai · Tool Calling, JSON | 2025-07-11 | moonshot | Kimi-K2 | Instruct | 1000B / 32B active | 128K | INT4 747.4 / INT8 1312.2 / FP16 2567.2 | 19x NVIDIA H200 141GB (min: 6x NVIDIA H200 141GB) |
| MiniMax-M1-80k-hf · MiniMaxAI · Tool Calling, JSON | 2025-07-01 | minimax | MiniMax-M1-80k-hf | Core / Mainline | 456B / 45.9B active | 1000K | INT4 470.6 / INT8 730.9 / FP16 1309.4 | 10x NVIDIA H200 141GB (min: 4x NVIDIA H200 141GB) |
| MiniMax-Text-01-hf · MiniMaxAI · Tool Calling, JSON | 2025-06-03 | minimax | MiniMax-Text-01-hf | Core / Mainline | 456B / 45.9B active | 1000K | INT4 470.6 / INT8 730.9 / FP16 1309.4 | 10x NVIDIA H200 141GB (min: 4x NVIDIA H200 141GB) |
| MiMo-7B-RL-0530 · XiaomiMiMo · Tool Calling, JSON | 2025-05-30 | xiaomi | MiMo-7B | RL | 8B | 64K | INT4 6 / INT8 10 / FP16 16 | RTX 4090 24GB / L40S 48GB (min: RTX 4060 Ti 16GB / RTX 4090 24GB) |
| DeepSeek-R1-0528-Qwen3-8B · deepseek-ai · Tool Calling | 2025-05-29 | deepseek | DeepSeek-R1-0528-Qwen3-8B | Core / Mainline | 8B | 128K | INT4 5 / INT8 9.6 / FP16 19.2 | - |
| DeepSeek-R1-0528 · deepseek-ai · Tool Calling | 2025-05-28 | deepseek | DeepSeek-R1 | Core / Mainline | 685B / 37B active | 160K | INT4 531.6 / INT8 919.8 / FP16 1782.4 | 13x NVIDIA H200 141GB (min: 4x NVIDIA H200 141GB) |
| Qwen3-235B-A22B · Qwen · Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-235B-A22B | Core / Mainline | 235B / 22B active | 40K | INT4 146.9 / INT8 282 / FP16 564 | - |
| Qwen3-32B · Qwen · Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-32B | Core / Mainline | 32B | 40K | INT4 20 / INT8 38.4 / FP16 76.8 | - |
| Qwen3-30B-A3B · Qwen · Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-30B-A3B | Core / Mainline | 30B / 3B active | 40K | INT4 18.8 / INT8 36 / FP16 72 | - |
| Qwen3-14B · Qwen · Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-14B | Core / Mainline | 14B | 40K | INT4 8.8 / INT8 16.8 / FP16 33.6 | L40S 48GB / A100 80GB (min: RTX 4090 24GB) |
| Qwen3-8B · Qwen · Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-8B | Core / Mainline | 8B | 40K | INT4 5 / INT8 9.6 / FP16 19.2 | - |
| Qwen3-4B · Qwen · Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-4B | Core / Mainline | 4B | 40K | INT4 2.5 / INT8 4.8 / FP16 9.6 | - |
| Qwen3-1.7B · Qwen · Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-1.7B | Core / Mainline | 1.7B | 40K | INT4 1.1 / INT8 2 / FP16 4.1 | - |
| Qwen3-0.6B · Qwen · Tool Calling, JSON | 2025-04-27 | qwen | Qwen3-0.6B | Core / Mainline | 0.6B | 40K | INT4 0.4 / INT8 0.7 / FP16 1.4 | - |
| GLM-Z1-Rumination-32B-0414 · zai-org · Tool Calling, JSON | 2025-04-13 | zai | GLM-Z1-Rumination-32B | Core / Mainline | 32B | 128K | INT4 32.8 / INT8 52.2 / FP16 95.4 | 2x H100 80GB (min: L40S 48GB / A6000 48GB) |
| GLM-Z1-32B-0414 · zai-org · Tool Calling, JSON | 2025-04-08 | zai | GLM-Z1-32B | Core / Mainline | 32B | 32K | INT4 26.8 / INT8 46.2 / FP16 89.3 | 2x H100 80GB (min: L40S 48GB / A6000 48GB) |
| GLM-Z1-9B-0414 · zai-org · Tool Calling, JSON | 2025-04-08 | zai | GLM-Z1-9B | Core / Mainline | 9B | 32K | INT4 9.7 / INT8 14.5 / FP16 26.6 | L40S 48GB / A6000 48GB (min: RTX 3060 12GB / RTX 4090 24GB) |
| GLM-4-32B-0414 · zai-org · Tool Calling, JSON | 2025-04-07 | zai | GLM-4-32B | Core / Mainline | 32B | 32K | INT4 24 / INT8 39 / FP16 76 | 2x L40S 48GB / A100 80GB (min: RTX 4090 24GB / L40S 48GB) |
| GLM-4-9B-0414 · zai-org · Tool Calling, JSON | 2025-04-07 | zai | GLM-4-9B | Core / Mainline | 9B | 32K | INT4 9 / INT8 12 / FP16 22 | RTX 4090 24GB / L40S 48GB (min: RTX 4090 24GB / L4 24GB) |
| DeepSeek-V3-0324 · deepseek-ai · Tool Calling, JSON | 2025-03-24 | deepseek | DeepSeek-V3 | Core / Mainline | 684.53B | 160K | INT4 798.5 / INT8 1405 / FP16 2752.9 | 35x H100 80GB (min: 10x H100 80GB) |
| Moonlight-16B-A3B-Instruct · moonshotai · Tool Calling | 2025-02-22 | moonshot | Moonlight-16B-A3B | Instruct | 16B | 8K | INT4 14.6 / INT8 23.8 / FP16 44.7 | L40S 48GB / A6000 48GB (min: RTX 4090 24GB / L4 24GB) |
| Moonlight-16B-A3B · moonshotai · Tool Calling | 2025-02-22 | moonshot | Moonlight-16B-A3B | Core / Mainline | 16B | 8K | INT4 14.6 / INT8 23.8 / FP16 44.7 | L40S 48GB / A6000 48GB (min: RTX 4090 24GB / L4 24GB) |
| DeepSeek-R1 · deepseek-ai | 2025-01-20 | deepseek | DeepSeek-R1 | Core / Mainline | 685B / 37B active | 160K | INT4 531.6 / INT8 919.8 / FP16 1782.4 | 13x NVIDIA H200 141GB (min: 4x NVIDIA H200 141GB) |
| DeepSeek-R1-Distill-Qwen-32B · deepseek-ai · Tool Calling, JSON | 2025-01-20 | deepseek | DeepSeek-R1-Distill-Qwen-32B | Core / Mainline | 32B | 128K | INT4 30 / INT8 44 / FP16 82 | A100 80GB / H100 80GB (min: L40S 48GB / A100 80GB) |
| DeepSeek-R1-Distill-Qwen-14B · deepseek-ai · Tool Calling, JSON | 2025-01-20 | deepseek | DeepSeek-R1-Distill-Qwen-14B | Core / Mainline | 14B | 128K | INT4 14 / INT8 24 / FP16 38 | L40S 48GB / A100 80GB (min: RTX 4090 24GB / L40S 48GB) |
| DeepSeek-R1-Distill-Llama-70B · deepseek-ai · Tool Calling, JSON | 2025-01-20 | deepseek | DeepSeek-R1-Distill-Llama-70B | Core / Mainline | 70B | 128K | INT4 52 / INT8 92 / FP16 170 | 2x A100 80GB / 2x H100 80GB (min: A100 80GB / H100 80GB) |
| MiniMax-Text-01 · MiniMaxAI · Tool Calling, JSON | 2025-01-12 | minimax | MiniMax-Text-01 | Core / Mainline | 456B / 45.9B active | 10000K | INT4 470.6 / INT8 730.9 / FP16 1309.4 | 10x NVIDIA H200 141GB (min: 4x NVIDIA H200 141GB) |
| DeepSeek-V3 · deepseek-ai · JSON | 2024-12-25 | deepseek | DeepSeek-V3 | Core / Mainline | 671B / 37B active | 160K | INT4 617.3 / INT8 997.6 / FP16 1842.8 | 14x NVIDIA H200 141GB (min: 5x NVIDIA H200 141GB) |
| - · JSON | 2024-09-16 | qwen | Qwen2.5-7B | Instruct | 7B | 128K | INT4 8 / INT8 12 / FP16 20 | RTX 4090 24GB / L40S 48GB (min: RTX 3060 12GB / L4 24GB) |
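In the multi-GPU rows, the recommended and minimum counts appear to follow a simple rule. The recommended count matches the FP16 estimate divided by one GPU's memory, rounded up. The minimum count matches the same calculation using the INT4 estimate. This is an inferred pattern, not a rule the table states. Here is a Python sketch of that arithmetic; the `gpus_needed` helper is hypothetical.

```python
import math


def gpus_needed(vram_gb: float, gpu_mem_gb: float) -> int:
    """Smallest whole number of GPUs whose combined memory covers vram_gb.

    Treats memory as perfectly poolable and ignores KV cache,
    activations, and per-GPU runtime overhead.
    """
    return math.ceil(vram_gb / gpu_mem_gb)


# MiMo-V2.5-Pro on NVIDIA H200 141GB
print(gpus_needed(857.6, 141))   # INT4 estimate -> 7 (table minimum: 7x)
print(gpus_needed(2716.3, 141))  # FP16 estimate -> 20 (table recommendation: 20x)
# GLM-5.1 on H100 80GB
print(gpus_needed(563.9, 80))    # INT4 estimate -> 8 (table minimum: 8x)
print(gpus_needed(1939.6, 80))   # FP16 estimate -> 25 (table recommendation: 25x)
```

Real deployments also need headroom for KV cache and runtime overhead. Serving stacks also typically constrain tensor-parallel sizes, for example to divisors of the attention-head count. A raw count such as 7 may therefore need to be rounded up to a supported size.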