Ollama Model Compatibility Calculator: Which Models Can You Run?

Use this free Ollama model compatibility calculator to find out which AI models your hardware can run. Select whether you have a GPU (VRAM) or are running CPU-only, enter your available memory, and the table instantly shows which models run well, which are a tight fit, and which are too large for your hardware.

CPU-only mode: inference will be significantly slower, especially for models above 7B parameters. A GPU is strongly recommended for a usable experience.
Table columns: Model, Parameters, Quantisation, Required memory, Status

How to use this calculator

Select GPU mode if you are running Ollama with a dedicated graphics card. In this case, the relevant figure is your GPU’s VRAM, not your system RAM. Select CPU mode if you are running Ollama without a GPU, in which case models load into your system RAM.

Move the slider to your available VRAM or RAM. Models marked “Runs well” will load and run comfortably at Q4_K_M quantisation. “Tight fit” means the model will likely load but may leave little headroom for the operating system. “Too large” means the model will not fit in memory at this quantisation level.
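The three status labels can be sketched as a simple comparison between a model's required memory and what you have available. This is a minimal illustration, not the calculator's actual logic: the function name `classify` and the 80% headroom threshold are assumptions chosen for the example.

```python
def classify(required_gb: float, available_gb: float, headroom: float = 0.8) -> str:
    """Return a status label for a model that needs `required_gb` of memory.

    `headroom` is a hypothetical threshold: a model using more than this
    fraction of available memory is treated as a tight fit.
    """
    if required_gb > available_gb:
        return "Too large"   # does not fit at this quantisation level
    if required_gb > available_gb * headroom:
        return "Tight fit"   # loads, but leaves little memory to spare
    return "Runs well"       # fits with comfortable headroom


if __name__ == "__main__":
    # Example: three models checked against 8 GB of available memory.
    for required in (4.7, 7.0, 9.0):
        print(f"{required} GB needed -> {classify(required, 8.0)}")
```

With 8 GB available, a model needing 4.7 GB falls under the assumed 80% threshold, one needing 7.0 GB lands in the tight-fit band, and one needing 9.0 GB does not fit at all.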

What is Q4_K_M quantisation?

Quantisation reduces the precision of a model’s weights to make it smaller and faster to run. Q4_K_M is the most common default in Ollama. It offers a good balance between model quality and memory efficiency, with only a small reduction in output quality compared to the unquantised version. If you have more VRAM available, Q8 gives better results at roughly double the memory cost. FP16 (16-bit, unquantised) gives the highest quality but requires the most memory, roughly double that of Q8 again.
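To see how quantisation level translates into memory, the weight size can be approximated as parameter count times bits per weight. The sketch below is a rough estimate under stated assumptions: the bits-per-weight figures are approximate averages (about 4.85 for Q4_K_M, 8.5 for Q8_0, 16 for FP16), the names `BITS_PER_WEIGHT` and `estimate_weights_gb` are made up for this example, and the result covers the weights only, not the context cache or runtime overhead.

```python
# Approximate average bits stored per weight for each format (assumed values).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def estimate_weights_gb(params_billions: float, quant: str = "Q4_K_M") -> float:
    """Estimate the size of the model weights in GB (1 GB = 10**9 bytes).

    Weights only: the context cache and runtime overhead need extra memory.
    """
    bits = BITS_PER_WEIGHT[quant]
    total_bytes = params_billions * 1e9 * bits / 8  # bits -> bytes
    return total_bytes / 1e9


if __name__ == "__main__":
    # Compare the three formats for a 7B-parameter model.
    for quant in BITS_PER_WEIGHT:
        print(f"7B at {quant}: ~{estimate_weights_gb(7, quant):.1f} GB")
```

Under these assumptions, a 7B model comes out at roughly 4.2 GB at Q4_K_M, 7.4 GB at Q8_0, and 14 GB at FP16, which matches the "roughly double" steps described above.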

For a full guide to running models locally, see: Ollama guides on Serverman.