llama.cpp (Cortex)

Overview

Jan uses llama.cpp for running local AI models. You can find its settings in Settings > Local Engine > llama.cpp:


These settings are intended for advanced users. You may want to review them when:

  • Your AI models are running slowly or not working
  • You've installed new hardware (like a graphics card)
  • You want to tinker & test performance with different backends

Engine Version and Updates

  • Engine Version: View the currently installed version of the llama.cpp engine
  • Check Updates: Check whether a newer engine version is available and install it

Available Backends

Jan offers different backend variants for llama.cpp based on your operating system and hardware. You can:

  • Download different backends as needed
  • Switch between backends for different hardware configurations
  • View currently installed backends in the list

⚠️ Choose the backend that matches your hardware. Using the wrong variant may cause performance issues or prevent models from loading.

CUDA Support (NVIDIA GPUs)

  • llama.cpp-avx-cuda-11-7
  • llama.cpp-avx-cuda-12-0
  • llama.cpp-avx2-cuda-11-7
  • llama.cpp-avx2-cuda-12-0
  • llama.cpp-avx512-cuda-11-7
  • llama.cpp-avx512-cuda-12-0
  • llama.cpp-noavx-cuda-11-7
  • llama.cpp-noavx-cuda-12-0
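
To choose between the cuda-11-7 and cuda-12-0 variants, check which CUDA version your NVIDIA driver supports. Below is a minimal sketch in Python (used here only for illustration); it shells out to the standard nvidia-smi tool, whose banner typically reports the highest CUDA version the installed driver can handle, so running nvidia-smi directly in a terminal works just as well:

    import re
    import subprocess

    # Ask the NVIDIA driver which CUDA version it supports; nvidia-smi's
    # banner typically includes a fragment like "CUDA Version: 12.2".
    try:
        output = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        output = ""  # nvidia-smi not installed -> no usable NVIDIA driver

    match = re.search(r"CUDA Version:\s*([\d.]+)", output)
    if match:
        version = tuple(int(part) for part in match.group(1).split("."))
        # Pick the newest CUDA backend variant the driver can handle.
        suffix = "cuda-12-0" if version >= (12, 0) else "cuda-11-7"
        print(f"Driver supports CUDA {match.group(1)}; use a {suffix} variant")
    else:
        print("No NVIDIA driver detected; use a CPU or Vulkan backend")

If no NVIDIA driver is present, or it is older than CUDA 11.7, use a CPU or Vulkan backend instead.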

CPU Only

  • llama.cpp-avx
  • llama.cpp-avx2
  • llama.cpp-avx512
  • llama.cpp-noavx
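
To see which of these instruction sets your CPU supports, check its feature flags. Below is a minimal sketch in Python for Linux; it reads /proc/cpuinfo, which does not exist on Windows or macOS, so on those systems consult your CPU's spec sheet or a tool such as CPU-Z instead:

    # Collect the feature flags the kernel reports for the first CPU core.
    flags = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
                break

    # avx512f is the foundational AVX-512 feature flag.
    for name, flag in [("AVX", "avx"), ("AVX2", "avx2"), ("AVX-512", "avx512f")]:
        print(f"{name}: {'yes' if flag in flags else 'no'}")

If none of the flags are reported, pick the noavx variant.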

Other Accelerators

  • llama.cpp-vulkan

Notes

  • For detailed hardware compatibility, please visit our guide for Windows.
  • AVX, AVX2, and AVX-512 are CPU instruction sets. For best performance, use the most advanced instruction set your CPU supports.
  • CUDA versions should match your installed NVIDIA drivers.