Working with LLMs on AMD GPUs

Thu Nov 02 2023

This might only work for a few months (or even days), but after spending a few hours trying to get an open source LLM to work on AMD GPUs inside Docker, I thought I'd share my findings. My GPU is an AMD Radeon RX 7900 XTX, and I was only able to make it work with the llama-cpp-python bindings. This should work for any ROCm-supported AMD GPU.

The first thing is to build and set up our Docker image. This is what I ended up with:

FROM rocm/dev-ubuntu-22.04:5.7-complete

# Environment variables
ENV CC=/opt/rocm/llvm/bin/clang
ENV CXX=/opt/rocm/llvm/bin/clang++

# Install PyTorch (the ROCm nightly wheel index below is an assumption;
# match it to the ROCm version of the base image)
RUN pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm5.7
RUN CMAKE_ARGS="-DLLAMA_HIPBLAS=1 -DAMDGPU_TARGETS=gfx1100" pip3 install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

You might need to change gfx1100 to your GPU's family/target.
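If you're not sure what your card's target is, rocminfo reports it. Here's a sketch of extracting it, demonstrated on a canned rocminfo-style line; on a machine with ROCm installed you would pipe rocminfo itself through the same grep:

```shell
# On a real ROCm machine you would run:
#   rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
# Demonstrated here on a canned sample line from rocminfo's output:
sample='  Name:                    gfx1100'
echo "$sample" | grep -o 'gfx[0-9a-f]*'
```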

Next, we need to build the image:

docker build --no-cache -t amd-llm .

Now we can run the image with this rather long but precise command:

docker run -it --network=host --device=/dev/kfd \
    --device=/dev/dri --group-add=video --ipc=host \
    --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    --entrypoint=bash \
    -v "$(pwd)":/models \
    amd-llm

This will mount the current directory to /models inside the container and drop you into a bash shell. Now it's time to check whether the PyTorch installation is working and able to detect the GPU. These commands should work:

import torch

print(torch.cuda.get_device_name())
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"CUDA arch list: {torch.cuda.get_arch_list()}")
print(f"CUDNN available: {torch.backends.cudnn.is_available()}")
print(f"CUDNN version: {torch.backends.cudnn.version()}")

# Run a small computation on the GPU to confirm kernels actually execute
tensor = torch.randn(2, 2).to("cuda")
res = tensor @ tensor
print(res)

If everything is working, you should see something like this:

Radeon RX 7900 XTX
CUDA available: True
CUDA version: None
CUDA arch list: ['gfx900', 'gfx906', 'gfx908', 'gfx90a', 'gfx1030', 'gfx1100']
CUDNN available: True
CUDNN version: 2020000

Now, let's do some LLMing and put those graphics processing units to work with one of the latest models, Mistral!

Download the model:


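One option is a quantized Mistral 7B in GGUF format from Hugging Face; the repo and filename below are my assumption (any GGUF build of the model should work), and the file should land in the directory you mounted at /models:

```shell
# Fetch a quantized Mistral 7B GGUF (assumed repo/filename; pick any quantization)
wget https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q4_K_M.gguf
```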
And with that, we should be ready to run the model with llama-cpp-python:

from llama_cpp import Llama

# model_path is an assumption: point it at whatever GGUF file you put
# in /models. n_gpu_layers=-1 asks llama.cpp to offload all layers to the GPU.
llm = Llama(
    model_path="/models/mistral-7b-v0.1.Q4_K_M.gguf",
    n_gpu_layers=-1,
)

output = llm(
    "Q: Name the planets in the solar system. A: ",
    max_tokens=64,
    stop=["Q:", "\n"],
)
print(output)

For me, it printed the following (trimmed to the relevant field):

         "text":"Q: Name the planets in the solar system. A: 1. Mercury, 2. Venus, 3. Earth, 4. Mars, 5. Jupiter, 6. Saturn, 7. Uranus, 8. Neptune",

🎉 🎉 🎉
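The full return value is an OpenAI-style completion dict, so the generated text sits under choices[0]["text"]. A small sketch with a stubbed response (the dict below is abbreviated, not the exact output):

```python
# Sketch: extracting the answer from a llama-cpp-python completion dict.
# The response here is a stub shaped like the real output, abbreviated.
response = {
    "choices": [
        {
            "text": "Q: Name the planets in the solar system. A: 1. Mercury, 2. Venus, 3. Earth",
            "index": 0,
            "finish_reason": "stop",
        }
    ],
}

# The prompt is echoed back, so split off everything after "A: "
answer = response["choices"][0]["text"].split("A: ", 1)[1]
print(answer)  # -> 1. Mercury, 2. Venus, 3. Earth
```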

If you, like me, are wondering whether the GPU was actually being used, you can install nvtop and run it.

(Screenshot: nvtop showing GPU usage)

Finally, after a few hours and a bunch of tweaks, the GPU was being used and Mistral 7B worked on my machine!

โ† Back to home!