Running Large Language Models Locally with Docker Model Runner
- Revanth Reddy Tondapu
- Jun 19
- 2 min read

Hello, tech enthusiasts! Today, we're diving into an exciting world where Docker meets AI to give you the power to run large language models (LLMs) right on your local machine. With Docker Model Runner, you can bypass the cloud and execute models with ease. Let's explore how you can leverage this tool for your development needs.
Why Run LLMs Locally?
Running LLMs locally offers several advantages:
Privacy: Keep sensitive data on your device without needing to send information over the internet.
Cost Efficiency: Save on cloud service costs by utilizing local resources.
Speed and Control: Prototype and test your models quickly without the latency of cloud interactions.
Key Features of Docker Model Runner
Local Execution: Models are pulled as OCI artifacts and run natively, ensuring efficient utilization of your system's resources.
OpenAI-Compatible API: The runner provides a familiar interface for developers already using OpenAI's API endpoints.
Hardware Acceleration: Optimized for Apple Silicon Macs and Windows systems with NVIDIA GPUs, ensuring fast inference.
Standard Packaging: Models are compatible with Docker Hub, facilitating seamless distribution and integration into CI/CD pipelines.
Step-by-Step Guide to Running LLMs Locally
1. Install & Enable Docker Model Runner
First, ensure you have Docker Desktop installed. Then, enable the model runner feature:
# For macOS with Apple Silicon
docker desktop enable model-runner
# Enable TCP for host access
docker desktop enable model-runner --tcp 12434
For Windows users, make sure your Docker Desktop version is 4.41 or later and that GPU-backed inference is enabled in the settings.
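To confirm that the feature is active, you can check the runner's status (the docker model status subcommand is available in recent Docker Desktop releases; the exact output wording may vary):
# Check that Docker Model Runner is running
docker model status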
2. Pull a Model
Choose a model to pull from Docker Hub. For instance:
docker model pull ai/smollm2:360M-Q4_K_M
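Once the pull finishes, you can list the models available on your machine:
# List locally available models
docker model list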
3. Run Models
Run a single prompt or engage in an interactive chat:
# Single prompt
docker model run ai/smollm2:360M-Q4_K_M "Give me a fact about whales."
# Interactive chat (omit the prompt to start a chat session)
docker model run ai/gemma3
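When you no longer need a model, you can reclaim disk space by removing it (shown here for the smollm2 model pulled earlier):
# Remove a local model you no longer need
docker model rm ai/smollm2:360M-Q4_K_M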
4. Use via API
You can interact with models using HTTP requests, either from a container or your host machine:
# From a container
curl http://model-runner.docker.internal/.../chat/completions ...
# From host (after enabling TCP)
curl http://localhost:12434/engines/v1/chat/completions …
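As a concrete sketch, here is what a complete request from the host might look like, assuming TCP access was enabled on port 12434 as above and the smollm2 model from step 2 has been pulled; the request body follows the standard OpenAI chat completions format:
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/smollm2:360M-Q4_K_M",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Give me a fact about whales."}
        ]
      }'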
5. GUI Integration
Docker Desktop's GUI now includes a Models tab, where you can pull and run models without needing to use the command line.
Use Cases
App Development: Quickly prototype applications that require generative AI capabilities.
Data Science: Test models with local data and GPU acceleration.
Privacy-Conscious Workflows: Maintain data confidentiality by processing information locally.
Offline Scenarios: Ideal for environments without internet access, since inference runs entirely on your machine once a model has been pulled.
Advanced Topics
Publishing Custom Models
You can publish your models as OCI artifacts:
docker model package --gguf ./model.gguf --push myorg/my-model
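After publishing, anyone with access to the repository can pull and run the model just like the official ones (using the example myorg/my-model name from above):
# Pull and run the published model
docker model pull myorg/my-model
docker model run myorg/my-model "Hello!"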
Integrations and Ecosystem
Docker Model Runner integrates with tools like Docker Compose and Testcontainers, making it a versatile choice for developers. The ecosystem is expanding, with contributions from major tech players like Google and Hugging Face.
Conclusion
Docker Model Runner is revolutionizing the way developers interact with LLMs by bringing them into the local development loop. Whether you're building applications, conducting research, or ensuring data privacy, Docker Model Runner provides a powerful toolset for running models efficiently on your machine. Happy coding!