Running Large Language Models Locally with Docker Model Runner
- Revanth Reddy Tondapu
- Jun 19
- 2 min read

Hello, tech enthusiasts! Today, we're diving into an exciting world where Docker meets AI to give you the power to run large language models (LLMs) right on your local machine. With Docker Model Runner, you can bypass the cloud and execute models with ease. Let's explore how you can leverage this tool for your development needs.
Why Run LLMs Locally?
Running LLMs locally offers several advantages:
Privacy: Keep sensitive data on your device without needing to send information over the internet.
Cost Efficiency: Save on cloud service costs by utilizing local resources.
Speed and Control: Prototype and test your models quickly without the latency of cloud interactions.
Key Features of Docker Model Runner
Local Execution: Models are pulled as OCI artifacts and run natively, ensuring efficient utilization of your system's resources.
OpenAI-Compatible API: The runner provides a familiar interface for developers already using OpenAI's API endpoints.
Hardware Acceleration: Optimized for Apple Silicon Macs and Windows systems with NVIDIA GPUs, ensuring fast inference.
Standard Packaging: Models are compatible with Docker Hub, facilitating seamless distribution and integration into CI/CD pipelines.
Step-by-Step Guide to Running LLMs Locally
1. Install & Enable Docker Model Runner
First, ensure you have Docker Desktop installed. Then, enable the model runner feature:
# For macOS with Apple Silicon
docker desktop enable model-runner
# Enable TCP for host access
docker desktop enable model-runner --tcp 12434
For Windows users, make sure your Docker Desktop version is 4.41 or later and that GPU-backed inference is enabled in the settings.
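To confirm that the feature is active, you can check the runner's status (the docker model status subcommand is available in recent Docker Desktop releases; the exact output wording may vary):
# Check that Docker Model Runner is running
docker model status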
2. Pull a Model
Choose a model to pull from Docker Hub. For instance:
docker model pull ai/smollm2:360M-Q4_K_M
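Once the pull finishes, you can list the models available on your machine:
# List locally available models
docker model list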
3. Run Models
Run a single prompt or engage in an interactive chat:
# Single prompt
docker model run ai/smollm2:360M-Q4_K_M "Give me a fact about whales."
# Interactive chat (omit the prompt to start a chat session)
docker model run ai/gemma3
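When you no longer need a model, you can reclaim disk space by removing it (shown here for the smollm2 model pulled earlier):
# Remove a local model you no longer need
docker model rm ai/smollm2:360M-Q4_K_M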
4. Use via API
You can interact with models using HTTP requests, either from a container or your host machine:
# From a container
curl http://model-runner.docker.internal/.../chat/completions ...
# From host (after enabling TCP)
curl http://localhost:12434/engines/v1/chat/completions …
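As a concrete sketch, here is what a complete request from the host might look like, assuming TCP access was enabled on port 12434 as above and the smollm2 model from step 2 has been pulled; the request body follows the standard OpenAI chat completions format:
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/smollm2:360M-Q4_K_M",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Give me a fact about whales."}
        ]
      }'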
5. GUI Integration
Docker Desktop's GUI now includes a Models tab, where you can pull and run models without needing to use the command line.
Use Cases
App Development: Quickly prototype applications that require generative AI capabilities.
Data Science: Test models with local data and GPU acceleration.
Privacy-Conscious Workflows: Maintain data confidentiality by processing information locally.
Offline Scenarios: Ideal for environments without internet access, since inference runs entirely on your machine once a model has been pulled.
Advanced Topics
Publishing Custom Models
You can publish your models as OCI artifacts:
docker model package --gguf ./model.gguf --push myorg/my-model
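After publishing, anyone with access to the repository can pull and run the model just like the official ones (using the example myorg/my-model name from above):
# Pull and run the published model
docker model pull myorg/my-model
docker model run myorg/my-model "Hello!"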
Integrations and Ecosystem
Docker Model Runner integrates with tools like Docker Compose and Testcontainers, making it a versatile choice for developers. The ecosystem is expanding, with contributions from major tech players like Google and Hugging Face.
Conclusion
Docker Model Runner is revolutionizing the way developers interact with LLMs by bringing them into the local development loop. Whether you're building applications, conducting research, or ensuring data privacy, Docker Model Runner provides a powerful toolset for running models efficiently on your machine. Happy coding!