Step-by-Step Guide: Orchestrate Multiple Language Models
- Revanth Reddy Tondapu
- Aug 10, 2025
- 3 min read
This lab demonstrates how to set up your environment, load API keys, and make chat requests to a variety of Large Language Models (LLMs) using Python. You’ll learn the exact code cells, what each line does, and why it matters. By the end, you’ll be comfortable calling OpenAI’s GPT-4o-mini, Anthropic’s Claude, Google Gemini, DeepSeek, Groq, and a local Ollama model, then comparing their responses.

1. Imports and Environment Setup
import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display
os: Access environment variables
json: (Optional) Parse or format JSON data
load_dotenv: Load .env file containing API keys
OpenAI: Official OpenAI client library
Anthropic: Official Anthropic client library
Markdown, display: Render model responses nicely in Jupyter
# Load API keys from .env, overriding any existing environment variables
load_dotenv(override=True)
Ensures your notebook picks up the latest key values every time.
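For reference, a minimal .env file sitting next to your notebook might look like this (the values shown are placeholders, not real keys):

```
OPENAI_API_KEY=sk-proj-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
GOOGLE_API_KEY=your-key-here
DEEPSEEK_API_KEY=your-key-here
GROQ_API_KEY=your-key-here
```

Only OPENAI_API_KEY is required for this lab; the rest are optional.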
2. Verifying API Keys
openai_key = os.getenv('OPENAI_API_KEY')
anthropic_key = os.getenv('ANTHROPIC_API_KEY')
google_key = os.getenv('GOOGLE_API_KEY')
deepseek_key = os.getenv('DEEPSEEK_API_KEY')
groq_key = os.getenv('GROQ_API_KEY')
# Print status for each key
print(f"OpenAI key: {'set' if openai_key else 'NOT SET'}")
print(f"Anthropic key: {'set' if anthropic_key else 'NOT SET'} (optional)")
print(f"Google key: {'set' if google_key else 'NOT SET'} (optional)")
print(f"DeepSeek key: {'set' if deepseek_key else 'NOT SET'} (optional)")
print(f"Groq key: {'set' if groq_key else 'NOT SET'} (optional)")
Why? Quickly confirm which APIs you can call without runtime errors.
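If you prefer to fail fast rather than print a status line, a small helper (hypothetical, not part of the lab) can raise immediately when a required key is missing:

```python
import os

def require_key(name: str) -> str:
    """Return the value of an environment variable, or raise with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

For example, `openai_key = require_key('OPENAI_API_KEY')` stops the notebook at the first missing required key instead of failing later inside an API call.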
3. Crafting the Prompt
request = (
    "Please come up with a challenging, nuanced question that I can ask "
    "a number of LLMs to evaluate their intelligence. Answer only with "
    "the question, no explanation."
)
messages = [{"role": "user", "content": request}]
messages: A list of chat messages. We start with a single user message containing our request.
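The chat format also supports a system role for steering tone and behavior. A variant of the same prompt might look like this (the system text here is purely illustrative):

```python
# Same request as above, shown here so the snippet is self-contained
request = (
    "Please come up with a challenging, nuanced question that I can ask "
    "a number of LLMs to evaluate their intelligence. Answer only with "
    "the question, no explanation."
)

# Illustrative system prompt; adjust to taste
system_prompt = "You are a rigorous examiner who writes concise, probing questions."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": request},
]
```

Every provider in this lab that speaks the OpenAI chat format accepts this two-message shape unchanged.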
4. Having GPT-4o-mini Generate the Question
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)
question = response.choices[0].message.content
print(question)
Model: "gpt-4o-mini", a cost-effective, smaller variant of GPT-4o
Output: Stores the generated question in question for reuse.
5. Preparing to Collect Answers
competitors = []
answers = []
# Rebuild messages list using the generated question
messages = [{"role": "user", "content": question}]
competitors: Track which model answered
answers: Store each model’s response
6. Asking GPT-4o-mini Its Own Question
model_name = "gpt-4o-mini"
response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content
display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
display(Markdown): Renders rich text output
We append the model name and its answer for later comparison.
7. Calling Anthropic’s Claude
claude = Anthropic()
model_name = "claude-3-7-sonnet-latest"
response = claude.messages.create(
    model=model_name,
    messages=messages,
    max_tokens=1000
)
answer = response.content[0].text
display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
max_tokens: Limits Claude’s response length
The rest parallels the OpenAI call.
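Note that Claude returns response.content as a list of content blocks rather than a single string. A more defensive way to extract the text, sketched here on the assumption that text blocks expose a type of "text" and a .text attribute as in the current Anthropic SDK:

```python
def claude_text(content) -> str:
    """Join the text of all text-type content blocks from an Anthropic response."""
    return "".join(
        block.text for block in content if getattr(block, "type", None) == "text"
    )
```

Using `answer = claude_text(response.content)` keeps working even if the response contains multiple blocks.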
8. Using Google Gemini via OpenAI Client
gemini = OpenAI(
    api_key=google_key,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
model_name = "gemini-2.0-flash"
response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content
display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
base_url: Points the OpenAI client at Google’s Gemini endpoint
Compatibility: Gemini supports the same chat format.
9. DeepSeek Chat Model
deepseek = OpenAI(
    api_key=deepseek_key,
    base_url="https://api.deepseek.com/v1"
)
model_name = "deepseek-chat"
response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content
display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
deepseek-chat: Full-size 671B-parameter chat model
Same client format as OpenAI.
10. Groq’s Llama-3.3 on High-Speed Hardware
groq = OpenAI(
    api_key=groq_key,
    base_url="https://api.groq.com/openai/v1"
)
model_name = "llama-3.3-70b-versatile"
response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content
display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
Groq: Ultra-fast inference for large models
Results appear in seconds.
11. Running a Local Model with Ollama
11.1 Install and Serve Ollama
# In terminal or Jupyter cell with !
ollama pull llama3.2
ollama serve
llama3.2: A 3 billion-parameter model suitable for local machines.
ollama serve: Launches a local server on http://localhost:11434.
11.2 Call the Local Endpoint
ollama = OpenAI(
    api_key="ollama",  # placeholder; Ollama ignores the key but the client requires one
    base_url="http://localhost:11434/v1"
)
model_name = "llama3.2"
response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content
display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
Important: Do not pull or use llama3.3 locally—it requires 60–100 GB of RAM and will crash most machines. Stick with llama3.2 or smaller.
12. Comparing All Model Responses
def compare_models(names, responses):
    print("\n=== Model Comparison ===")
    for name, resp in zip(names, responses):
        print(f"\n{name} Response Snippet:")
        print(resp[:200] + ("…" if len(resp) > 200 else ""))
        print("-" * 40)

compare_models(competitors, answers)
Prints the first 200 characters of each answer for side-by-side comparison.
Key Takeaways
One client, many endpoints: The OpenAI Python library can call OpenAI, Google Gemini, DeepSeek, and Groq by simply changing base_url.
Anthropic’s slight twist: Requires a max_tokens parameter and uses messages.create.
Local inference: Ollama brings open-source models to your desktop with an OpenAI-compatible API.
Resource caution: Always choose model sizes that fit your hardware, especially for local runs.
Consistent pattern: Build a list of messages, call each API, collect answers, and compare.
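The consistent pattern above can be collapsed into a single loop over OpenAI-compatible endpoints. This is a sketch, not part of the lab's cells: the base URLs and model names match those used earlier, the key names match the .env entries, and running it still requires valid keys and network access.

```python
import os

# Provider name -> (env var holding the key, base_url, model name).
# A base_url of None means the official OpenAI endpoint.
PROVIDERS = {
    "gpt-4o-mini": ("OPENAI_API_KEY", None, "gpt-4o-mini"),
    "gemini-2.0-flash": (
        "GOOGLE_API_KEY",
        "https://generativelanguage.googleapis.com/v1beta/openai/",
        "gemini-2.0-flash",
    ),
    "deepseek-chat": ("DEEPSEEK_API_KEY", "https://api.deepseek.com/v1", "deepseek-chat"),
    "llama-3.3-70b-versatile": (
        "GROQ_API_KEY",
        "https://api.groq.com/openai/v1",
        "llama-3.3-70b-versatile",
    ),
}

def ask_all(messages, providers=PROVIDERS):
    """Send the same messages to every configured provider; return {name: answer}."""
    from openai import OpenAI  # imported lazily so the table above works without the SDK
    answers = {}
    for name, (key_var, base_url, model) in providers.items():
        client = OpenAI(api_key=os.getenv(key_var), base_url=base_url)
        response = client.chat.completions.create(model=model, messages=messages)
        answers[name] = response.choices[0].message.content
    return answers
```

Adding another OpenAI-compatible provider then becomes a one-line change to the PROVIDERS table.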
With these steps, you can confidently orchestrate multiple LLMs, cloud and local, within a single notebook. Experiment by adding more models, adjusting prompts, or integrating evaluation metrics to find the best combination for your application.