Step-by-Step Guide: Orchestrate Multiple Language Models

  • Writer: Revanth Reddy Tondapu
  • Aug 10, 2025
  • 3 min read

This lab demonstrates how to set up your environment, load API keys, and make chat requests to a variety of Large Language Models (LLMs) using Python. You’ll learn the exact code cells, what each line does, and why it matters. By the end, you’ll be comfortable calling OpenAI’s GPT-4o-mini, Anthropic’s Claude, Google Gemini, DeepSeek, Groq, and a local Ollama model, and then comparing their responses.


Orchestrate Multiple Language Models

1. Imports and Environment Setup

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

  • os: Access environment variables

  • json: (Optional) Parse or format JSON data

  • load_dotenv: Load .env file containing API keys

  • OpenAI: Official OpenAI client library

  • Anthropic: Official Anthropic client library

  • Markdown, display: Render model responses nicely in Jupyter

# Load API keys from .env, overriding any existing environment variables
load_dotenv(override=True)
  • Ensures your notebook picks up the latest key values every time.
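
For reference, a .env file for this lab looks like the following (the values are placeholders; substitute your real keys):

OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
GOOGLE_API_KEY=your-google-key
DEEPSEEK_API_KEY=your-deepseek-key
GROQ_API_KEY=your-groq-key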


2. Verifying API Keys

openai_key    = os.getenv('OPENAI_API_KEY')
anthropic_key = os.getenv('ANTHROPIC_API_KEY')
google_key    = os.getenv('GOOGLE_API_KEY')
deepseek_key  = os.getenv('DEEPSEEK_API_KEY')
groq_key      = os.getenv('GROQ_API_KEY')

# Print status for each key
print(f"OpenAI key:    {'set' if openai_key else 'NOT SET'}")
print(f"Anthropic key: {'set' if anthropic_key else 'NOT SET'} (optional)")
print(f"Google key:    {'set' if google_key else 'NOT SET'} (optional)")
print(f"DeepSeek key:  {'set' if deepseek_key else 'NOT SET'} (optional)")
print(f"Groq key:      {'set' if groq_key else 'NOT SET'} (optional)")
  • Why? Quickly confirm which APIs you can call without runtime errors.
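
If you prefer the notebook to fail fast rather than error midway through, one option (a small addition beyond the original lab) is to require the one mandatory key up front:

# Stop early if the only required key is missing
if not openai_key:
    raise RuntimeError("OPENAI_API_KEY is required; add it to your .env file")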


3. Crafting the Prompt

request = (
    "Please come up with a challenging, nuanced question that I can ask "
    "a number of LLMs to evaluate their intelligence. Answer only with "
    "the question, no explanation."
)
messages = [{"role": "user", "content": request}]

  • messages: A list of chat messages. We start with a single user message containing our request.
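
The same list format also accepts an optional system message to steer tone or format; for example (a variation not used in the rest of this lab):

messages = [
    {"role": "system", "content": "You are a concise, critical assistant."},
    {"role": "user", "content": request},
]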


4. Having GPT-4o-mini Generate the Question

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)
question = response.choices[0].message.content
print(question)
  • Model: "gpt-4o-mini", a cost-effective, smaller sibling of GPT-4o

  • Output: Stores the generated question in question for reuse.


5. Preparing to Collect Answers

competitors = []
answers     = []
# Rebuild messages list using the generated question
messages = [{"role": "user", "content": question}]
  • competitors: Track which model answered

  • answers: Store each model’s response
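
Since every later step repeats the same display-and-append pattern, you could wrap it in a small helper (a convenience sketch, not part of the original lab):

def record(model_name, answer):
    # Render the answer and store it alongside the model name for the final comparison
    display(Markdown(answer))
    competitors.append(model_name)
    answers.append(answer)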


6. Asking GPT-4o-mini Its Own Question

model_name = "gpt-4o-mini"
response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
  • display(Markdown): Renders rich text output

  • We append the model name and its answer for later comparison.
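
Because this step and steps 8 through 11 share the same OpenAI-compatible call shape, a single helper could cover all of them (a refactoring sketch; the lab writes each call out explicitly for clarity):

def ask(client, model_name, messages):
    # Send chat messages to any OpenAI-compatible client and return the reply text
    response = client.chat.completions.create(model=model_name, messages=messages)
    return response.choices[0].message.content

For example: answer = ask(openai, "gpt-4o-mini", messages).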


7. Calling Anthropic’s Claude

claude = Anthropic()
model_name = "claude-3-7-sonnet-latest"
response = claude.messages.create(
    model=model_name,
    messages=messages,
    max_tokens=1000
)
answer = response.content[0].text  # Claude returns a list of content blocks; take the first text block

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
  • max_tokens: Limits Claude’s response length

  • The rest parallels the OpenAI call.
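
Claude also accepts a top-level system parameter instead of a system role inside messages; for example (an optional variation, not used above):

response = claude.messages.create(
    model=model_name,
    system="Answer in no more than three paragraphs.",  # illustrative instruction
    messages=messages,
    max_tokens=1000
)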


8. Using Google Gemini via OpenAI Client

gemini = OpenAI(
    api_key=google_key,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
model_name = "gemini-2.0-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
  • base_url: Points the OpenAI client at Google’s Gemini endpoint

  • Compatibility: Gemini supports the same chat format.


9. DeepSeek Chat Model

deepseek = OpenAI(
    api_key=deepseek_key,
    base_url="https://api.deepseek.com/v1"
)
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
  • deepseek-chat: Full-size 671B-parameter chat model

  • Same client format as OpenAI.


10. Groq’s Llama-3.3 on High-Speed Hardware

groq = OpenAI(
    api_key=groq_key,
    base_url="https://api.groq.com/openai/v1"
)
model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
  • Groq: Ultra-fast inference for large models

  • Results appear in seconds.
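
To check the speed claim yourself, you can time the call (a quick measurement sketch):

import time

start = time.perf_counter()
response = groq.chat.completions.create(model=model_name, messages=messages)
elapsed = time.perf_counter() - start
print(f"Groq answered in {elapsed:.2f} seconds")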


11. Running a Local Model with Ollama

11.1 Install and Serve Ollama

# In terminal or Jupyter cell with !
ollama pull llama3.2
ollama serve

  • llama3.2: A 3 billion-parameter model suitable for local machines.

  • ollama serve: Launches a local server on http://localhost:11434.


11.2 Call the Local Endpoint

ollama = OpenAI(
    api_key="ollama",  # placeholder
    base_url="http://localhost:11434/v1"
)
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
Important: Do not pull or use llama3.3 locally—it requires 60–100 GB of RAM and will crash most machines. Stick with llama3.2 or smaller.

12. Comparing All Model Responses

def compare_models(names, responses):
    print("\n=== Model Comparison ===")
    for name, resp in zip(names, responses):
        print(f"\n{name} Response Snippet:")
        print(resp[:200] + ("…" if len(resp) > 200 else ""))
        print("-" * 40)

compare_models(competitors, answers)
  • Prints the first 200 characters of each answer for side-by-side comparison.
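
If you want more than eyeballed snippets, a common next step (a sketch beyond the original lab) is to ask one model to judge the others, using the question and answers collected above:

judge_prompt = (
    f"Here is a question: {question}\n\n"
    + "\n\n".join(f"Answer from {name}:\n{ans}" for name, ans in zip(competitors, answers))
    + "\n\nRank the answers from best to worst, with a brief justification."
)
judge_response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": judge_prompt}]
)
print(judge_response.choices[0].message.content)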


Key Takeaways

  1. One client, many endpoints: The OpenAI Python library can call OpenAI, Google Gemini, DeepSeek, and Groq by simply changing base_url.

  2. Anthropic’s slight twist: Requires a max_tokens parameter and uses messages.create.

  3. Local inference: Ollama brings open-source models to your desktop with an OpenAI-compatible API.

  4. Resource caution: Always choose model sizes that fit your hardware, especially for local runs.

  5. Consistent pattern: Build a list of messages, call each API, collect answers, and compare (see the loop sketched below).
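
Putting takeaways 1 and 5 together, the OpenAI-compatible calls collapse into one loop over (model, client) pairs (a consolidated sketch using the clients defined above; Claude is omitted because it uses the Anthropic client):

providers = [
    ("gpt-4o-mini", openai),
    ("gemini-2.0-flash", gemini),
    ("deepseek-chat", deepseek),
    ("llama-3.3-70b-versatile", groq),
    ("llama3.2", ollama),
]

for model_name, client in providers:
    response = client.chat.completions.create(model=model_name, messages=messages)
    print(f"--- {model_name} ---")
    print(response.choices[0].message.content[:200])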


With these steps, you can confidently orchestrate multiple LLMs, cloud and local, within a single notebook. Experiment by adding more models, adjusting prompts, or integrating evaluation metrics to find the best combination for your application.
