Step-by-Step Guide: Orchestrate Multiple Language Models

  • Writer: Revanth Reddy Tondapu
  • Aug 10, 2025
  • 3 min read

This lab demonstrates how to set up your environment, load API keys, and make chat requests to a variety of Large Language Models (LLMs) using Python. You’ll learn the exact code cells, what each line does, and why it matters. By the end, you’ll be comfortable calling OpenAI’s GPT-4o-mini, Anthropic’s Claude, Google Gemini, DeepSeek, Groq, and a local Ollama model, and then comparing their responses.


Orchestrate Multiple Language Models

1. Imports and Environment Setup

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

  • os: Access environment variables

  • json: (Optional) Parse or format JSON data

  • load_dotenv: Load .env file containing API keys

  • OpenAI: Official OpenAI client library

  • Anthropic: Official Anthropic client library

  • Markdown, display: Render model responses nicely in Jupyter

# Load API keys from .env, overriding any existing environment variables
load_dotenv(override=True)
  • Ensures your notebook picks up the latest key values every time.
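
For reference, a .env file for this lab looks like the following (the values are placeholders; substitute your real keys):

OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
GOOGLE_API_KEY=your-google-key
DEEPSEEK_API_KEY=your-deepseek-key
GROQ_API_KEY=your-groq-key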


2. Verifying API Keys

openai_key    = os.getenv('OPENAI_API_KEY')
anthropic_key = os.getenv('ANTHROPIC_API_KEY')
google_key    = os.getenv('GOOGLE_API_KEY')
deepseek_key  = os.getenv('DEEPSEEK_API_KEY')
groq_key      = os.getenv('GROQ_API_KEY')

# Print status for each key
print(f"OpenAI key:    {'set' if openai_key else 'NOT SET'}")
print(f"Anthropic key: {'set' if anthropic_key else 'NOT SET'} (optional)")
print(f"Google key:    {'set' if google_key else 'NOT SET'} (optional)")
print(f"DeepSeek key:  {'set' if deepseek_key else 'NOT SET'} (optional)")
print(f"Groq key:      {'set' if groq_key else 'NOT SET'} (optional)")
  • Why? Quickly confirm which APIs you can call without runtime errors.
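
If you prefer the notebook to fail fast rather than error midway through, one option (a small addition beyond the original lab) is to require the one mandatory key up front:

# Stop early if the only required key is missing
if not openai_key:
    raise RuntimeError("OPENAI_API_KEY is required; add it to your .env file")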


3. Crafting the Prompt

request = (
    "Please come up with a challenging, nuanced question that I can ask "
    "a number of LLMs to evaluate their intelligence. Answer only with "
    "the question, no explanation."
)
messages = [{"role": "user", "content": request}]

  • messages: A list of chat messages. We start with a single user message containing our request.
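
The same list format also accepts an optional system message to steer tone or format; for example (a variation not used in the rest of this lab):

messages = [
    {"role": "system", "content": "You are a concise, critical assistant."},
    {"role": "user", "content": request},
]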


4. Having GPT-4o-mini Generate the Question

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)
question = response.choices[0].message.content
print(question)
  • Model: "gpt-4o-mini", a cost-effective, smaller sibling of GPT-4o

  • Output: Stores the generated question in question for reuse.


5. Preparing to Collect Answers

competitors = []
answers     = []
# Rebuild messages list using the generated question
messages = [{"role": "user", "content": question}]
  • competitors: Track which model answered

  • answers: Store each model’s response
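
Since every later step repeats the same display-and-append pattern, you could wrap it in a small helper (a convenience sketch, not part of the original lab):

def record(model_name, answer):
    # Render the answer and store it alongside the model name for the final comparison
    display(Markdown(answer))
    competitors.append(model_name)
    answers.append(answer)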


6. Asking GPT-4o-mini Its Own Question

model_name = "gpt-4o-mini"
response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
  • display(Markdown): Renders rich text output

  • We append the model name and its answer for later comparison.
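
Because this step and steps 8 through 11 share the same OpenAI-compatible call shape, a single helper could cover all of them (a refactoring sketch; the lab writes each call out explicitly for clarity):

def ask(client, model_name, messages):
    # Send chat messages to any OpenAI-compatible client and return the reply text
    response = client.chat.completions.create(model=model_name, messages=messages)
    return response.choices[0].message.content

For example: answer = ask(openai, "gpt-4o-mini", messages).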


7. Calling Anthropic’s Claude

claude = Anthropic()
model_name = "claude-3-7-sonnet-latest"
response = claude.messages.create(
    model=model_name,
    messages=messages,
    max_tokens=1000
)
answer = response.content[0].text  # Claude returns a list of content blocks; take the first text block

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
  • max_tokens: Limits Claude’s response length

  • The rest parallels the OpenAI call.
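
Claude also accepts a top-level system parameter instead of a system role inside messages; for example (an optional variation, not used above):

response = claude.messages.create(
    model=model_name,
    system="Answer in no more than three paragraphs.",  # illustrative instruction
    messages=messages,
    max_tokens=1000
)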


8. Using Google Gemini via OpenAI Client

gemini = OpenAI(
    api_key=google_key,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
model_name = "gemini-2.0-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
  • base_url: Points the OpenAI client at Google’s Gemini endpoint

  • Compatibility: Gemini supports the same chat format.


9. DeepSeek Chat Model

deepseek = OpenAI(
    api_key=deepseek_key,
    base_url="https://api.deepseek.com/v1"
)
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
  • deepseek-chat: Full-size 671B-parameter chat model

  • Same client format as OpenAI.


10. Groq’s Llama-3.3 on High-Speed Hardware

groq = OpenAI(
    api_key=groq_key,
    base_url="https://api.groq.com/openai/v1"
)
model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
  • Groq: Ultra-fast inference for large models

  • Results appear in seconds.
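
To check the speed claim yourself, you can time the call (a quick measurement sketch):

import time

start = time.perf_counter()
response = groq.chat.completions.create(model=model_name, messages=messages)
elapsed = time.perf_counter() - start
print(f"Groq answered in {elapsed:.2f} seconds")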


11. Running a Local Model with Ollama

11.1 Install and Serve Ollama

# In terminal or Jupyter cell with !
ollama pull llama3.2
ollama serve

  • llama3.2: A 3 billion-parameter model suitable for local machines.

  • ollama serve: Launches a local server on http://localhost:11434.


11.2 Call the Local Endpoint

ollama = OpenAI(
    api_key="ollama",  # placeholder
    base_url="http://localhost:11434/v1"
)
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
Important: Do not pull or use llama3.3 locally—it requires 60–100 GB of RAM and will crash most machines. Stick with llama3.2 or smaller.

12. Comparing All Model Responses

def compare_models(names, responses):
    print("\n=== Model Comparison ===")
    for name, resp in zip(names, responses):
        print(f"\n{name} Response Snippet:")
        print(resp[:200] + ("…" if len(resp) > 200 else ""))
        print("-" * 40)

compare_models(competitors, answers)
  • Prints the first 200 characters of each answer for side-by-side comparison.
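
If you want more than eyeballed snippets, a common next step (a sketch beyond the original lab) is to ask one model to judge the others, using the question and answers collected above:

judge_prompt = (
    f"Here is a question: {question}\n\n"
    + "\n\n".join(f"Answer from {name}:\n{ans}" for name, ans in zip(competitors, answers))
    + "\n\nRank the answers from best to worst, with a brief justification."
)
judge_response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": judge_prompt}]
)
print(judge_response.choices[0].message.content)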


Key Takeaways

  1. One client, many endpoints: The OpenAI Python library can call OpenAI, Google Gemini, DeepSeek, and Groq by simply changing base_url.

  2. Anthropic’s slight twist: Requires a max_tokens parameter and uses messages.create.

  3. Local inference: Ollama brings open-source models to your desktop with an OpenAI-compatible API.

  4. Resource caution: Always choose model sizes that fit your hardware, especially for local runs.

  5. Consistent pattern: Build a list of messages, call each API, collect answers, and compare (see the loop sketched below).
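
Putting takeaways 1 and 5 together, the OpenAI-compatible calls collapse into one loop over (model, client) pairs (a consolidated sketch using the clients defined above; Claude is omitted because it uses the Anthropic client):

providers = [
    ("gpt-4o-mini", openai),
    ("gemini-2.0-flash", gemini),
    ("deepseek-chat", deepseek),
    ("llama-3.3-70b-versatile", groq),
    ("llama3.2", ollama),
]

for model_name, client in providers:
    response = client.chat.completions.create(model=model_name, messages=messages)
    print(f"--- {model_name} ---")
    print(response.choices[0].message.content[:200])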


With these steps, you can confidently orchestrate multiple LLMs, cloud and local, within a single notebook. Experiment by adding more models, adjusting prompts, or integrating evaluation metrics to find the best combination for your application.
