Function Calling with the Gemini API

Welcome back! In the previous chapters, we covered the basics of the Gemini API and how to leverage its multimodal capabilities. Now, we're going to dive into one of the most powerful and exciting features: Function Calling. This feature transforms Gemini from a simple text-generation engine into a dynamic, programmable agent that can interact with the real world.

What is Function Calling?

At its core, function calling allows you to connect the Gemini API to external tools and services. Instead of just generating a text response, the model can now intelligently determine when a specific function would be useful and provide the necessary parameters to execute it.

Think of it this way: the model becomes the "brain" of your application, and your functions become its "hands" and "eyes." The model can understand a user's intent, decide which action to take, and hand off the task to your code. Your code then performs the action (e.g., fetching data from a database, calling a third-party API, or sending an email) and returns the result to the model, which then uses that result to formulate a final, natural-language response.

This process is a game-changer for building sophisticated applications that can:

  • Get up-to-date, real-time information: Fetch the current weather, stock prices, or news headlines.
  • Perform actions on a user's behalf: Book a flight, schedule a meeting, or place an order.
  • Access private data: Look up information in a customer database or a personal calendar.

The Four-Step Process

Function calling isn't a single API call, but rather a structured interaction between your application and the model. Here's how it works:

  1. You Define the Tools: First, you tell Gemini which tools it has at its disposal by creating a function declaration for each one. A declaration is a structured object (its parameters are described with a subset of the OpenAPI schema format, which closely resembles JSON Schema) that gives the function's name, its purpose, and the parameters it requires.
  2. The User Makes a Request: The user sends a prompt to your application. This prompt, along with your defined function declarations, is then sent to the Gemini model.
  3. The Model Responds with a Function Call: The model analyzes the user's intent. If it decides that one of your functions would be the best way to fulfill the request, it will not generate a text response. Instead, it will return a structured JSON object containing the name of the function to call and the arguments to use.
  4. Your Application Executes the Function: This is the most important part: the model does not execute the function itself. Your application is responsible for parsing the model's response, identifying the function call, executing the corresponding code with the provided arguments, and sending the result back to the model, which then uses it to write the final natural-language answer.
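The four steps can be traced without any network access. The sketch below is only an illustration: `get_current_weather`, its canned data, and `fake_model` are hypothetical stand-ins introduced here, and the dictionaries mirror the general shape of a declaration, a function call, and a function response rather than any SDK's exact objects.

```python
import json

# Step 1: a hand-written declaration for a hypothetical weather function.
weather_declaration = {
    "name": "get_current_weather",
    "description": "Returns the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'."}
        },
        "required": ["city"],
    },
}


def get_current_weather(city: str) -> dict:
    """Hypothetical local function; returns canned data, no network call."""
    return {"city": city, "temperature_c": 21, "conditions": "sunny"}


def fake_model(prompt: str, declarations: list) -> dict:
    """Stand-in for the model (steps 2 and 3): always requests the first tool."""
    return {
        "function_call": {
            "name": declarations[0]["name"],
            "args": {"city": "Paris"},
        }
    }


# Steps 2 and 3: send the prompt plus declarations, receive a function call.
reply = fake_model("What's the weather in Paris?", [weather_declaration])
call = reply["function_call"]

# Step 4: the application, not the model, runs the function.
handlers = {"get_current_weather": get_current_weather}
result = handlers[call["name"]](**call["args"])

# This is the payload that would go back to the model for the final answer.
function_response = {
    "function_response": {"name": call["name"], "response": result}
}
print(json.dumps(function_response, indent=2))
```

Keeping the handlers in an explicit dictionary means the application only ever runs functions it has deliberately exposed.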

A Practical Example: Building an E-commerce Tool

Let's put this into practice with a Python example. We'll build a simple mock e-commerce tool that can look up a product's price and stock availability. This example uses the google.generativeai library (the google-generativeai package), which can build function declarations directly from Python functions, using their names, type hints, and docstrings.

import json
import os

import google.generativeai as genai

# Step 1: Define the functions that our tool can use.
# In a real application, these would call a database or an external API.

def get_product_price(product_id: str) -> dict:
    """
    Retrieves the current price of a product by its ID.

    Args:
        product_id (str): The unique identifier for the product (e.g., 'A123').

    Returns:
        dict: A dictionary containing the product ID, price, and currency.
    """
    # This is a simplified mock implementation for demonstration purposes.
    product_prices = {
        "A123": {"price": 129.99, "currency": "USD"},
        "B456": {"price": 49.50, "currency": "USD"},
        "C789": {"price": 250.00, "currency": "USD"},
    }
    price_data = product_prices.get(product_id)
    if price_data:
        return {"product_id": product_id, **price_data}
    else:
        return {"error": f"Product ID '{product_id}' not found."}

def get_product_stock(product_id: str) -> dict:
    """
    Retrieves the current stock level for a product by its ID.

    Args:
        product_id (str): The unique identifier for the product (e.g., 'A123').

    Returns:
        dict: A dictionary with the product ID and the number of items in stock.
    """
    # Mock implementation of a stock lookup.
    product_stock = {
        "A123": {"stock": 150},
        "B456": {"stock": 0}, # Out of stock
        "C789": {"stock": 25},
    }
    stock_data = product_stock.get(product_id)
    if stock_data:
        return {"product_id": product_id, **stock_data}
    else:
        return {"error": f"Product ID '{product_id}' not found."}

# `genai.types.Tool` builds the function declarations from our Python
# functions (their names, type hints, and docstrings).
e_commerce_tool = genai.types.Tool(
    function_declarations=[get_product_price, get_product_stock]
)

# Step 2: Configure the model with the tools and start a chat session.
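# In the Canvas environment the API key is provided automatically, so an
# empty string works there. Elsewhere, this reads it from GEMINI_API_KEY
# (an assumed variable name; adapt it to your own setup).
genai.configure(api_key=os.environ.get("GEMINI_API_KEY", ""))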
model = genai.GenerativeModel("gemini-2.5-flash-preview-05-20", tools=[e_commerce_tool])
chat = model.start_chat()

# Step 3: Send a user prompt that requires a tool.
user_prompt = "What's the stock level for product B456?"
print(f"User: {user_prompt}\n")
response = chat.send_message(user_prompt)

# Look for a function call among the response parts. A part that is not a
# function call has an empty `function_call.name`, so this yields None when
# the model answered in plain text.
function_call = next(
    (
        part.function_call
        for part in response.candidates[0].content.parts
        if part.function_call.name
    ),
    None,
)

# Dispatch through an explicit allow-list rather than globals(), so the
# model can only trigger the functions we actually declared.
available_functions = {
    "get_product_price": get_product_price,
    "get_product_stock": get_product_stock,
}

if function_call is None:
    # No function call: print the model's direct response.
    print(f"Model's direct response: {response.text}")
elif function_call.name not in available_functions:
    print(f"Error: Function '{function_call.name}' is not an available tool.")
else:
    print("Model wants to call a function!")
    print(f"Function Name: {function_call.name}")
    print(f"Arguments: {dict(function_call.args)}\n")

    # A `try...except` block guards both the local function and the API call.
    try:
        # Step 4: Execute the function and return the result to the model.
        result = available_functions[function_call.name](**dict(function_call.args))

        # Send the function's output back to the model for a final response.
        final_response = chat.send_message(
            genai.protos.Part(
                function_response=genai.protos.FunctionResponse(
                    name=function_call.name,
                    response={"content": json.dumps(result)},
                )
            )
        )

        print("Final response from the model:")
        print(final_response.text)
    except Exception as e:
        print("An error occurred while handling the function call:", e)
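The example above handles a single function call. A request such as "price and stock for A123" may lead the model to ask for several calls in a row before it answers in text, so the dispatch step is often wrapped in a loop. The following is a minimal sketch of that idea under stated assumptions: `ScriptedChat`, `run_turn`, and the dictionary-shaped replies are hypothetical stand-ins that replay canned answers, not part of the Gemini SDK, and the two tool functions are shortened copies of the mock lookups.

```python
from typing import Callable


def get_product_price(product_id: str) -> dict:
    """Shortened mock of the price lookup."""
    return {"product_id": product_id, "price": 129.99, "currency": "USD"}


def get_product_stock(product_id: str) -> dict:
    """Shortened mock of the stock lookup."""
    return {"product_id": product_id, "stock": 150}


# Explicit allow-list of functions the model may trigger.
TOOLS: dict[str, Callable[..., dict]] = {
    "get_product_price": get_product_price,
    "get_product_stock": get_product_stock,
}


class ScriptedChat:
    """Hypothetical stand-in for a chat session: replays canned replies."""

    def __init__(self, replies: list):
        self._replies = list(replies)
        self.sent = []  # Everything the application sent, for inspection.

    def send(self, message) -> dict:
        self.sent.append(message)
        return self._replies.pop(0)


def run_turn(chat: ScriptedChat, prompt: str, max_calls: int = 5) -> str:
    """Dispatch function calls until the model replies with text."""
    reply = chat.send(prompt)
    calls = 0
    while True:
        call = reply.get("function_call")
        if call is None:
            return reply["text"]
        if calls >= max_calls:
            raise RuntimeError("Too many consecutive function calls.")
        result = TOOLS[call["name"]](**call["args"])
        reply = chat.send(
            {"function_response": {"name": call["name"], "response": result}}
        )
        calls += 1


chat = ScriptedChat([
    {"function_call": {"name": "get_product_price", "args": {"product_id": "A123"}}},
    {"function_call": {"name": "get_product_stock", "args": {"product_id": "A123"}}},
    {"text": "Product A123 costs 129.99 USD and 150 units are in stock."},
])
answer = run_turn(chat, "What are the price and stock for A123?")
print(answer)
```

The `max_calls` cap is a design choice in this sketch: it stops the loop if the replies never settle on a text answer.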

Best Practices for Function Calling

  • Be Descriptive: Give your function declarations clear, human-readable descriptions. The model relies on these descriptions to understand what the function does and when to call it.
  • Use Strong Typing: Clearly define the data types for your parameters (e.g., str, int, list). This helps the model accurately extract the correct information from the user's prompt.
  • Handle Errors Gracefully: Always include try...except blocks to handle unexpected responses or errors in your function's execution. A robust application should be able to recover and inform the user if something goes wrong.
  • Filter and Summarize Results: The data returned from an API can be very verbose. Before sending it back to the model, filter it down to just the essential information. The model can then use this concise data to generate a better, more focused response.
  • Consider Multi-Turn Conversations: Function calling often happens within a conversational context. Libraries and SDKs that manage chat history are invaluable for maintaining a fluid, multi-turn interaction with the user.
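Two of these practices, graceful error handling and filtering results, can be combined in one small wrapper around every tool call. The code below is a sketch rather than a prescribed pattern: `fetch_order`, its record fields, and `run_tool` are hypothetical names invented for the illustration.

```python
from typing import Callable, Iterable


def fetch_order(order_id: str) -> dict:
    """Hypothetical backend lookup that returns a verbose record."""
    if order_id != "ORD-1":
        raise KeyError(order_id)
    return {
        "order_id": "ORD-1",
        "status": "shipped",
        "eta_days": 2,
        "warehouse_code": "W7",
        "internal_notes": "Repacked after label misprint.",
        "audit_log": ["created", "paid", "picked", "packed", "shipped"],
    }


def run_tool(func: Callable[..., dict], args: dict, keep_fields: Iterable[str]) -> dict:
    """Run a tool, turn failures into an error payload, and trim the result."""
    try:
        raw = func(**args)
    except Exception as exc:
        # Report the failure as data so the model can explain it to the user.
        return {"error": f"{type(exc).__name__}: {exc}"}
    # Keep only the fields the model needs for its answer.
    return {key: raw[key] for key in keep_fields if key in raw}


fields = ("order_id", "status", "eta_days")
ok = run_tool(fetch_order, {"order_id": "ORD-1"}, fields)
missing = run_tool(fetch_order, {"order_id": "ORD-2"}, fields)
bad_args = run_tool(fetch_order, {"id": "ORD-1"}, fields)

print(ok)
print(missing)
print(bad_args)
```

Returning the error as a dictionary, instead of letting the exception propagate, gives the model something concrete to relay to the user, and a wrong argument name from the model is caught the same way.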