Using Claude with the Anthropic API
An essential guide to understanding Anthropic Claude, from safe AI foundations to advanced workflows, agents, and practical applications.
Artificial intelligence has rapidly evolved from experimental research into a practical tool that powers applications across industries. Among the new generation of AI systems, Anthropic’s Claude stands out as a model designed with a unique focus on safety, reliability, and usability. Named after Claude Shannon, the “father of information theory,” Claude combines cutting-edge language capabilities with Anthropic’s core principle of building AI that is helpful, honest, and harmless.
In this post, we’ll take a tour through the essentials of working with Claude. We’ll start with an overview of Anthropic and what makes Claude different, then explore how to access Claude via the API, evaluate and refine prompts, and apply prompt engineering techniques. From there, we’ll cover advanced capabilities such as tool use and the Model Context Protocol (MCP). Finally, we’ll look at how these pieces come together in agents and workflows to unlock powerful new use cases.
Whether you’re a developer experimenting with AI integration, a product manager exploring use cases, or simply curious about the technology, this guide will give you a clear foundation for getting started with Anthropic Claude.
Anthropic Overview
Anthropic is a public benefit corporation (PBC) whose stated purpose is the “responsible development and maintenance of advanced AI for the long-term benefit of humanity.” You can read more about Anthropic as an organization here. Currently, Anthropic is best known for their work in AI safety and research including their frontier model family, Claude.
Claude, Anthropic’s collection of large AI models, comes in multiple flavors suited to both personal and professional use. The top-level models are Opus, Sonnet, and Haiku. We’ll leverage each of these throughout the examples in subsequent sections. The models can be summarized as described in the table below:
| Model Name | Description | Relative Cost | Reasoning Capabilities |
|---|---|---|---|
| Opus | The largest model in the Claude family with the widest range of capabilities including advanced reasoning tasks and problem solving. | High | Supports reasoning (more about reasoning later) |
| Sonnet | The medium-sized model in the Claude family which balances speed, cost, and capability. | Medium | Supports reasoning |
| Haiku | The small-sized model in the Claude family optimized for speed and efficiency. | Low | Does not support reasoning |
These models are accessible via the Anthropic API and provide capabilities ranging from text generation and translation to image analysis and advanced reasoning. With great, powerful models comes great responsibility - and Anthropic has developed a reputation for AI safety and transparency. You can learn more at the Anthropic Transparency Hub, linked here and after the conclusion.
For additional information or to stay up-to-date with Anthropic + Claude innovations, you can learn more here. Equipped with an overview of Anthropic and the Claude models, you can now proceed to the next sections to learn more about accessing these models through the Anthropic API.
Using Claude and the Anthropic API
Accessing Claude with the API
Accessing Claude through the Anthropic API is straightforward. We can interact via HTTP request or via SDK in one of these languages: Python, TypeScript, Java, Go, Ruby, and PHP (beta). The majority of examples to follow will use the Python SDK.
To submit a request via the API (either HTTP or SDK), we must include a handful of values to demonstrate who we are, that we are authorized to use the API, and what we want the API to return:
- API Key: the magical string that tells Anthropic who you are and that you are authorized to use the API
- Model: a string indicating which model you want to use (e.g., `claude-opus-4-1-20250805` - see the model names here, and note that they change as models are under constant research and development)
- Messages: the sequence of `user` and `assistant` messages (note that system prompts are included in a separate `system` parameter)
- Max Tokens: the maximum length of the response in number of tokens
Let’s take a look at a simplified example where we interact with the Anthropic API. Note that we already begin to abstract code into functions here for re-use and readability. Unlike other APIs where there may be one-off exchanges of information, updates, etc., AI applications often feature multi-turn interactions, which makes reusable functions especially valuable when building AI apps.
```python
# Create an API client
from anthropic import Anthropic

client = Anthropic()
model = "claude-sonnet-4-0"
```

Next, we define three helper functions: `add_user_message()`, `add_assistant_message()`, and `chat()`. These functions allow us to add a message from the user to the list of messages (i.e., the running conversation of requests and responses), add an assistant message, and invoke the client object to pass our messages to the API.
```python
# Helper functions
def add_user_message(messages, text):
    user_message = {"role": "user", "content": text}
    messages.append(user_message)

def add_assistant_message(messages, text):
    assistant_message = {"role": "assistant", "content": text}
    messages.append(assistant_message)

def chat(messages, system=None, temperature=1.0, stop_sequences=[]):
    params = {
        "model": model,
        "max_tokens": 250,
        "messages": messages,
        "temperature": temperature,
        "stop_sequences": stop_sequences,
    }

    if system:
        params["system"] = system

    message = client.messages.create(**params)
    return message.content[0].text
```

With these helpers in place, we can hold a conversation. Note that pre-filling the start of an assistant message steers how Claude continues its response:

```python
messages = []

add_user_message(messages, "Are iPhones better than Android smartphones?")
add_assistant_message(messages, "Android smartphones are better because")
answer = chat(messages)

answer
```

```
" they're cheaper, they have much more customization, and you're not locked into Apple's ecosystem. Also, Apple
```

Optional Parameters
The Anthropic API affords a few optional parameters that help us control the content generated by Claude and the response from the API. A few examples we should call out are:
- System prompts allow us to shape how Claude responds to our messages including style/tone or even specific instructions/steps to execute. We’ll use these more in the next sections.
- Temperature is a value between 0 and 1 that alters the randomness of responses. With a temperature closer to 0, Claude tends to choose the most probable next token. With a temperature closer to 1, the underlying probability distribution shifts to favor otherwise less probable tokens.
- Streaming allows us to provide updates to the user by processing chunks as they are generated by Claude and the Anthropic API - instead of having to wait 5-20 seconds for the full response.
- Stop Sequences tell Claude to stop generating additional parts of the response when a certain character or phrase is reached. This can be useful if we want to catch bad quality indicators or to format the response.
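To build intuition for how temperature reshapes the next-token distribution, here is a small, self-contained sketch of temperature scaling over a toy distribution. This illustrates only the underlying sampling math; it does not call the Anthropic API, and the probabilities are invented for the example.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a probability distribution by a sampling temperature.

    Temperatures near 0 sharpen the distribution toward the most
    probable token; a temperature of 1 leaves it unchanged.
    """
    # Work in log space, divide by temperature, then re-normalize.
    logits = [math.log(p) / temperature for p in probs]
    exps = [math.exp(logit) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy distribution over three candidate tokens
probs = [0.7, 0.2, 0.1]

low = apply_temperature(probs, 0.2)   # sharper: the top token dominates
high = apply_temperature(probs, 1.0)  # effectively unchanged
```

With `temperature=0.2`, the most probable token's share climbs well above its original 0.7, which is why low temperatures produce more deterministic output.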
Now that we can connect to the Anthropic API to send messages and receive responses, we can dive deeper into the techniques that guide the output. We’ll need to look at systematically testing our prompts to ensure our AI application is high quality, safe, and otherwise adhering to our guidelines and expectations.
Prompt Evaluation & Engineering Techniques
We want to build AI applications that provide users with a valuable outcome (e.g., update their account information, retrieve information about an online order, or generate ideas relevant to a specific topic). Users must trust that the application adheres to their request (pertinence, or staying on topic) and does not misconstrue facts or reality (hallucination). To ensure this at a production level, we will employ both prompt engineering and prompt evaluation.
The benefits of prompt engineering and evaluation include programmatic and repeatable prompting, a historical view of model performance for iteration, and the ability to place prompts under version control. Our evaluation dataset will contain a list of questions or prompts that we want Claude to respond or react to.
To implement a framework for prompt evaluation, we will develop a workflow that has the following steps. Note that this process forms a cycle of iteration:
| Step | Description |
|---|---|
| Create/modify prompt | Use our intuition or feedback from grades to write and/or update the model prompt. |
| Generate an evaluation dataset | Use AI to generate a list of questions that would serve as example questions for evaluation. |
| Request responses from Claude and the Anthropic API | For each question in the evaluation dataset, ask Claude to generate a response. |
| Grade the responses | Use AI to automatically grade the responses and provide a handful of criteria or dimensions to return scores. |
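The cycle above can be sketched as a small harness. In practice the `respond` and `grade` callables would be backed by Claude API calls; here they are stubbed so the control flow runs standalone, and all names and scores are illustrative.

```python
def run_eval(dataset, respond, grade):
    """Run each test case through the model and grade the response.

    `respond` and `grade` are injected callables so they can be backed
    by real Claude API calls without changing the harness.
    """
    results = []
    for case in dataset:
        answer = respond(case["user_input"])
        results.append({
            "input": case["user_input"],
            "answer": answer,
            "score": grade(case["user_input"], answer),
        })
    average = sum(r["score"] for r in results) / len(results)
    return results, average

# Stub implementations standing in for Claude-backed calls
dataset = [{"user_input": "Plan a weekend in Austin, TX"},
           {"user_input": "Plan a budget trip to Tokyo"}]
respond = lambda q: f"Draft travel plan for: {q}"
grade = lambda q, a: 8 if q.split()[-1] in a else 3  # toy relevance check

results, average = run_eval(dataset, respond, grade)
```

Persisting `results` between runs gives you the historical view of model performance mentioned above, so prompt changes can be compared against earlier scores.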
Prompt Evaluation
Initially, our prompt may start out quite simple - see the example below. Over time we will refine it. Of course, we could make the initial prompt more detailed and follow established best practices (e.g., using direct, clear, and concise language; providing structure such as brackets or parentheses; and providing examples or formats for outputs). For illustrative purposes, however, let’s take a look at a simple prompt.
```python
prompt = f"""
Please generate a travel plan that incorporates constraints or preferences from the user input:

{test_case["user_input"]}
"""
```

Above, we instruct the model to incorporate constraints and preferences from the user’s input. The output should be a travel plan that is tailored to the user. Notice how we haven’t specified values to look for in the input (e.g., location) nor specified how to format the travel plan.
Prompt Engineering
We touched on some techniques above, but prompt engineering deserves additional explanation. Prompt engineering is the process of crafting and revising prompts to perform better against a certain objective or metric. There are many subtle ways to alter a prompt that give the model clues, hints, starting points, and references to improve its response. Here are a few examples that will typically improve responses:
- Direct instructions afford the model with clear descriptions of what task to perform. Instead of saying “you may need to respond to a question from a user”, clearly instruct the AI model to “Respond to the user’s question with…”.
- Conciseness ensures there is less ambiguity about what is important within a prompt. Additional language scattered throughout the prompt may make the author feel there is clarity but ultimately muddies the request to the model. Focus on what is critical for the model to understand.
- Examples are often very helpful for models to understand what types of requests to expect. Similar to the adage to “show, not tell”, examples clarify what to expect.
- Steps provide the model with guidelines on what to perform and help break down more complicated tasks into smaller ones. Enumerating the subtasks needed to complete a task makes it easier to see how the model approaches them.
- Response expectations give Claude a clear idea of what is expected. Examples include requesting structured output such as `{"count": ""}`. These guidelines help the model understand what format or syntax is required in the output.
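As a hypothetical illustration, the techniques above can be combined into a more deliberate version of the travel-planning prompt from earlier. The wording, steps, and JSON fields are invented for this sketch, not prescribed by Anthropic.

```python
def build_travel_prompt(user_input):
    """Assemble a prompt applying the techniques above: direct
    instructions, numbered steps, an example input, and an explicit
    response format. All wording here is illustrative."""
    return f"""Create a travel plan for the trip described below.

Steps:
1. Identify the destination, dates, and budget from the input.
2. Note any constraints or preferences (diet, mobility, interests).
3. Produce a day-by-day itinerary honoring those constraints.

Example input: "3 days in Austin, TX on a $500 budget, vegetarian"

Respond only with JSON in this format:
{{"destination": "", "days": [], "estimated_cost": ""}}

Trip description: {user_input}
"""

prompt = build_travel_prompt("A long weekend in Denver with two kids")
```

Compare this with the one-line prompt from the evaluation section: each added element gives the grader something concrete to score against.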
If you do not want to build the plumbing to perform repeatable prompt evaluation and prompt engineering, you can leverage the capabilities at the Anthropic Developer Platform. You can learn more about the evaluation tool from Anthropic here.
Tool Use with Claude
We interact with various tools throughout our day. When we wake up we may snooze an alarm, make coffee, and turn on our laptop. We may check our email, look up our schedule for the day, and respond to friends’ messages. We may drive a car or pay for public transit. Tools make our lives easier and help us interact with large, complex systems.
The same applies to AI systems. One constraint limiting AI systems today is the size of the context window: the context an AI system has may include user requests, system prompts, and documents or images retrieved from a vector database, but nothing beyond what is placed there. Generally speaking, tool use serves a similar purpose - it expands the context in which an AI system can operate.
We want AI systems to understand the other systems that we use on a regular basis. This could include email clients, calendars, itineraries in travel systems, databases with sales and order data, etc. Giving models tools moves AI from the edge of these applications to the center. Here are a few quick examples of tool use for both consumer and commercial applications:
| Scenario | How Tools Help |
|---|---|
| Retrieve details from online applications/databases | AI systems can reply to general user questions relatively easily. However, accessing user-specific data in a database, such as pending orders, upcoming deliveries, or recent bills and statements, requires tool use. |
| Perform tedious tasks | Applications such as email clients or calendars require a lot of manual effort to maintain (the same spirit of "this meeting could have been an email"). An AI model can assist by creating an event for you based on a simple voice or text prompt. |
| Gather additional context | AI models are trained on a corpus fixed at a specific time and therefore lack knowledge of subsequent events. Providing a web search tool helps AI models gain context on recent events or news without retraining or fine-tuning. |
Before we dive into an example of using a tool, let’s ensure we understand how the Anthropic API and Claude will exchange tool availability, tool use decisions, and tool results.
When tools are provided and Claude decides one is needed, it replies with a `tool_use` content block and a `stop_reason` of `"tool_use"`. Your code executes the tool locally and sends back a `tool_result` message. Claude then generates its final response incorporating the tool output.
```
Turn 1: User message + tool definitions
          ↓
  Claude decides to use a tool
          ↓
Turn 2: tool_use block returned
        (stop_reason: "tool_use")
          ↓
  Your code executes the tool locally
          ↓
Turn 3: tool_result message sent back
          ↓
  Claude generates final text response
```

Let’s take a look at defining a tool and providing it to the model. Each tool definition requires a name, a description that helps Claude understand when to use it, and a JSON schema describing the inputs the tool expects.
Below, we define a `get_weather` tool and pass it to Claude alongside a user question. When Claude determines the tool is needed, it returns a `tool_use` block. We execute the function locally, then send the result back so Claude can compose the final answer.
```python
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a given city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g. Austin, TX"
                }
            },
            "required": ["location"]
        }
    }
]

def get_weather(location):
    # In practice, call a real weather API here
    return f"The weather in {location} is 82 degrees F and partly cloudy."

def run_tool(name, inputs):
    if name == "get_weather":
        return get_weather(inputs["location"])

messages = [{"role": "user",
             "content": "What's the weather like in Austin?"}]
response = client.messages.create(
    model=model, max_tokens=1024,
    tools=tools, messages=messages
)

# If Claude wants to use a tool, execute it and continue
if response.stop_reason == "tool_use":
    tool_block = next(
        b for b in response.content if b.type == "tool_use"
    )
    result = run_tool(tool_block.name, tool_block.input)

    messages.append({"role": "assistant",
                     "content": response.content})
    messages.append({
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_block.id,
            "content": result
        }]
    })
    final = client.messages.create(
        model=model, max_tokens=1024,
        tools=tools, messages=messages
    )
    print(final.content[0].text)
```

```
The weather in Austin, TX is currently 82 degrees F and partly cloudy.
```

Retrieval-Augmented Generation (RAG)
Large language models are trained on a snapshot of the world at a specific point in time and do not have access to private or proprietary information. Retrieval-Augmented Generation (RAG) addresses both of these limitations by retrieving relevant documents at query time and injecting them into the prompt as additional context.
A typical RAG pipeline involves three stages:
| Stage | Description |
|---|---|
| Index | Documents are split into chunks, converted to dense vector embeddings, and stored in a vector database (e.g., Pinecone, Weaviate, or pgvector). |
| Retrieve | When a user submits a query, it is embedded using the same model and the vector database is searched for the most semantically similar chunks. |
| Generate | The retrieved chunks are formatted and injected into the Claude prompt as context. Claude uses this grounded information to generate an accurate, factual response. |
In a production system, the `retrieve()` function below would embed the user query and search a vector database for the most relevant document chunks. Here we stub it with a hard-coded passage to keep the example self-contained.
```python
def retrieve(query):
    # In practice, embed query and search vector DB
    return (
        "Q3 2025 earnings: revenue was $4.2B, up 18% YoY. "
        "Operating margin improved to 24% from 21% in Q3 2024."
    )

user_question = "How did revenue trend in Q3 2025?"
context = retrieve(user_question)

rag_prompt = f"""Use the context below to answer the question.
Only use information present in the context.

Context:
{context}

Question: {user_question}
"""

messages = [{"role": "user", "content": rag_prompt}]
answer = chat(messages)
print(answer)
```

```
Based on the provided context, Q3 2025 revenue was $4.2 billion,
an 18% increase year-over-year. Operating margin also improved,
rising from 21% to 24% compared to Q3 2024.
```

A few best practices to keep in mind when building RAG pipelines: keep chunks small enough to be relevant but large enough to preserve necessary context (200–500 tokens is a reasonable starting range); instruct Claude to answer only from the provided context and to acknowledge when the answer is not available; and consider returning source citations so users can verify the information.
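The chunk-size advice can be sketched with a simple word-based splitter. This is a stand-in for token-aware chunking: word counts roughly approximate token counts, and the parameter values are illustrative starting points, not recommendations from Anthropic.

```python
def chunk_text(text, max_words=300, overlap=50):
    """Split text into overlapping word-based chunks.

    Overlap preserves context that would otherwise be severed at
    chunk boundaries. A production pipeline would count tokens with
    the model's tokenizer instead of splitting on whitespace.
    """
    words = text.split()
    step = max_words - overlap  # assumes max_words > overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk reached the end of the text
    return chunks

# A 1000-word synthetic document yields four overlapping chunks
doc = " ".join(f"word{i}" for i in range(1000))
chunks = chunk_text(doc)
```

Each chunk would then be embedded and stored in the vector database during the Index stage described above.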
Model Context Protocol (MCP)
As the number of tools and data sources an AI system needs grows, managing each integration individually becomes unwieldy. Every new tool requires custom connection logic, authentication handling, and protocol conventions that must be maintained alongside the core application. The Model Context Protocol (MCP) was introduced by Anthropic in late 2024 to solve this problem by defining a standard, open protocol for connecting AI models to external tools, data sources, and services.
Think of MCP as a universal adapter for AI integrations: instead of building a bespoke connection for every external system, you agree on a common interface and everything becomes interoperable. This means an MCP server built for one AI host can be reused with any other compatible host — including Claude Desktop, custom applications, or third-party tools.
How MCP Works
MCP defines three roles in every integration:
- MCP Host: The application running the AI model (e.g., Claude Desktop, your custom app). The host manages connections to one or more MCP servers.
- MCP Client: A component within the host that speaks the MCP protocol on behalf of the model.
- MCP Server: A lightweight process that exposes capabilities to the model through a standardized interface.
MCP servers can expose three types of capabilities:
- Tools: Callable functions the model can invoke (similar to the function/tool calling we covered earlier, but now discoverable via a standard protocol).
- Resources: Readable data sources such as files, database records, or API responses that provide context to the model.
- Prompts: Reusable, parameterized prompt templates that can be invoked by name.
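To make these three capability types concrete, here is a minimal server sketch using the official MCP Python SDK (installed via `pip install mcp`). The server name, resource URI, and function bodies are illustrative assumptions for this sketch, not part of the original post.

```python
# Sketch of an MCP server exposing a tool, a resource, and a prompt.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def get_weather(location: str) -> str:
    """A callable function the model can invoke."""
    return f"The weather in {location} is 82 degrees F and partly cloudy."

@mcp.resource("notes://readme")
def readme() -> str:
    """A readable data source exposed to the model."""
    return "Project notes the model can read as context."

@mcp.prompt()
def travel_plan(destination: str) -> str:
    """A reusable, parameterized prompt template."""
    return f"Create a travel plan for {destination}."

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```

Any MCP-compatible host that launches this script can discover and call `get_weather` without custom integration code, which is exactly the interoperability described above.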
Configuring MCP Servers
The most common entry point for MCP is Claude Desktop, which reads a configuration file that lists the servers it should connect to at startup. Once a server is configured, Claude automatically discovers and can use its exposed tools without any additional code on your part.
```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/path/to/project"
      ]
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"],
      "env": {
        "POSTGRES_CONNECTION_STRING": "postgresql://localhost/mydb"
      }
    }
  }
}
```

With this configuration in place, Claude can list files, read file contents, query the database, and more, all without any custom integration code. This is the core value proposition of MCP: server authors write the integration once, and any MCP-compatible host can use it.
MCP is particularly valuable in enterprise settings where Claude needs access to many internal systems — CRMs, ticketing platforms, internal wikis — without each team building and maintaining a unique integration. A growing catalog of pre-built MCP servers is available at the MCP servers repository. Stay tuned for a deeper dive into MCP coming soon.
Orchestrating Claude with Agents & Workflows
So far, every example we have looked at involves a single exchange: a user sends a message and Claude responds. But many real-world tasks require multiple steps, decisions, and tool invocations before arriving at a final answer. This is the domain of agents and workflows.
Workflows vs. Agents
It is useful to distinguish between these two related concepts:
- Workflows are predetermined sequences of steps. The developer defines the flow in advance — Claude runs step A, then step B, then step C. Workflows are predictable, easy to debug, and appropriate for well-defined tasks.
- Agents give Claude more autonomy. Rather than following a fixed script, an agent uses tools in a loop, deciding at each step what action to take next based on the current state. Agents are well-suited for open-ended tasks but require careful design to be reliable and safe.
Common Orchestration Patterns
| Pattern | Description | Best For |
|---|---|---|
| Prompt chaining | Output of one Claude call feeds as input to the next. | Multi-stage transformations, document summarization pipelines. |
| Routing | A classifier model directs requests to a specialized model or prompt. | Customer support triage, intent detection. |
| Parallelization | Multiple Claude calls run concurrently and results are aggregated. | Evaluating multiple responses, parallel research tasks. |
| Tool-use loop | Claude iteratively calls tools until it can answer the original question. | Research agents, data analysis, code execution environments. |
| Human-in-the-loop | Claude pauses and requests confirmation before taking irreversible actions. | Any task involving writes, deletions, or external side effects. |
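As one concrete example, the prompt-chaining pattern from the table can be sketched with a generic helper. The `call_model` callable would be backed by the `chat()` helper from earlier; here it is stubbed so the flow runs standalone, and the step templates are invented for illustration.

```python
def chain(templates, call_model, initial_input):
    """Run a prompt chain: each template receives the previous step's
    output through its `{input}` placeholder."""
    output = initial_input
    for template in templates:
        output = call_model(template.format(input=output))
    return output

# Stub model call so the control flow is visible without API access
call_model = lambda prompt: f"[response to: {prompt}]"

steps = [
    "Summarize this document: {input}",
    "Translate this summary to French: {input}",
]
result = chain(steps, call_model, "Quarterly earnings report text")
```

The same helper generalizes to any multi-stage transformation; routing and parallelization differ mainly in how the calls are dispatched rather than in the per-call mechanics.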
Building a Simple Agent Loop
The agentic loop is the fundamental building block of autonomous AI systems. In each iteration, Claude receives the current conversation state including available tools, decides what to do next, and either calls a tool or returns a final response.
The loop below runs until Claude returns an `end_turn` response or we reach a maximum iteration limit. On each iteration, we check whether Claude has requested a tool call. If so, we execute it and feed the result back into the conversation. If not, we break out of the loop and return the final answer.
```python
MAX_ITERATIONS = 10

def run_agent(user_message, tools, tool_executor):
    messages = [{"role": "user", "content": user_message}]

    for _ in range(MAX_ITERATIONS):
        response = client.messages.create(
            model=model,
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # No tool call - Claude is done
        if response.stop_reason == "end_turn":
            return response.content[0].text

        # Claude wants to use one or more tools
        if response.stop_reason == "tool_use":
            messages.append({
                "role": "assistant",
                "content": response.content
            })
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = tool_executor(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            messages.append({
                "role": "user",
                "content": tool_results
            })

    return "Max iterations reached without a final answer."
```

Design Considerations
Building reliable agents requires more than just the loop above. A few principles to keep in mind:
- Constrain the action space: give agents only the tools they need for the task at hand. A broader toolset increases the risk of unintended actions.
- Log every step: record each tool call, its inputs, and its outputs. This makes debugging significantly easier and enables auditing after the fact.
- Set iteration limits: unconstrained loops can run indefinitely and accumulate API costs. Always cap the number of iterations.
- Prefer human-in-the-loop for irreversible actions: before Claude sends an email, deletes a record, or charges a customer, pause and confirm with the user. The cost of an extra confirmation step is far lower than the cost of an unintended side effect.
- Handle errors gracefully: tools fail. Design your tool execution layer to return informative error messages that Claude can reason about, rather than crashing the loop entirely.
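Two of these principles, logging every step and handling errors gracefully, can be combined in a small wrapper around tool execution. This is a sketch; the function names are illustrative, and it would slot in as the `tool_executor` in the agent loop above.

```python
def safe_execute(tool_fn, tool_name, inputs, log):
    """Execute a tool call, logging inputs and outputs, and convert
    exceptions into messages the model can reason about."""
    entry = {"tool": tool_name, "inputs": inputs}
    log.append(entry)
    try:
        result = tool_fn(**inputs)
        entry["result"] = result
        return result
    except Exception as exc:
        entry["error"] = str(exc)
        # Return the error as text so the agent loop keeps running
        return f"Tool '{tool_name}' failed: {exc}"

log = []
ok = safe_execute(lambda location: f"82F in {location}",
                  "get_weather", {"location": "Austin"}, log)
bad = safe_execute(lambda location: 1 / 0,
                   "get_weather", {"location": "Austin"}, log)
```

Because failures come back as ordinary `tool_result` text, Claude can decide to retry, try a different tool, or explain the problem to the user instead of the loop crashing.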
Conclusion
We have covered a lot of ground in this post — from Anthropic’s mission and the Claude model family, to the fundamentals of the Anthropic API, prompt engineering, and prompt evaluation. From there, we explored how to extend Claude’s capabilities with tool use and RAG, standardize integrations with MCP, and orchestrate complex, multi-step behavior with agents and workflows.
The thread connecting all of these topics is the idea that Claude is not just a chatbot but a building block. Each capability we explored — tool use, retrieval, MCP, agents — adds a new dimension to what you can build. The most powerful applications tend to combine several of these: an agent that retrieves context from a vector database, calls external APIs through MCP-connected servers, and checks in with a human before taking consequential actions.
If you are just getting started, a good first project is a simple RAG-based Q&A system over a document you care about. Once you are comfortable with that, layer in tool use to give Claude the ability to look up live data or write to an external system. From there, the jump to agentic workflows is a natural next step.
The Anthropic ecosystem is evolving quickly. New model versions, expanded context windows, multimodal capabilities, and a growing MCP server ecosystem all mean that the ceiling on what you can build continues to rise. Staying close to the documentation and experimenting regularly is the best way to keep pace. Check out the additional resources below for more information.
Additional Resources
- Anthropic Academy: Claude with the Anthropic API
- Anthropic Academy: Model Context Protocol - Advanced Topics
- Anthropic Documentation
- Anthropic's Transparency Hub
