Most agent frameworks give the LLM a list of tools and let it call them one at a time. smolagents does something different: it gives the LLM a Python interpreter and lets it write code to solve problems. That sounds like a small difference. It isn't.
HuggingFace released smolagents in late 2024 as a deliberate reaction to the complexity of LangChain and LangGraph. The README says it out loud: "simple agents that work." The entire library is around 1,000 lines of core code. You can read all of it in an afternoon.
The core idea: CodeAgent vs ToolCallingAgent
smolagents has two agent types. The ToolCallingAgent behaves like most other agent frameworks — the LLM selects from a list of tools, calls one, gets the result, decides what to do next. Classic ReAct prompting loop.
The CodeAgent is what makes smolagents interesting. Instead of selecting tools, the LLM writes Python code that solves the task. That code gets executed in a sandboxed interpreter, and the output feeds back into the model's context. The LLM reasons by programming.
Here's what that looks like in practice. Given the task "What's the population of France divided by the area of Germany?", a ToolCallingAgent would call a search tool twice and then do the math. A CodeAgent might write:
france_population = 68_170_000
germany_area_km2 = 357_114
result = france_population / germany_area_km2
print(f"Population density ratio: {result:.2f} people per km²")
And execute it directly. No tool calls. Just code.
This approach handles multi-step reasoning differently. If the model needs to process a list of 50 items, it can write a loop. If it needs to transform data before passing it to the next step, it does that in code. The LLM isn't constrained to calling predefined functions — it can compose arbitrary Python logic.
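To make that concrete, here is the kind of code a CodeAgent might generate for a batch task. This is illustrative, not a real transcript, and search_products stands in for whatever tool you have exposed:

# Hypothetical generated code; search_products is a stand-in tool name
items = search_products("mechanical keyboards")
under_100 = [item for item in items if item["price"] < 100]
cheapest = sorted(under_100, key=lambda item: item["price"])[:3]
print(f"Found {len(under_100)} options under $100; cheapest three: {cheapest}")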
Getting started
Installation is minimal:
pip install smolagents
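If you plan on LiteLLM-routed APIs or in-process transformers models, optional extras pull in the needed dependencies. The extras names below match recent releases; check the README for your version:

pip install "smolagents[litellm]"        # for LiteLLMModel
pip install "smolagents[transformers]"   # for TransformersModel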
A basic CodeAgent with no external tools:
from smolagents import CodeAgent, HfApiModel
model = HfApiModel("meta-llama/Llama-3.3-70B-Instruct")
agent = CodeAgent(tools=[], model=model)
result = agent.run("What is 17 factorial? Show the calculation step by step.")
print(result)
Adding tools follows a simple decorator pattern:
from smolagents import tool, CodeAgent, HfApiModel
@tool
def get_weather(city: str) -> str:
    """Get current weather for a city.

    Args:
        city: The city name to get weather for.
    """
    # Your actual API call here
    return f"Sunny, 22°C in {city}"
model = HfApiModel("meta-llama/Llama-3.3-70B-Instruct")
agent = CodeAgent(tools=[get_weather], model=model)
result = agent.run("What's the weather in Paris and Tokyo? Which is warmer?")
Both the type hints and the docstring matter here: smolagents parses them to build the tool description the model sees, and the @tool decorator will complain if the Args section doesn't describe every parameter. Write clear docstrings.
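For contrast, the same tool plugs into a ToolCallingAgent unchanged; only the reasoning style differs, with the model emitting structured tool calls instead of code:

from smolagents import ToolCallingAgent

# Same tool, classic one-call-at-a-time loop instead of generated code
agent = ToolCallingAgent(tools=[get_weather], model=model)
result = agent.run("What's the weather in Paris?")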
Using it with different models
smolagents works with HuggingFace Hub models out of the box. Through LiteLLMModel it also works with hosted providers such as Anthropic, OpenAI, and Google, or any OpenAI-compatible endpoint:
from smolagents import CodeAgent, LiteLLMModel
# Works with Claude, GPT-4o, Gemini, or any OpenAI-compatible endpoint
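# Assumes the matching provider key (here ANTHROPIC_API_KEY) is set in your environment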
model = LiteLLMModel("anthropic/claude-sonnet-4-5")
agent = CodeAgent(tools=[], model=model)
For local models, the integration with Ollama works well:
from smolagents import CodeAgent, LiteLLMModel
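# Assumes Ollama is serving locally and the model was pulled first: ollama pull qwen2.5-coder:7b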
model = LiteLLMModel(
    model_id="ollama/qwen2.5-coder:7b",
    api_base="http://localhost:11434",
)
agent = CodeAgent(tools=[], model=model)
This is a real advantage over some frameworks — you're not locked into specific API providers, and the local model story is first-class.
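If you would rather skip the Ollama server entirely, smolagents also ships a TransformersModel that loads the weights in-process via transformers. A minimal sketch, assuming you have the hardware for whichever model you pick:

from smolagents import CodeAgent, TransformersModel

# Downloads and runs the model locally; no server, no API key
# Needs the transformers extra and enough memory for the chosen model
model = TransformersModel(model_id="Qwen/Qwen2.5-Coder-7B-Instruct")
agent = CodeAgent(tools=[], model=model)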
Built-in tools
smolagents ships with a few ready-to-use tools you can import directly:
from smolagents import CodeAgent, HfApiModel, DuckDuckGoSearchTool, PythonInterpreterTool
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool(), PythonInterpreterTool()],
    model=HfApiModel("meta-llama/Llama-3.3-70B-Instruct"),
    add_base_tools=True,  # Also adds the library's small default toolbox
)
result = agent.run(
    "Search for the top 3 Python agent frameworks in 2026 and compare their GitHub stars"
)
DuckDuckGoSearchTool does what it says. PythonInterpreterTool gives the CodeAgent an explicit interpreter tool (though CodeAgent already executes code natively — this is more useful in ToolCallingAgent).
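Search results alone return titles and snippets. If the agent should actually read the pages it finds, add the bundled VisitWebpageTool next to the search tool:

from smolagents import VisitWebpageTool

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool(), VisitWebpageTool()],
    model=model,  # any of the model classes shown earlier
)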
How multi-step reasoning works
The agent loop in smolagents is transparent. You can see it working (a simplified code sketch follows this list):
- The model receives the task and generates Python code
- The code runs in a sandboxed interpreter
- The output (stdout, return values) goes back into the model's context
- The model decides if the task is done or generates more code
- Repeat until the task is complete or max steps is reached
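In pseudocode, the loop is roughly this. A simplified sketch, not smolagents' actual source; in the real library the model ends the run by calling a special final_answer() function from its generated code:

# Simplified sketch of the agent loop, not the library's real implementation
memory = [task]
for _ in range(max_steps):
    code = model.generate(memory)         # model writes a Python snippet
    observation = sandbox.execute(code)   # run it, capture stdout and errors
    memory.append(observation)            # results feed the next generation
    if called_final_answer(code):         # model signals completion
        break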
You can set max_steps to control how long it can run:
agent = CodeAgent(
    tools=[search_tool],
    model=model,
    max_steps=10,  # Default is 6
    verbosity_level=2,  # Logs each step; higher values print more detail
)
With verbosity_level raised you see every code block the model generates and every output it receives, and the agent keeps a log of its steps that you can inspect after the run. This is invaluable for debugging. Most frameworks make this harder to inspect.
Understanding the sandboxed execution
The code execution is sandboxed — the model can't import arbitrary modules by default. You control what's available:
agent = CodeAgent(
    tools=[],
    model=model,
    additional_authorized_imports=["pandas", "numpy", "json"],
)
This is a security consideration worth thinking about. If generated code tries an import outside the allowlist, execution fails and the error message goes back to the model as an observation, which usually nudges it toward another approach. If you're running smolagents in a web application, be deliberate about what the model can import. Note also that the built-in local executor restricts imports and builtins but is not a hardened sandbox; for untrusted workloads the docs point to remote executors such as E2B. The default allowlist is conservative.
smolagents vs LangChain/LangGraph
The comparison that comes up most often: why use smolagents when LangChain exists?
LangChain's strength is its ecosystem — hundreds of integrations, extensive documentation, and tooling built around it. Its weakness is complexity. A simple agent in LangChain involves chains, prompts, output parsers, callbacks, and memory objects. There's a lot of framework between you and the thing that's actually running.
smolagents removes most of that. There's no chain concept, no complex memory management, no output parsers. The tool is a Python function with a docstring. The agent is CodeAgent(tools=..., model=...). If it breaks, you can read the source code to understand why.
LangGraph adds a state graph on top of LangChain, which is valuable for production workflows with complex branching. smolagents doesn't have an equivalent — it's not trying to compete on that dimension. For tasks that need stateful graphs with checkpointing and conditional routing, use LangGraph. For everything else, smolagents is often faster to work with.
The function calling lesson covers how the standard tool-calling pattern works underneath both frameworks — understanding that makes it clearer when the CodeAgent approach is an advantage.
When to use smolagents
Good fit:
- Prototypes and research experiments where you want results today
- Tasks involving data processing, calculation, or file manipulation where Python code is natural
- Working with local or open-source models
- Teams that value readable, debuggable code over extensive abstraction
- One-shot or few-shot tasks that don't require persistent state across sessions
Not a good fit:
- Production systems that need checkpointing, retry logic, and fault tolerance
- Complex multi-agent orchestration with dependency graphs between agents
- Long-running workflows that span multiple sessions
- Teams that need the extensive integrations LangChain provides
My workflow: I prototype with smolagents, understand the shape of the problem, then decide if it needs to move to LangGraph for production. Often it doesn't. The prototype turns out to be good enough, and the simplicity of smolagents becomes an asset rather than a limitation.
The HuggingFace team keeps iterating on it quickly — it's one of the faster-moving libraries in the space right now. Worth keeping an eye on.