# LlamaIndex Agentic RAG

Connecting data-aware agents to your private knowledge base using advanced indexing.
Learn how to build intelligent agents that can reason over your private data using LlamaIndex's powerful RAG capabilities.
## What is Agentic RAG?
Agentic RAG combines the retrieval power of RAG with the reasoning capabilities of AI agents. Instead of a single query-response round trip, an agent can (see the sketch after this list):
- Plan multi-step retrieval strategies
- Combine information from multiple sources
- Reason about retrieved context
- Take actions based on findings
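To make that loop concrete, here is a minimal, self-contained sketch of the plan-retrieve-reason cycle. The `plan_steps`, `retrieve`, and `synthesize` functions are stubs standing in for a real planner, retriever, and LLM:

```python
def plan_steps(question: str) -> list[str]:
    """Stub planner: split a compound question into sub-queries."""
    return [part.strip() for part in question.split(" and ")]

def retrieve(query: str) -> list[str]:
    """Stub retriever: return canned context for a sub-query."""
    return [f"retrieved context for: {query}"]

def synthesize(question: str, context: list[str]) -> str:
    """Stub LLM step: combine retrieved context into an answer."""
    return f"Answer to {question!r} based on {len(context)} context chunks"

def agentic_rag(question: str) -> str:
    context: list[str] = []
    for step in plan_steps(question):     # 1. plan a multi-step retrieval strategy
        context += retrieve(step)         # 2. combine information from sources
    return synthesize(question, context)  # 3. reason over the retrieved context

print(agentic_rag("the vacation policy and the remote work policy"))
```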
## Installation
```bash
pip install llama-index llama-index-agent-openai
pip install llama-index-vector-stores-chroma
```
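The OpenAI LLM and embedding integrations read your key from the `OPENAI_API_KEY` environment variable, so export it (or set it in code) before running the examples:

```python
import os

# The OpenAI integrations pick the key up from the environment
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your key
```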
## Building Your Knowledge Base

### Document Loading
```python
from llama_index.core import SimpleDirectoryReader

# Load documents from a directory, recursing into subfolders
documents = SimpleDirectoryReader(
    input_dir="./data",
    recursive=True,
    filename_as_id=True,
).load_data()

print(f"Loaded {len(documents)} documents")
```
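Each loaded `Document` carries file-level metadata (path, file name, and so on) that can later be used for metadata filtering:

```python
# Inspect the metadata SimpleDirectoryReader attached to the first document
print(documents[0].metadata)
```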
### Creating the Index

```python
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure the default embedding model and LLM
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-4-turbo")

# Create the vector index
index = VectorStoreIndex.from_documents(documents)

# Persist it for later use
index.storage_context.persist(persist_dir="./storage")
```
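On subsequent runs, the persisted index can be reloaded from disk instead of re-embedding everything:

```python
from llama_index.core import StorageContext, load_index_from_storage

# Reload the index persisted above without re-running embeddings
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```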
## Query Engine as a Tool

Transform your index into an agent tool:
```python
from llama_index.core.tools import QueryEngineTool

# Create a query engine over the index
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact",
)

# Wrap it as a tool the agent can call
knowledge_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="knowledge_base",
    description="Search the internal knowledge base for information about company policies and procedures.",
)
```
## Creating the Agent

### Basic Agent Setup
```python
from llama_index.agent.openai import OpenAIAgent

agent = OpenAIAgent.from_tools(
    tools=[knowledge_tool],
    verbose=True,
    system_prompt="""You are a helpful assistant with access to a company knowledge base.
Always search the knowledge base before answering questions about policies.""",
)

# Query the agent
response = agent.chat("What is our vacation policy?")
print(response)
```
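`OpenAIAgent` keeps the conversation history, so follow-up questions can refer back to earlier turns; the follow-up below is a hypothetical example:

```python
# A follow-up in the same session; the agent remembers the previous answer
followup = agent.chat("Does that policy change after five years of service?")
print(followup)
```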
### Multi-Tool Agent

Combine multiple data sources:
```python
from datetime import datetime

from llama_index.core.tools import FunctionTool

# Create additional tools
def get_current_date() -> str:
    """Returns the current date."""
    return datetime.now().strftime("%Y-%m-%d")

def add_numbers(a: float, b: float) -> float:
    """Adds two numbers."""
    return a + b

date_tool = FunctionTool.from_defaults(fn=get_current_date)
calculator_tool = FunctionTool.from_defaults(fn=add_numbers)

# Create an agent with multiple tools
agent = OpenAIAgent.from_tools(
    tools=[knowledge_tool, date_tool, calculator_tool],
    verbose=True,
)
```
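With several tools registered, one question can exercise the date tool and the knowledge base together; the prompt below is a hypothetical example:

```python
# The agent can chain tools: look up today's date, then search the policies
response = agent.chat(
    "Given today's date, am I still within the open enrollment window?"
)
print(response)
```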
## Advanced RAG Patterns

### Sub-Question Query Engine
Break complex queries into simpler sub-queries:
```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# Create tools for different indices
# (hr_index and tech_index are separate VectorStoreIndex instances,
# built the same way as the index above)
tools = [
    QueryEngineTool.from_defaults(
        query_engine=hr_index.as_query_engine(),
        name="hr_policies",
        description="HR policies and employee handbook",
    ),
    QueryEngineTool.from_defaults(
        query_engine=tech_index.as_query_engine(),
        name="technical_docs",
        description="Technical documentation and APIs",
    ),
]

# Create the sub-question engine
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
)
```
### Router Query Engine

Automatically route queries to the right index:
```python
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# Let an LLM pick the best query engine for each incoming query
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=tools,
)
```
## Production Considerations

### Vector Store Integration
Use production-grade vector stores:
```python
import chromadb

from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Set up a persistent Chroma client and collection
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection("knowledge_base")

# Build the index on top of Chroma
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# On later runs, reconnect to the existing collection without re-embedding:
# index = VectorStoreIndex.from_vector_store(vector_store)
```

### Caching and Performance
```python
from llama_index.core import Settings
from llama_index.core.ingestion import IngestionCache, IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

# Batch embedding requests for better indexing throughput in production
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    embed_batch_size=100,
)

# Cache ingestion so unchanged documents aren't re-chunked or re-embedded
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(), Settings.embed_model],
    cache=IngestionCache(),
)
nodes = pipeline.run(documents=documents)
```

## Best Practices
- **Chunking Strategy**: Experiment with chunk sizes (512-1024 tokens is a common starting range)
- **Metadata Filtering**: Add metadata to documents for better retrieval precision (see the sketch after this list)
- **Hybrid Search**: Combine vector and keyword search
- **Evaluation**: Use RAGAS or LlamaIndex's evaluation modules
- **Observability**: Integrate with tracing tools like Phoenix
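As a sketch of the metadata-filtering practice above, retrieval can be restricted to documents tagged with a given attribute. This assumes a hypothetical `department` key was added to the documents' metadata during ingestion:

```python
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

# Only retrieve chunks whose documents carry department="hr" (hypothetical key)
filters = MetadataFilters(filters=[MetadataFilter(key="department", value="hr")])
query_engine = index.as_query_engine(filters=filters, similarity_top_k=5)
```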