What is RAG and Why Use It?
Ever wished your documentation could answer questions like a human expert? That's exactly what RAG (Retrieval Augmented Generation) does. Instead of relying on outdated training data, it reads your actual documentation to give accurate, up-to-date answers.
Think of it like having a smart assistant who reads your docs before answering questions. The best part? With OpenAI's latest models, you can feed in entire documentation sets at once - no complex database needed!
Table of Contents
- Setting Up Your Project
- Building the Search System
- Creating the API
- Testing Your System
- Best Use Cases
Setting Up Your Project
Let's build something cool! We'll create a system that automatically picks the right documentation file and answers questions about it.
- Create a new project:
```bash
mkdir my-rag-project
cd my-rag-project
npm init -y
```
- Install the needed packages:
```bash
npm install openai dotenv
```
- Create a `.env` file:

```
OPENAI_API_KEY=your-api-key-here
```
Building the Search System
Our search system does two clever things:
- Uses AI to pick the most relevant documentation file
- Reads that file and answers questions about it
Here's a simple example that answers questions about your documentation. Since GPT-4o mini has a 128K-token context window, we can include far more context than earlier models allowed, but we should still be mindful of very large files:
```javascript
import { OpenAI } from 'openai';
import fs from 'fs/promises';
import dotenv from 'dotenv';

// Load environment variables
dotenv.config();

// Initialize OpenAI
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

async function answerQuestion(question) {
  try {
    // 1. Read the documentation file
    const docContent = await fs.readFile('./docs/documentation.md', 'utf-8');

    // Warning: For very large files, you might want to split them
    // GPT-4o mini supports 128K tokens, but that's roughly 400K characters
    if (docContent.length > 400_000) {
      console.warn('Document might be too large for context window');
    }

    // 2. Ask AI with context
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content: "You are a helpful assistant that answers questions based on the provided documentation."
        },
        {
          role: "user",
          content: `
            Documentation:
            ${docContent}

            Question: ${question}

            Please answer the question using information from the documentation.
            If the answer isn't in the documentation, say so politely.
          `
        }
      ],
      // You can adjust max tokens based on your needs
      max_tokens: 16000 // GPT-4o mini supports up to 16K output tokens
    });

    return response.choices[0].message.content;
  } catch (error) {
    console.error('Error:', error);
    return "Sorry, I couldn't process your question right now.";
  }
}

// Example usage
const answer = await answerQuestion("How do I handle errors?");
console.log(answer);
```
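If a document does blow past the context window, a common fallback is to split it into overlapping chunks and send only the most relevant ones. Here's a minimal sketch of the splitting step (the `chunkText` name and the chunk-size/overlap values are illustrative assumptions, not tuned numbers):

```javascript
// Minimal sketch: split a long document into overlapping chunks so each
// piece fits comfortably inside the model's context window. The overlap
// keeps sentences that straddle a boundary visible in both chunks.
function chunkText(text, chunkSize = 4000, overlap = 200) {
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// A 10,000-character document splits into three overlapping chunks
const parts = chunkText('x'.repeat(10_000));
console.log(parts.length); // 3
```

You could then score each chunk against the question (even with a cheap keyword match) and pass only the top few into the prompt.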
Creating the API
Now let's wrap our search system in a simple API that anyone can use. If you haven't already set up the project in the section above, create the structure first:
```bash
mkdir my-rag-project
cd my-rag-project
npm init -y
```
Install the required dependencies:
```bash
npm install openai dotenv express
```
Create a `.env` file in your project root:

```
OPENAI_API_KEY=your-api-key-here
```
Create these folders and files:
```bash
mkdir src docs
touch src/ragService.js src/server.js
```
Add some sample documentation files in the `docs` folder:

```bash
# Windows
echo "# API Reference" > docs/api-reference.md
echo "# Getting Started" > docs/getting-started.md
echo "# Troubleshooting" > docs/troubleshooting.md
```

```bash
# Mac/Linux
touch docs/api-reference.md docs/getting-started.md docs/troubleshooting.md
```
Now create the RAG service in `src/ragService.js`:
```javascript
import { OpenAI } from 'openai';
import fs from 'fs/promises';
import dotenv from 'dotenv';

dotenv.config();

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

async function selectRelevantFile(question) {
  const files = await fs.readdir('./docs');
  const fileList = files
    .filter(f => f.endsWith('.md') || f.endsWith('.txt'))
    .map(f => ({ filename: f }));

  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: "You are a helpful assistant that selects the most relevant documentation file based on a user's question. Respond in JSON format with 'filename' and 'reason' fields."
      },
      {
        role: "user",
        content: `
          Available files: ${JSON.stringify(fileList)}

          User question: "${question}"

          Select the most relevant file and explain why. Respond in JSON format.
        `
      }
    ],
    response_format: { type: "json_object" }
  });

  return JSON.parse(response.choices[0].message.content);
}

export async function smartRAG(question) {
  try {
    // 1. Select the relevant file
    const fileSelection = await selectRelevantFile(question);
    console.log(`Selected ${fileSelection.filename} because: ${fileSelection.reason}`);

    // 2. Read the selected file
    const docContent = await fs.readFile(`./docs/${fileSelection.filename}`, 'utf-8');

    // 3. Get AI response with context
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content: "You are a helpful assistant that answers questions based on the provided documentation."
        },
        {
          role: "user",
          content: `
            Documentation from ${fileSelection.filename}:
            ${docContent}

            Question: ${question}

            Please answer based on this documentation. If the answer isn't in the documentation, say so politely.
          `
        }
      ],
    });

    return {
      fileSelection,
      answer: response.choices[0].message.content
    };
  } catch (error) {
    console.error('Error:', error);
    throw error;
  }
}
```
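One caveat: even with `response_format: { type: "json_object" }`, nothing guarantees the model names a file that actually exists. Before passing the selection to `fs.readFile`, it's worth checking it against the real file list. A small sketch of that guard (the `validateSelection` helper and the fallback filename are assumptions, not part of the code above):

```javascript
// Sketch: only accept a model-selected filename that exists in the docs
// folder; anything else (typos, hallucinated names, crafted paths like
// "../secrets.env") falls back to a known-safe default file.
function validateSelection(selection, files, fallback = 'getting-started.md') {
  return files.includes(selection.filename) ? selection.filename : fallback;
}

const files = ['api-reference.md', 'getting-started.md', 'troubleshooting.md'];
console.log(validateSelection({ filename: 'troubleshooting.md' }, files)); // troubleshooting.md
console.log(validateSelection({ filename: '../secrets.env' }, files));     // getting-started.md
```

Comparing against the listing you already built in `selectRelevantFile` also doubles as path-traversal protection, since only exact filenames from `./docs` can ever reach `readFile`.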
Create your Express server in `src/server.js`:

```javascript
import express from 'express';
import { smartRAG } from './ragService.js';

const app = express();
app.use(express.json());

// Expected docs folder structure:
// docs/
//   - api-reference.md
//   - getting-started.md
//   - troubleshooting.md

app.post('/ask', async (req, res) => {
  const { question } = req.body;
  try {
    const result = await smartRAG(question);
    res.json(result);
  } catch (error) {
    res.status(500).json({ error: "Couldn't process your question" });
  }
});

app.listen(3000, () => {
  console.log('RAG API running on port 3000');
});
```
Update your `package.json`:

```json
{
  "name": "my-rag-project",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "start": "node src/server.js",
    "dev": "node --watch src/server.js"
  },
  "dependencies": {
    "dotenv": "^16.4.5",
    "express": "^4.21.1",
    "openai": "^4.72.0"
  }
}
```
Your project structure should look like this:

```
my-rag-project/
├── .env
├── package.json
├── src/
│   ├── ragService.js
│   └── server.js
└── docs/
    ├── api-reference.md
    ├── getting-started.md
    └── troubleshooting.md
```
Start the server:
```bash
npm run dev
```
Testing Your System
The moment of truth! Let's see our system in action:
Testing with Postman
- Open Postman
- Create a new POST request to `http://localhost:3000/ask`
- Set the header: `Content-Type: application/json`
- Set the body (raw/JSON):

```json
{
  "question": "How do I handle errors?"
}
```
You'll get a response like:
```json
{
  "fileSelection": {
    "filename": "troubleshooting.md",
    "reason": "The question is about error handling, which would be covered in the troubleshooting documentation"
  },
  "answer": "Based on the documentation..."
}
```
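If you'd rather test without Postman, a short Node script can hit the same endpoint. This is just a convenience sketch: it assumes the server above is running locally on port 3000, and it uses the `fetch` built into Node 18+:

```javascript
// Hypothetical smoke test for the /ask endpoint. Start the server first
// (npm run dev), then run this script; if the server isn't up, the catch
// block reports it instead of crashing.
const payload = { question: 'How do I handle errors?' };

try {
  const res = await fetch('http://localhost:3000/ask', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload)
  });
  const data = await res.json();
  console.log('Selected file:', data.fileSelection.filename);
  console.log('Answer:', data.answer);
} catch (error) {
  console.error('Request failed - is the server running on port 3000?', error.message);
}
```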
Best Use Cases
You might be surprised how powerful this simple approach can be. Here's where it really shines:
Documentation That Changes Often
Perfect for API docs, user guides, and technical specs that need frequent updates. Just edit your markdown files and the system instantly uses the new content - no rebuilding or reindexing needed.
Internal Tools and Support
Ideal for customer support teams and internal documentation. Your team gets instant, accurate answers based on your actual documentation, not outdated training data.
Quick Prototypes and MVPs
Need to prove AI-powered search can work for your docs? This approach gets you there in hours, not weeks. No complex infrastructure required.
The best part? With modern AI models supporting massive context windows (128K tokens, roughly 400K characters), you can feed in entire documentation files at once. This means you can handle surprisingly large documentation bases without any extra complexity.
Remember: Sometimes the simplest solution is the best one. Don't jump to complex vector databases until you've outgrown this approach!