Build a RAG System with Node.js and OpenAI - No Database Required

Authors
    Hamza Rahman
Published on
-
6 mins read
-
rag

What is RAG and Why Use It?

RAG (Retrieval-Augmented Generation) lets a model answer questions from your actual documentation instead of relying only on its training data. You hand it the relevant docs, it reads them, and it answers based on what's in front of it.

This post builds the simplest possible version: no vector database, no embeddings. With a modern small model like GPT-5.4 mini you can pass a whole doc straight into the prompt and let the model answer. That covers a surprising number of real cases, and you can add a vector store later, once your docs outgrow the context window.

Setting Up the Project

Create the project and install what we need:

mkdir my-rag-project
cd my-rag-project
npm init -y
npm install openai dotenv express

Put your OpenAI key in a .env file:

OPENAI_API_KEY=your-api-key-here

Set up package.json for ES modules so the imports below work:

package.json
{
  "name": "my-rag-project",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "start": "node src/server.js",
    "dev": "node --watch src/server.js"
  },
  "dependencies": {
    "dotenv": "^16.4.5",
    "express": "^5.0.0",
    "openai": "^5.0.0"
  }
}

Now create the folders and a few sample docs to query. Use printf so the files actually have content on macOS, Linux, and Git Bash:

mkdir src docs
printf '# Troubleshooting\n\nIf the API returns a 500, check your API key and retry with backoff.\n' > docs/troubleshooting.md
printf '# Getting Started\n\nInstall the SDK and set OPENAI_API_KEY before running the server.\n' > docs/getting-started.md
printf '# API Reference\n\nPOST /ask with a JSON body: { "question": "..." }.\n' > docs/api-reference.md

A First Pass: Answer From One File

Start with the simplest version: read one file, ask the model, return the answer.

import { OpenAI } from 'openai'
import fs from 'fs/promises'
import dotenv from 'dotenv'

dotenv.config()

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

async function answerQuestion(question) {
  const docContent = await fs.readFile('./docs/troubleshooting.md', 'utf-8')
  // Large files can exceed the context window. Check your model's limit and split if needed.
  const response = await openai.chat.completions.create({
    model: 'gpt-5.4-mini',
    messages: [
      {
        role: 'system',
        content: 'You answer questions using only the provided documentation.',
      },
      {
        role: 'user',
        content: `Documentation:\n${docContent}\n\nQuestion: ${question}\n\nAnswer using the documentation. If the answer is not there, say so.`,
      },
    ],
    max_completion_tokens: 4096,
  })
  return response.choices[0].message.content
}

const answer = await answerQuestion('How do I handle a 500 error?')
console.log(answer)

Two things changed from the older gpt-4o-mini examples you might have seen. The model is now gpt-5.4-mini (swap in gpt-5.4-nano if you want it cheaper and faster for simple lookups), and the output limit is max_completion_tokens, since max_tokens is deprecated. GPT-5.4 mini has a large context window, so small and medium docs fit in one prompt. For exact limits, check the models page.
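To decide whether a doc is small enough for one prompt, a rough size check can help. The sketch below is only a heuristic: it assumes about 4 characters per token for English prose, which is a rule of thumb and not the model's real tokenizer. estimateTokens and fitsInContext are hypothetical helpers, and the context limit you pass in should come from the models page.

```javascript
// Rough heuristic (an assumption, not the model's real tokenizer):
// English prose averages about 4 characters per token.
const CHARS_PER_TOKEN = 4

// Estimate how many tokens a piece of text will use.
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Check whether a doc plus the reserved output budget fits the model's window.
// contextLimit is a placeholder: take the real number from the models page.
// reservedForOutput mirrors the max_completion_tokens value used in the request.
function fitsInContext(text, contextLimit, reservedForOutput = 4096) {
  return estimateTokens(text) + reservedForOutput <= contextLimit
}
```

If fitsInContext returns false for a file, that file is a candidate for splitting before it goes into the prompt.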

Picking the Right File Automatically

One file is fine for a demo. Real docs are split across many files. So let the model pick the most relevant file first, then answer from it.

src/ragService.js
import { OpenAI } from 'openai'
import fs from 'fs/promises'
import dotenv from 'dotenv'

dotenv.config()

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

async function selectRelevantFile(question) {
  const files = await fs.readdir('./docs')
  const docFiles = files.filter(f => f.endsWith('.md') || f.endsWith('.txt'))
  const fileList = docFiles.map(f => ({ filename: f }))

  const response = await openai.chat.completions.create({
    model: 'gpt-5.4-mini',
    messages: [
      {
        role: 'system',
        content:
          "You select the most relevant documentation file for a question. Respond in JSON with 'filename' and 'reason' fields.",
      },
      {
        role: 'user',
        content: `Available files: ${JSON.stringify(fileList)}\n\nQuestion: "${question}"\n\nPick the most relevant file and explain why, in JSON.`,
      },
    ],
    response_format: { type: 'json_object' },
  })

  const selection = JSON.parse(response.choices[0].message.content)
  // Never trust a model-supplied path: only accept a filename that is actually in ./docs.
  if (!docFiles.includes(selection.filename)) {
    throw new Error(`Model selected an unknown file: ${selection.filename}`)
  }
  return selection
}

export async function smartRAG(question) {
  // 1. Pick the file
  const fileSelection = await selectRelevantFile(question)
  console.log(`Selected ${fileSelection.filename}: ${fileSelection.reason}`)

  // 2. Read it
  const docContent = await fs.readFile(`./docs/${fileSelection.filename}`, 'utf-8')

  // 3. Answer from it
  const response = await openai.chat.completions.create({
    model: 'gpt-5.4-mini',
    messages: [
      {
        role: 'system',
        content: 'You answer questions using only the provided documentation.',
      },
      {
        role: 'user',
        content: `Documentation from ${fileSelection.filename}:\n${docContent}\n\nQuestion: ${question}\n\nAnswer using this documentation. If the answer is not there, say so.`,
      },
    ],
    max_completion_tokens: 4096,
  })

  return { fileSelection, answer: response.choices[0].message.content }
}
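Filenames alone give the model little to go on when the names are vague. One possible refinement, sketched here under the assumption that each doc starts with a Markdown heading, is to send a title next to each filename. extractTitle and describeFiles are hypothetical helpers, not part of the service above, and the { filename, content } input shape is made up for this sketch.

```javascript
// Return the text of the first Markdown heading, or the fallback if there is none.
function extractTitle(markdown, fallback) {
  const match = markdown.match(/^#{1,6}[ \t]+(.+?)[ \t]*$/m)
  return match ? match[1] : fallback
}

// Build the list sent to the model: filename plus a short title hint.
// `docs` is an array of { filename, content } objects (hypothetical shape).
function describeFiles(docs) {
  return docs.map(({ filename, content }) => ({
    filename,
    title: extractTitle(content, filename),
  }))
}
```

In selectRelevantFile you would read each file once, pass the results to describeFiles, and stringify that list into the prompt in place of the filename-only list. The trade-off is one extra read per file on every question.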

Wrapping It in an API

Put smartRAG behind a small Express endpoint so anything can call it.

src/server.js
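import express from 'express'
import { smartRAG } from './ragService.js'

const app = express()
app.use(express.json())

app.post('/ask', async (req, res) => {
  // req.body may be undefined when no JSON body was sent, so guard before reading it
  const question = req.body?.question
  if (typeof question !== 'string' || question.trim() === '') {
    return res.status(400).json({ error: 'Request body must include a non-empty "question" string' })
  }
  try {
    const result = await smartRAG(question)
    res.json(result)
  } catch (error) {
    console.error('Error:', error)
    res.status(500).json({ error: "Couldn't process your question" })
  }
})

app.listen(3000, () => {
  console.log('RAG API running on port 3000')
})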

Your project should look like this:

my-rag-project/
├── .env
├── package.json
├── src/
│   ├── ragService.js
│   └── server.js
└── docs/
    ├── api-reference.md
    ├── getting-started.md
    └── troubleshooting.md

Start it:

npm run dev

Testing It

Send a POST request to http://localhost:3000/ask with a JSON body. With curl:

curl -X POST http://localhost:3000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I handle a 500 error?"}'

You get back the file the model chose and its answer:

{
  "fileSelection": {
    "filename": "troubleshooting.md",
    "reason": "The question is about error handling, which the troubleshooting doc covers"
  },
  "answer": "Check your API key and retry with backoff..."
}

Postman works the same way: a POST to the same URL, Content-Type: application/json, and the JSON body above.
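The same request can also be made from code. Here is a minimal sketch of a client for the /ask endpoint, assuming the server runs on localhost:3000 as above. askQuestion, baseUrl, and fetchImpl are hypothetical names introduced for this example; fetchImpl defaults to the global fetch that ships with Node 18 and later.

```javascript
// Hypothetical client helper for the /ask endpoint defined in src/server.js.
// fetchImpl is a parameter so the function can be exercised without a running server.
async function askQuestion(question, { baseUrl = 'http://localhost:3000', fetchImpl = fetch } = {}) {
  const res = await fetchImpl(`${baseUrl}/ask`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question }),
  })
  // Surface non-2xx responses as errors instead of returning an error payload.
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`)
  }
  return res.json()
}

// Usage, with the server running:
// const { fileSelection, answer } = await askQuestion('How do I handle a 500 error?')
```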

Where This Works (and When to Add a Vector DB)

This whole-file approach goes further than you'd expect:

  • Docs that change often. Edit a markdown file and the next question uses the new content. Nothing to rebuild or reindex.
  • Internal tools and support. Your team gets answers from your real docs, not the model's training data.
  • Prototypes. You can prove the idea in an afternoon without standing up any infrastructure.

It has a ceiling, though. Once your docs are too big to fit in the context window, or you need the model to pull a few relevant passages out of thousands of pages, that's when embeddings and a vector database start to earn their keep. Until then, plain files are usually enough.
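Before reaching for embeddings, one intermediate step is to split an oversized file into sections and run the same pick-then-answer flow on sections instead of whole files. This is a minimal sketch, assuming Markdown docs that use # headings; splitByHeading is a hypothetical helper and it does not skip headings inside code samples.

```javascript
// Split a Markdown document into sections, one per heading.
// Text before the first heading becomes a section with an empty heading.
function splitByHeading(markdown) {
  const headingPattern = /^#{1,6}[ \t]+/
  const sections = []
  let current = { heading: '', lines: [] }

  // A section is worth keeping if it has a heading or any non-blank line.
  const hasContent = s => s.heading !== '' || s.lines.some(l => l.trim() !== '')

  for (const line of markdown.split('\n')) {
    if (headingPattern.test(line)) {
      if (hasContent(current)) sections.push(current)
      current = { heading: line.replace(headingPattern, '').trim(), lines: [] }
    } else {
      current.lines.push(line)
    }
  }
  if (hasContent(current)) sections.push(current)

  return sections.map(s => ({ heading: s.heading, text: s.lines.join('\n').trim() }))
}
```

Each returned section carries its heading, so the headings can be listed for the model the same way the filenames are, and only the chosen section's text goes into the answer prompt.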

Where to go next

When you outgrow plain files, the next step depends on your stack: