How to connect n8n to Supabase for vector search and RAG workflows
RAG (Retrieval-Augmented Generation) is how you build AI systems that answer questions using your actual data instead of hallucinating. The technical implementation is straightforward: store document embeddings in a vector database, search for relevant context when a question comes in, and inject that context into your LLM prompt. Here's how to build this with n8n and Supabase - the most cost-effective stack for production RAG systems.
Why this stack
n8n: Open source workflow automation. Visual interface, self-hostable, handles complex orchestration, free if you run it yourself.

Supabase: PostgreSQL with the pgvector extension. A familiar SQL database plus vector search, way cheaper than purpose-built vector databases.

OpenAI Embeddings API: The industry standard for converting text to vectors, at $0.0001 per 1K tokens.

GPT-4: The LLM that generates responses. You could swap in Claude or others.

Combined cost for a moderate RAG system: $50-200/month depending on usage. Compare that to Pinecone or Weaviate at $500-2000/month for similar capability.
Prerequisites
You'll need an n8n instance (self-hosted or n8n Cloud), a Supabase project, an OpenAI API key, and a basic understanding of embeddings and vector search.

If you're new to RAG, the core concept: convert documents to numerical vectors (embeddings), store them, then find documents with vectors similar to your query vector. Those documents become context for your LLM.
Part 1: Supabase setup
Enable pgvector extension
In your Supabase project dashboard, go to Database → Extensions, search for "vector", and enable it. This adds vector data types and similarity search functions to PostgreSQL.
Create the documents table
You need a table to store your documents with their embeddings. The table should have fields for the document content, a vector field for the embedding (dimension 1536 for OpenAI's text-embedding-3-small), optional metadata as JSON, and standard timestamp fields.

The vector field's dimension must match your embedding model's output. If you use a different embedding model, adjust accordingly.
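A minimal schema along these lines works, assuming pgvector is enabled (the table and column names here are illustrative - rename to suit your project):

```sql
-- Documents table with a 1536-dimension embedding column
-- (matches OpenAI's text-embedding-3-small output).
create table documents (
  id bigserial primary key,
  content text not null,
  metadata jsonb default '{}'::jsonb,
  embedding vector(1536),
  created_at timestamptz default now()
);
```

Run this in the Supabase SQL Editor. If you switch embedding models later, the column dimension must change with it, which means re-embedding everything.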
Create the vector similarity search function
Supabase needs a database function that takes a query embedding and returns similar documents. This function uses cosine distance to find the most similar vectors, filters by a similarity threshold, and returns the top matches ordered by relevance. The function returns the document ID, content, metadata, and similarity score for each match.
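One common shape for this function, assuming a documents table with a vector(1536) embedding column as described above; the function name match_documents is a convention, not a requirement:

```sql
-- Returns the documents most similar to a query embedding.
-- pgvector's <=> operator is cosine distance; 1 - distance = similarity.
create or replace function match_documents (
  query_embedding vector(1536),
  match_threshold float,
  match_count int
)
returns table (id bigint, content text, metadata jsonb, similarity float)
language sql stable
as $$
  select
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where 1 - (documents.embedding <=> query_embedding) > match_threshold
  order by documents.embedding <=> query_embedding
  limit match_count;
$$;
```

Exposing the search as a database function lets n8n call it through Supabase's RPC endpoint with a single HTTP request.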
Add an index for performance
Without an index, vector search is slow. With one, searches stay fast even with millions of vectors. Use an IVFFlat index optimized for cosine distance operations.

The lists parameter should be roughly rows divided by 1000: for 100K documents, use lists = 100; for 1M documents, use lists = 1000.
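Assuming the documents table described earlier, the index looks like this (lists = 100 fits roughly 100K rows; scale it with your data):

```sql
-- IVFFlat index for cosine distance searches.
create index on documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);
```

Build the index after loading a representative amount of data - IVFFlat partitions the vector space based on what's in the table when the index is created.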
Get your Supabase connection details
You need your Project URL (looks like https://xxxxx.supabase.co) and your Service role key from Settings → API. n8n will use the REST API, so you need the URL and service key.
Part 2: Document ingestion workflow in n8n
This workflow converts documents to embeddings and stores them in Supabase.
Workflow structure
The workflow has six steps: trigger, get documents, split into chunks if needed, generate embeddings via OpenAI API, extract the embedding data, and store in Supabase.
Node 1: Manual Trigger
Add a Manual Trigger node to start. In production, replace this with a webhook or file watcher that automatically processes new documents.
Node 2: Get document content
Use a Function node to provide sample documents, or connect to your actual content source - file uploads, APIs, existing databases, Google Drive, wherever your documents live. Each document should have content and optional metadata like source, category, author, and date.
Node 3: Text chunking (optional but recommended)
If documents are long (over 500 tokens, or roughly 300-400 words), split them into chunks. Use a Code node that splits text into overlapping chunks of a specified size.

Chunking keeps your context focused. A 5,000-word document might need 10-15 chunks, and each chunk gets its own embedding.
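A minimal word-based chunker for the Code node might look like this (chunk size and overlap values are illustrative - tune them for your documents):

```javascript
// Split text into overlapping word-based chunks.
// chunkSize and overlap are measured in words.
function chunkText(text, chunkSize = 400, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last chunk reached
  }
  return chunks;
}

// In an n8n Code node you would map this over the incoming items, e.g.:
// return items.flatMap(item =>
//   chunkText(item.json.content).map(chunk => ({
//     json: { ...item.json, content: chunk },
//   }))
// );
```

The overlap means the tail of one chunk reappears at the head of the next, so a fact straddling a chunk boundary is still retrievable from at least one chunk.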
Node 4: Generate embeddings
Add an HTTP Request node that calls the OpenAI embeddings API. Use the POST method, authenticate with your API key in the header, and send the document content with the model specification text-embedding-3-small. This returns a numerical vector (an array of 1536 numbers) representing the semantic meaning of your text.
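The node POSTs to https://api.openai.com/v1/embeddings with an Authorization: Bearer header. The JSON body can be as simple as this (the double-curly expression is n8n's syntax for pulling the content field off the incoming item):

```json
{
  "model": "text-embedding-3-small",
  "input": "{{ $json.content }}"
}
```

The input field also accepts an array of strings if you want to batch several chunks into one API call.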
Node 5: Extract embedding
The API response is wrapped in metadata. Use a Code node to extract just the embedding array and combine it with your document content and metadata.
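A sketch of that extraction logic - the helper name and the way the original document fields are carried along are illustrative, not n8n conventions:

```javascript
// The embeddings API wraps the vector in { data: [ { embedding: [...] } ] }.
// Pull out the array and merge it with the original document fields.
function extractEmbedding(apiResponse, original) {
  return {
    content: original.content,
    metadata: original.metadata ?? {},
    embedding: apiResponse.data[0].embedding,
  };
}
```

In a Code node, the API response arrives on the incoming item, and the original document fields can be read from an earlier node, for example via $('Get Documents').item.json (node name illustrative).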
Node 6: Store in Supabase
Add an HTTP Request node that POSTs to your Supabase documents table via the REST API. Send the content, embedding array, and metadata, and authenticate with your service key in the header.

Run the workflow, then check Supabase: you should see documents with embeddings stored.
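The node POSTs to https://your-project.supabase.co/rest/v1/documents with the apikey header and Authorization: Bearer both set to your service role key. A body along these lines works (the embedding array is truncated to three values here for readability - the real one has all 1536):

```json
{
  "content": "ThinkSwift is a creative software agency in Melbourne.",
  "embedding": [0.0123, -0.0456, 0.0789],
  "metadata": { "source": "about-page", "category": "company" }
}
```

Supabase's REST layer (PostgREST) accepts the embedding as a plain JSON array and casts it to the vector type for you.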
Part 3: Query workflow (RAG in action)
This workflow takes a user question, finds relevant documents, and generates a response using those documents as context.
Workflow structure
Seven steps: webhook trigger receives question, generate query embedding, search Supabase for similar documents, format retrieved documents as context, inject into prompt, call GPT-4, return response.
Node 1: Webhook trigger
Add a Webhook node that listens for POST requests at a path like /query. Expects JSON body with a question field.
Node 2: Generate query embedding
HTTP Request node calling OpenAI embeddings API, same as ingestion but for the user's question instead of documents. This converts "What does ThinkSwift do?" into a 1536-dimension vector.
Node 3: Extract query embedding
Code node to extract the embedding array from the API response and keep the original question for later steps.
Node 4: Search Supabase
HTTP Request node calling your match_documents database function via Supabase's RPC endpoint. Send the query embedding, a similarity threshold (0.5 is a good starting point), and how many results you want (typically 3-5). This returns the most similar documents from your database - the context that will inform the AI's response.
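The RPC call is a POST to https://your-project.supabase.co/rest/v1/rpc/match_documents with the same service-key headers as the ingestion workflow. The body mirrors the function's parameters (embedding truncated here for readability):

```json
{
  "query_embedding": [0.0123, -0.0456, 0.0789],
  "match_threshold": 0.5,
  "match_count": 5
}
```

The response is a JSON array of matches, each carrying the id, content, metadata, and similarity fields the function returns.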
Node 5: Format context
Code node to combine the retrieved documents into a single context string. Join the document contents with line breaks between them.
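That step is tiny - something like this, with the separator string an arbitrary choice:

```javascript
// Join retrieved documents into one context string for the prompt.
// A visible separator helps the LLM treat each document as distinct.
function buildContext(matches) {
  return matches.map(m => m.content).join('\n\n---\n\n');
}
```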
Node 6: Call GPT-4 with context
HTTP Request node to OpenAI's chat completions API. Create a system message instructing GPT-4 to answer based on the provided context, then a user message containing the context and the original question.

Use a low temperature (0.3) for factual, consistent responses; a higher temperature (0.7+) gives more creative ones.
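The node POSTs to https://api.openai.com/v1/chat/completions. A body along these lines works (the exact prompt wording is a starting point, not a prescription; the double-curly expressions are n8n syntax for fields set in earlier nodes):

```json
{
  "model": "gpt-4",
  "temperature": 0.3,
  "messages": [
    {
      "role": "system",
      "content": "Answer only from the provided context. If the context does not contain the answer, say you don't have that information."
    },
    {
      "role": "user",
      "content": "Context:\n{{ $json.context }}\n\nQuestion: {{ $json.question }}"
    }
  ]
}
```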
Node 7: Return response
Extract the generated answer from the API response and return it via the webhook. The user gets an answer grounded in your actual documents, not generic training data.
Testing the complete system
Run the ingestion workflow to populate your vector database with a few test documents, activate the query workflow webhook, then test with a curl command or Postman, sending a question as JSON.

You should get a response based on your ingested documents. If the answer is generic or wrong, check that documents were actually stored and that the similarity search is returning results.
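A test call looks like this (the host and webhook path depend on your n8n setup - this is a placeholder):

```shell
curl -X POST https://your-n8n-host/webhook/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What does ThinkSwift do?"}'
```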
Production considerations
Chunking strategy matters
Simple word-based chunking works, but better approaches include sentence-boundary chunking (don't split mid-sentence), semantic chunking (split at topic boundaries), and recursive chunking (chunk large sections first, then subdivide).

For most use cases, 300-500 word chunks with a 50-word overlap work fine. Test with your actual documents.
Embedding model choice
text-embedding-3-small is cheap and fast. For better quality, consider text-embedding-3-large (2x the cost, higher accuracy), Cohere embeddings (competitive pricing, good for multilingual content), or Voyage AI (specialized for RAG, excellent quality).

It's a cost-versus-quality trade-off. Start with 3-small and upgrade if results aren't good enough.
Similarity threshold tuning
We used 0.5 as the match threshold. This is arbitrary: too low and irrelevant results slip in; too high and relevant results get excluded.

Test with your data. Start at 0.5 and adjust based on result quality. The typical range is 0.4 to 0.7.
Number of results to retrieve
We retrieve 5 documents. The right number depends on how much context GPT-4 needs (it can handle roughly 8K tokens of context), how fragmented your knowledge is (more fragments means you need more results), and the response-quality-versus-cost trade-off (more context means higher API costs).

Start with 3-5 and increase if answers lack depth.
Metadata filtering
The Supabase function can filter by metadata - useful for multi-tenant systems or domain-specific queries. Add WHERE clauses that check metadata fields like category, author, date range, or tenant ID.
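Inside the match_documents function's query, a filter on the JSONB metadata column looks like this (the filter_category parameter is illustrative - you would add it to the function's signature):

```sql
-- Extra condition in the WHERE clause of the search function:
where 1 - (documents.embedding <=> query_embedding) > match_threshold
  and documents.metadata->>'category' = filter_category
```

The ->> operator extracts a JSONB field as text, so any key you stored in metadata during ingestion can become a filter at query time.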
Caching frequently asked questions
If you get the same questions repeatedly, cache the results: check the cache before doing vector search, and store question-answer pairs with a TTL (time to live). This dramatically reduces API costs.

A simple Redis cache saves 50-80% on API costs for typical use cases.
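The check-before-search flow can be sketched with an in-memory store (a shared store like Redis works the same way conceptually; this class and its API are illustrative):

```javascript
// Minimal question-answer cache with TTL-based expiry.
class AnswerCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.store = new Map();
  }
  // Returns the cached answer, or null if missing or expired.
  get(question) {
    const hit = this.store.get(question);
    if (!hit) return null;
    if (Date.now() - hit.at > this.ttlMs) {
      this.store.delete(question); // stale entry, evict it
      return null;
    }
    return hit.answer;
  }
  set(question, answer) {
    this.store.set(question, { answer, at: Date.now() });
  }
}
```

In the query workflow, a cache hit short-circuits both the embedding call and the GPT-4 call; only a miss falls through to the full RAG pipeline.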
Monitoring and logging
Log every query including question asked, documents retrieved, answer generated, and user feedback if collected. This data improves your system over time. You'll identify gaps in knowledge, popular questions, and low-quality responses.
Common issues and solutions
Vector search returns irrelevant results
Likely causes are embedding quality (try a better model), chunking strategy (your chunks are too large or poorly segmented), threshold too low (increase match_threshold), or not enough documents (need minimum 50-100 for good results).
Responses hallucinate despite RAG
The LLM is ignoring the context. Strengthen the system prompt ("Only answer based on the provided context"), reduce the temperature (0.1-0.3 for factual responses), add explicit instructions ("If the context doesn't contain the answer, say: I don't have that information"), or use a better model (GPT-4 follows instructions better than GPT-3.5).
Vector search is slow
You didn't create the index. Go back to Part 1 and create the IVFFlat index. If you have millions of vectors, consider increasing the lists parameter, switching from IVFFlat to an HNSW index (better for large datasets), or sharding across multiple tables.
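Assuming the documents table from Part 1 and a pgvector version that supports it (0.5+), the HNSW variant is:

```sql
-- HNSW index: slower to build than IVFFlat, but better
-- recall and query speed at large scale.
create index on documents using hnsw (embedding vector_cosine_ops);
```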
Embeddings won't store in Supabase
Common causes include the wrong dimension (check your embedding model's output dimension against the column definition), the embedding being sent as a string instead of an array (make sure to parse the JSON properly), or Supabase's row size limit (unlikely, but possible if the vector plus content exceeds 1GB).
Cost breakdown at scale
Realistic costs for 10,000 documents and 1,000 queries per day:

One-time ingestion: 10K documents × 1K tokens average = 10M tokens, about $1 in embedding costs total.

Daily queries: 1K queries × 1K tokens = 1M tokens, about $0.10/day ($3/month) in embedding costs. GPT-4 at roughly 500 output tokens × 1K queries is about $15/day ($450/month). Supabase's free tier handles this volume easily.

Total monthly: approximately $453. Compare that to managed RAG solutions charging $1,500 to $3,000 per month for similar volume.
What we actually use in production
This tutorial covers the basics. Our production RAG systems add hybrid search (combining vector search with keyword search for better recall), reranking (a cross-encoder reorders retrieved documents before they reach the LLM), query decomposition (breaking complex questions into sub-questions), citation tracking (returning which documents were used in the answer), multi-modal RAG (handling images and PDFs, not just text), and an evaluation framework (automated testing of RAG quality).

But start simple. Get basic RAG working, then add complexity as needed.
Building RAG systems for production? We can help with architecture, implementation, and optimization for your specific use case.
[Talk to us about RAG development]
About ThinkSwift
We're a creative software agency in Melbourne building AI-powered operating systems and RAG-based knowledge systems for established businesses. We use Supabase for most implementations because it's cost-effective, maintainable, and doesn't lock clients into expensive vendor platforms. This tutorial reflects our actual production architecture.