Most AI projects take 4-6 months and cost $100K+. We built Penny - a production AI receptionist for professional services firms - in 6 weeks for under $20K.
Here's exactly how we did it, what worked, what didn't, and what we learned building our first productized AI service.
The problem we kept seeing
We'd built custom automation for law firms, accounting practices, and financial planners. Different clients, same pattern:
The conversation would go:
"We lose so many leads that come through the website after hours."
"How many?"
"No idea, but probably a lot. People researching lawyers at 9pm aren't going to wait until tomorrow for a response."
"What about reception handling it during business hours?"
"They're overwhelmed. Half their time is answering the same questions - 'Do you handle X type of matter?' 'What does it cost?' 'When can I meet with someone?'"
"Have you tried a chatbot?"
"Yeah, it was useless. Nobody used it and when they did it couldn't understand basic questions."
This problem appeared in probably 6 different client conversations over 3 months. Professional services firms losing qualified leads to slow response times and reception overwhelm.
We could build custom solutions for each client. Or we could build one system that works for all of them.
That's how Penny started.
Week 1: Validating the concept
Before writing any code, we needed to know if this would actually work.
Day 1-2: Prototype with existing tools
We threw together a basic version using OpenAI's Assistants API, a simple web interface, and hardcoded knowledge about a fictional law firm.
Tested it ourselves asking typical prospect questions:
- "Do you handle employment law?"
- "How much does a divorce cost?"
- "Can I talk to someone today?"
Result: It worked. Not perfectly, but well enough to prove the concept.
Day 3-5: Real user testing
Found 3 lawyers we knew and asked them to try to break it. Gave them access and watched what happened.
What worked:
- Understood questions phrased dozens of different ways
- Stayed on-brand and professional
- Handled follow-up questions with context
What broke:
- Made up pricing when it should have said "varies based on complexity"
- Couldn't handle "I need help but don't know what type of lawyer I need"
- Calendar integration was manual and clunky
Decision: This is viable. Build it properly.
Week 2-3: Architecture and tech stack
We had to make some key technical decisions early.
Decision 1: Build vs integrate
Option A: Use an off-the-shelf chatbot platform and customize it
Option B: Build custom from scratch
We went with custom build because:
- Needed deep integration with professional services workflows
- Wanted full control over the conversation logic
- Existing platforms charged per-conversation (expensive at scale)
- Could build exactly what we needed, nothing more
Decision 2: Which LLM?
Tested GPT-4, Claude (Anthropic), and GPT-3.5.
Winner: GPT-4 (now GPT-4-turbo)
Why:
- Best at following instructions ("Don't make up pricing")
- More reliable with structured outputs (extracting lead qualification data)
- Better at maintaining professional tone
Cost trade-off: More expensive per conversation, but worth it for quality.
Decision 3: Tech stack
Frontend: React + Tailwind CSS
Clean, fast, works on mobile. Nothing fancy.
Backend: Node.js + Express
Handles API calls, manages conversation state, integrates with external services.
Database: Supabase (PostgreSQL)
Stores conversation history, lead data, configuration. Simple, reliable, cheap.
AI Layer: OpenAI API (GPT-4)
The brain. Handles natural language understanding and response generation.
Integrations: Google Calendar API, HubSpot API, n8n for workflow orchestration
Connects to real business systems.
Hosting: Vercel (frontend), Railway (backend)
Deploy-and-forget infrastructure.
Total stack cost: ~$100/month at moderate volume.
Week 3-4: Core build
This is where most of the actual coding happened.
The conversation engine
Built a state machine managing conversation flow:
State 1: Greeting
Introduces Penny, asks how it can help
State 2: Understanding
Parses the question, determines intent (service inquiry, pricing question, booking request, general info)
State 3: Knowledge retrieval
Searches the firm's knowledge base for relevant information
State 4: Response generation
GPT-4 generates answer based on retrieved knowledge + conversation history
State 5: Qualification
Asks follow-up questions to qualify the lead (practice area, urgency, budget awareness)
State 6: Action
Books appointment OR captures contact info for follow-up OR escalates to human
The key insight: Don't let the AI free-form everything. Guide it through states with specific jobs at each stage.
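In TypeScript (the language of our backend stack), the skeleton of that state machine looks something like this. State names and transitions here are illustrative, not Penny's actual identifiers:

```typescript
// Sketch of the six-state flow above. Each state has one narrow job;
// transitions are explicit, so the LLM is only ever asked to do the
// task the current state defines.
type State =
  | "greeting"
  | "understanding"
  | "retrieval"
  | "generation"
  | "qualification"
  | "action";

const transitions: Record<State, State | null> = {
  greeting: "understanding",
  understanding: "retrieval",
  retrieval: "generation",
  generation: "qualification",
  qualification: "action",
  action: null, // terminal: book, capture contact info, or escalate
};

function nextState(current: State): State | null {
  return transitions[current];
}
```

The real flow has branches (a pricing question skips qualification differently than a booking request), but the principle is the same: the code decides where the conversation goes; the model only fills in the current step.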
The knowledge base system
This took longer than expected.
Naive approach we tried first: Dump all the firm's information into context and let GPT-4 figure it out.
Problem: Inconsistent responses, made things up when confused, couldn't handle nuance.
Better approach: Structured knowledge base with RAG (Retrieval-Augmented Generation)
How it works:
- Firm provides information in structured format (services, pricing, FAQs, policies)
- We convert it to searchable embeddings
- When question comes in, we retrieve relevant knowledge chunks
- Feed ONLY relevant chunks to GPT-4 with the question
- GPT-4 generates response based on provided knowledge, not training data
Result: Much more accurate, far fewer hallucinations, consistent answers.
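Here's a rough sketch of the retrieval step. The embeddings are faked with tiny vectors for illustration; in production an embeddings API generates them offline for each knowledge chunk:

```typescript
// Minimal RAG retrieval: rank knowledge chunks by cosine similarity
// to the question embedding, keep only the top k for the prompt.
interface Chunk {
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Only the chunks this returns are fed to GPT-4 with the question.
function retrieve(questionEmb: number[], chunks: Chunk[], k = 3): Chunk[] {
  return [...chunks]
    .sort(
      (x, y) =>
        cosine(questionEmb, y.embedding) - cosine(questionEmb, x.embedding),
    )
    .slice(0, k);
}
```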
Calendar integration
This was harder than it should have been.
Requirement: Show lawyer's actual availability and book appointments directly.
Challenge: The Google Calendar API is finicky. Timezone handling is a nightmare. Availability rules are complex (blocked mornings, no Fridays, minimum 24-hour notice, etc.)
Solution: Built our own availability logic layer on top of Calendar API.
Took 3 days to get right. Worth it - booking now works reliably.
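A stripped-down version of that availability layer looks roughly like this. The rule fields are made up for illustration, and everything is normalized to UTC (real timezone handling needs a proper library):

```typescript
// Raw free slots come from the Calendar API; the firm's rules decide
// which of them Penny is actually allowed to offer.
interface Slot {
  start: Date; // assumed already normalized to UTC
}

interface Rules {
  blockedWeekdays: number[]; // 0 = Sunday … 6 = Saturday
  earliestHourUtc: number;   // e.g. block mornings before this hour
  minNoticeHours: number;    // e.g. 24-hour minimum notice
}

function bookable(slot: Slot, rules: Rules, now: Date): boolean {
  const hoursAway = (slot.start.getTime() - now.getTime()) / 3_600_000;
  return (
    hoursAway >= rules.minNoticeHours &&
    !rules.blockedWeekdays.includes(slot.start.getUTCDay()) &&
    slot.start.getUTCHours() >= rules.earliestHourUtc
  );
}
```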
Week 5: Integration and testing
Built Penny. Now make it work with real firms' systems.
CRM integration (HubSpot)
When Penny qualifies a lead, it needs to create a contact record with full conversation context.
Built: HubSpot API integration that creates contacts, adds custom properties (lead source, qualification status, matter type), and logs conversation transcript.
Learned: Different firms use HubSpot differently. Built flexibility into field mapping.
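That field-mapping flexibility boils down to something like this sketch. The lead fields and HubSpot property names are invented for illustration; each firm supplies its own mapping:

```typescript
// Penny's internal lead fields map onto whatever custom property
// names each firm's HubSpot portal actually uses.
type Lead = { matterType: string; urgency: string; source: string };

type FieldMap = { [K in keyof Lead]: string };

function toCrmProperties(lead: Lead, map: FieldMap): Record<string, string> {
  const props: Record<string, string> = {};
  for (const key of Object.keys(lead) as (keyof Lead)[]) {
    props[map[key]] = lead[key];
  }
  return props;
}
```

The payload then goes to the HubSpot contacts API as-is, so supporting a new firm is a config change, not a code change.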
Email/SMS notifications
When someone books an appointment, people need to know.
Built: Automated emails/SMS for:
- Appointment confirmation to prospect
- Notification to assigned lawyer
- Reminder 24 hours before meeting
- Follow-up if meeting is missed
Used Twilio for SMS, SendGrid for email. Took 2 days.
Real firm testing
Found 2 friendly law firms willing to test on their actual websites.
Week 5 Day 1-3: Deploy to their sites, monitor every conversation
Issues discovered:
- People ask about practice areas in weird ways ("Do you do DUIs?" vs "Do you handle drink driving?")
- Pricing questions more varied than expected
- Some people just want a phone number (added that prominently)
- Mobile formatting needed work
Fixes: Improved knowledge base structure, added common question variations, refined mobile UI.
Week 6: Polish and launch
Final week focused on making it production-ready.
Admin dashboard
Firms needed to see what Penny was doing.
Built:
- Conversation history with search
- Lead qualification status
- Booking calendar view
- Common questions analytics
- Knowledge base editor
Tech: React admin panel hitting same backend APIs.
Took 3 days. Rushed but functional.
Documentation
Wrote actual documentation for:
- How to configure Penny for your practice
- How to update the knowledge base
- How to interpret conversation analytics
- Troubleshooting common issues
This matters. Nobody uses software they don't understand.
Pricing model
Had to figure out what to charge.
Cost to run per firm: ~$50-150/month (hosting + API costs)
Value delivered: $1,500-3,000/month in reception time + improved conversion
Decided: $497/month for professional services firms
Covers our costs, scales with volume, feels reasonable for the value.
What worked
Starting with a prototype: Proved viability before committing to full build. Saved us from building something nobody wanted.
Keeping scope tight: Just receptionist functionality. No CRM, no practice management, no everything-platform. Do one thing well.
Using proven tools: Didn't build our own LLM, didn't write custom auth, didn't reinvent infrastructure. Used OpenAI, Supabase, Vercel.
Testing with real users early: Week 1 feedback shaped the entire product. Would have built wrong things otherwise.
Structured knowledge base: The RAG approach all but eliminated hallucinations. Critical for professional services where accuracy matters.
What didn't work (and what we learned)
Trying to handle phone calls: Initial scope included voice. Abandoned after 3 days - too complex for timeline. Website chat only for v1.
Over-engineering conversation flow: First version had 12 different states with complex transition logic. Simplified to 6 states. Simpler works better.
Assuming firms would provide clean knowledge: They don't. Most firms don't have their FAQs written down. We had to help extract it through interviews. Budget for this.
Underestimating calendar integration: Thought it'd take 1 day. Took 3. Timezone handling is genuinely hard.
Not building admin dashboard sooner: Added it in week 6. Should have built it week 4. Firms need visibility from day one.
The actual outcomes
After 6 weeks and ~$18K in development cost (mostly our time), we launched Penny with 2 pilot firms.
Firm 1: Family law practice (3 lawyers, Melbourne)
Before Penny:
- 40-60 website inquiries/month
- ~50% came outside business hours
- Conversion from inquiry to booked consultation: ~35%
- Reception spent 12-15 hours/week fielding inquiries
After Penny (first 60 days):
- Same inquiry volume
- 100% get instant response (24/7)
- Conversion to booked consultation: 52%
- Reception time on inquiries: 3-4 hours/week (just complex escalations)
Impact: 5 additional consultations booked monthly, 10 hours/week reception time freed up.
Their words: "It's like having someone competent working weekends and evenings for $500/month. Absolute no-brainer."
Firm 2: Commercial law firm (8 lawyers, Sydney)
Before Penny:
- 80-120 website inquiries/month
- Mix of qualified corporate clients and random consumer inquiries they don't handle
- Reception overwhelmed qualifying leads
- Lawyers got unqualified leads regularly
After Penny:
- Same inquiry volume
- Penny filters out ~40% that aren't appropriate (consumer matters, out of scope)
- Qualified leads get routed to right practice area
- Lawyers only see pre-qualified, contextualized leads
Impact: Better lead quality, less lawyer time wasted on bad fits, reception can focus on client service.
Their words: "Penny's qualification logic is better than our previous reception process. And it never takes sick days."
What we'd do differently
Start with better knowledge base tooling: We built the knowledge editor last. Should have been first. Firms need easy ways to update what Penny knows.
Clearer qualification criteria upfront: Each firm qualifies leads differently. We should have had a standardized framework from day one.
More attention to mobile: 60% of inquiries come from mobile. We optimized desktop first. Backwards.
Build analytics earlier: Firms want to know: what questions are people asking? What's converting? We added this late.
Invest in onboarding: First two firms required significant hand-holding. Need better self-serve onboarding.
The technical lessons
LLMs are non-deterministic: Same question doesn't always get identical response. Design around this - use structured outputs, validate responses, have fallbacks.
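Concretely, that means parsing the model's structured output defensively and falling back to a safe default instead of crashing the conversation. A simplified sketch (field names are illustrative):

```typescript
// Ask the model for JSON, validate the shape, and use a fallback
// when the response doesn't match — never trust raw model output.
interface Qualification {
  practiceArea: string;
  urgency: "low" | "medium" | "high";
}

const FALLBACK: Qualification = { practiceArea: "unknown", urgency: "low" };

function parseQualification(raw: string): Qualification {
  try {
    const data = JSON.parse(raw);
    if (
      typeof data.practiceArea === "string" &&
      ["low", "medium", "high"].includes(data.urgency)
    ) {
      return { practiceArea: data.practiceArea, urgency: data.urgency };
    }
  } catch {
    // not valid JSON — fall through to the fallback
  }
  return FALLBACK;
}
```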
Context window management matters: Early version kept entire conversation in context. Got expensive fast. Now we summarize older messages.
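The trimming logic is conceptually simple. Here's a toy version where the "summary" is just truncated concatenation; in production an LLM writes the summary:

```typescript
// Keep the last few turns verbatim; collapse everything older into a
// single system message so the context window stays bounded.
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

function trimContext(history: Message[], keepRecent = 6): Message[] {
  if (history.length <= keepRecent) return history;
  const older = history.slice(0, history.length - keepRecent);
  const summary: Message = {
    role: "system",
    content:
      `Summary of ${older.length} earlier messages: ` +
      older.map((m) => m.content).join(" / ").slice(0, 200),
  };
  return [summary, ...history.slice(-keepRecent)];
}
```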
Error handling is critical: API calls fail. Calendars have conflicts. CRMs go down. Every integration point needs graceful failure.
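The pattern we reach for, simplified: retry transient failures a couple of times, then degrade gracefully (e.g. "I'll have someone call you") rather than erroring out:

```typescript
// Generic retry-then-fallback wrapper for flaky integration calls
// (calendar, CRM, email). Sketch only — real code adds backoff and logging.
async function withFallback<T>(
  fn: () => Promise<T>,
  fallback: T,
  retries = 2,
): Promise<T> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch {
      // transient failure: try again, or fall through to the fallback
    }
  }
  return fallback;
}
```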
Prompt engineering is real work: Took 20+ iterations to get GPT-4 responding how we wanted. "Make it professional" doesn't work. Specific examples do.
Testing is different with AI: Can't test every possible input. Focus on common paths, edge cases, and failure modes.
Why 6 weeks was possible
We're often asked how we built this so fast when most AI projects take months.
Scope discipline: Just receptionist functionality. No feature creep.
Existing infrastructure knowledge: We'd built similar integrations before. Knew the patterns.
Using the right tools: OpenAI API for AI, proven frameworks for everything else. Didn't reinvent wheels.
Two technical co-founders: Both could code, both could design. No handoff delays.
Direct customer access: Testing with real firms immediately. No layers of approval.
Willingness to cut corners: Admin dashboard is functional but ugly. Good enough for v1.
No enterprise requirements: Not building for compliance-heavy industries (yet). Kept it simple.
Six weeks is short. We worked hard. But it's achievable when you scope tightly and use modern tools.
What's next for Penny
Version 1 is live and working. Here's the roadmap:
Near term (next 2 months):
- Improved knowledge base editor (make it easier for firms to update)
- More CRM integrations (Clio, Smokeball, other legal CRMs)
- Better analytics dashboard
- SMS conversation option (some prospects prefer text)
Medium term (3-6 months):
- Multi-language support (lots of Australian firms serve non-English clients)
- Voice capability (handle phone inquiries, not just website chat)
- AI-powered follow-up sequences (nurture leads that don't book immediately)
- Practice area specialization (family law Penny vs commercial law Penny)
Long term (6-12 months):
- Expand beyond legal (accounting, financial planning versions)
- White-label option (agencies can offer Penny under their brand)
- Integration marketplace (plug into more practice management systems)
But we're not rushing. Better to have one thing that works really well than ten half-baked features.
The honest assessment
Penny isn't perfect. It makes mistakes. It occasionally misunderstands questions. It can't handle genuinely novel situations.
But it works. Firms are paying for it. Leads are converting. Reception teams are happier.
More importantly: we proved we can build and ship AI products, not just consult on AI strategy.
That was the real goal. Penny is our first productized service. It won't be our last.
Want to see Penny in action on your website? We can set up a demo with your actual practice information.
Building an AI product and want to know how we approached it? Happy to share more detail on the technical architecture.
[Talk to us about AI development]
About ThinkSwift
We're a creative software agency in Melbourne that builds AI-powered business systems. Penny is our first productized service - an AI receptionist specifically for professional services firms. We built it because we kept seeing the same problem across law, accounting, and financial planning practices. Now we're using what we learned to build more AI products that solve real operational problems.



