Building Ozark Ridge: Lessons Learned and What I'd Do Differently

This is the final post in the series. The first three covered what I built and how. This one covers what I learned, what I’d do differently, and why this architecture matters beyond the demo. What worked: Archetype-based catalog generation scaled cleanly. Writing 1180 product descriptions by hand would have been infeasible. Generating them one-by-one with Claude would have been slow and inconsistent. The archetype system with variation logic produced realistic, diverse products at scale with no manual writing and consistent quality across the catalog. ...

April 16, 2026 · 9 min · Tyler

Building the AI Product Assistant: Context Injection, Multi-Turn Chat, and Cross-Product Retrieval

The previous posts focused on search. This one turns to the AI assistant — a floating chat widget that answers product questions, recommends complementary gear, and builds camping loadouts on request. Under the hood, it is a multi-turn conversation system with history, context injection when viewing a product, and dynamic retrieval when the query requires cross-product knowledge. What the assistant does ...

April 15, 2026 · 11 min · Tyler

Keyword Search vs Semantic Search: Why Natural Language Queries Need Vector Embeddings

The previous post covered the architecture and indexing pipeline. This one is about the core value proposition: why semantic search matters and how to demonstrate it. The approach: build both keyword and AI search, run the same queries through each, and document where keyword search fails. The results make the case for semantic search more effectively than any architectural explanation could. What keyword search actually does: Postgres full-text search works by tokenizing text into lexemes (normalized words), removing stop words, and matching query tokens against indexed documents. It’s fast, deterministic, and has been reliable for decades. ...
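As a rough illustration of the tokenize, drop stop words, and match steps described in this excerpt, here is a toy Python sketch. It is not how Postgres implements full-text search: the stop-word list, the crude plural stripping, and the names `to_lexemes` and `keyword_match` are all assumptions made for the example.

```python
import re

# Toy stop-word list (an assumption); Postgres uses per-language dictionaries.
STOP_WORDS = {"a", "an", "the", "for", "in", "of", "and", "to", "is"}


def to_lexemes(text: str) -> set[str]:
    """Lowercase, split into word tokens, drop stop words, normalize crudely."""
    lexemes = set()
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in STOP_WORDS:
            continue
        # Crude stand-in for stemming: strip a trailing "s" from longer words.
        if len(token) > 3 and token.endswith("s"):
            token = token[:-1]
        lexemes.add(token)
    return lexemes


def keyword_match(query: str, document: str) -> bool:
    """A document matches only if every query lexeme appears in it."""
    return to_lexemes(query) <= to_lexemes(document)


doc = "Lightweight two-person tents for backpacking in the rain"
print(keyword_match("backpacking tent", doc))                # True
print(keyword_match("shelter for a wet weekend hike", doc))  # False
```

The second query describes the same product in different words, yet shares no lexemes with the document, which is the kind of miss that exact token matching produces.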

April 14, 2026 · 10 min · Tyler

Building AI Search for a Retail Website: The Stack and Why

I built Ozark Ridge, a mock outdoor gear retail site with AI-powered product search and a Rufus-style product assistant. The project exists to demonstrate RAG (Retrieval-Augmented Generation) in a realistic e-commerce context. This is the first post in a series documenting the build. This one covers the architecture, the data and indexing pipeline, and the stack decisions behind the whole system. Later posts cover keyword vs semantic search, the AI assistant, and lessons learned. ...

April 12, 2026 · 8 min · Tyler

Scoring RAG Answer Quality with an LLM Judge

The previous post in this series built an eval harness that scores retrieval quality: does the right documentation page appear in the retrieved chunks? 7/8 passing, 88%. A useful signal. But retrieval quality and answer quality are different things. A test can pass retrieval scoring and still produce a bad answer. A test can fail retrieval scoring and still produce a correct one. Source URL retrieval is a proxy — a fast, cheap proxy that catches a lot of problems, but not all of them. ...

January 26, 2026 · 9 min · Tyler

How to Design RAG Eval Test Cases

A working RAG pipeline is easy. Knowing whether it will keep working after you change something is harder, and most projects skip that part entirely. Here, the focus is on designing an eval harness that catches real problems, using the Anthropic docs RAG agent as the example. What an eval harness does: An eval harness is a script that runs a fixed set of test cases against your pipeline and produces a pass/fail score. Run it before and after a change. If the score drops, the change broke something. If it improves, the change helped. ...
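A minimal sketch of such a harness, assuming a pass means the expected source page appears among the retrieved sources. The `EvalCase` type, the `run_harness` function, the stub pipeline, and the page names are hypothetical and exist only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    expected_source: str  # page we expect retrieval to surface (hypothetical)


def run_harness(cases: list[EvalCase],
                retrieve: Callable[[str], list[str]]) -> float:
    """Run every case through the pipeline and return the fraction that pass."""
    passed = 0
    for case in cases:
        ok = case.expected_source in retrieve(case.question)
        passed += int(ok)
        print(f"{'PASS' if ok else 'FAIL'}  {case.question}")
    return passed / len(cases)


# Stub pipeline standing in for real retrieval (assumption for the demo).
FAKE_INDEX = {
    "How do I stream responses?": ["streaming-page", "overview-page"],
    "How do I define a tool?": ["overview-page"],
}


def fake_retrieve(question: str) -> list[str]:
    return FAKE_INDEX.get(question, [])


cases = [
    EvalCase("How do I stream responses?", "streaming-page"),
    EvalCase("How do I define a tool?", "tools-page"),
]
score = run_harness(cases, fake_retrieve)
print(f"score: {score:.0%}")
```

Keeping the cases fixed is the point: run the same list before and after a change, and compare the two scores.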

January 24, 2026 · 8 min · Tyler

RAG Retrieval: Chunking, Embeddings, Reranking, and an Eval

This series covers building a RAG pipeline to answer questions about the Anthropic documentation. A RAG agent answers questions by first searching a private knowledge base, then passing the relevant excerpts to an LLM as context — the model reads the actual source material before it responds, rather than guessing from training data. Here the focus is the retrieval layer: how to chunk text, embed it, retrieve it, and measure whether retrieval is actually working. ...
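The search-then-answer flow described in this excerpt can be sketched in a few lines. This is a toy under stated assumptions: word overlap stands in for embedding similarity, the chunk texts are placeholders and not quotes from any documentation, and `retrieve` and `build_prompt` are hypothetical names. No model is called; the sketch stops at the prompt that would be sent.

```python
import re

# Placeholder knowledge-base chunks (assumption); a real pipeline would
# load, chunk, and embed actual documents.
CHUNKS = [
    "Prompt caching lets you reuse a long prefix across requests.",
    "Tool use lets the model call functions that you define.",
    "Streaming returns the response incrementally as it is generated.",
]


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Rank chunks by shared words with the question and keep the top k."""
    q = words(question)
    ranked = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, excerpts: list[str]) -> str:
    """Place the retrieved excerpts ahead of the question as context."""
    context = "\n".join(f"- {e}" for e in excerpts)
    return (
        "Answer using only the excerpts below.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}"
    )


question = "How does prompt caching work?"
top = retrieve(question, CHUNKS)
prompt = build_prompt(question, top)
print(prompt)
```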

January 22, 2026 · 9 min · Tyler