Ai-Engineering

Cut OpenClaw API Costs by 95%

OpenClaw is powerful, but the API costs can surprise you. A single agent running heartbeats every 30 minutes, handling a few cron jobs, and responding to messages on Telegram can easily generate $150-$600 per month in LLM API costs if you configure it carelessly. The problem is not the framework. The problem is that the default configuration most people copy from tutorials – Claude or GPT for everything, 30-minute heartbeats, bloated context files, no caching, no consideration of cheaper providers – is expensive. The good news: most of that cost is recoverable through mechanical changes that take an afternoon to implement. ...

What I Learned Building a Multi-Agent Document Analysis System

This is the retrospective for the multi-agent document analysis project. The first posts covered: why use multiple agents how the specialist agents work how the coordinator synthesizes findings This one covers what worked, what broke, and what I would change. In short, the architecture worked, the coordinator was the most valuable part, and chunking caused the worst failure mode. What worked The BaseAgent abstraction was enough. I did not need a framework. A simple base class handled the repeated LLM-call logic: model name, system prompt, max tokens, response cleaning, and JSON parsing. ...

Coordinating Multiple LLM Agents: Cross-Domain Synthesis

After building the specialist agents, the output looked impressive. It was not useful enough. The system produced: 12 technical findings 14 risk findings 10 cost findings timeline findings That is a lot of analysis. It is also a lot to read. The coordinator is the piece that turns those separate findings into something a person can act on. Aggregation is not synthesis The first version of the coordinator just ran the agents and returned their results. ...

Building Specialist LLM Agents: Technical, Risk, Cost, and Timeline Analysis

The first post covered why I split document analysis into multiple agents. This one covers how the specialists are actually built. The Python code is not the hard part. The specialist behavior mostly comes from: the system prompt the output schema the boundaries around what the agent should ignore The code is intentionally repetitive. Once you’ve written a couple agents, the pattern is familiar. The shared base class Every agent needs the same basic execution logic: ...

Why Multi-Agent Systems Beat Single Agents for Complex Documents

I built a document analysis system for RFPs and contracts using multiple specialist LLM agents instead of one general-purpose prompt. The architecture is simple: PDF → text extraction → Technical Analyzer → Risk Analyzer → Cost Analyzer → Timeline Analyzer → Coordinator synthesis → final report The interesting part is not that it calls an LLM. That’s easy. The interesting part is how much the output changes when the model is forced to analyze the same document through different lenses before producing a final answer. ...

Building Ozark Ridge: Lessons Learned and What I'd Do Differently

This is the final post in the series. The first three covered what I built and how. This one covers what I learned, what I’d do differently, and why this architecture matters beyond the demo. What worked Archetype-based catalog generation scaled cleanly. Writing 1180 product descriptions by hand would have been infeasible. Generating them one-by-one with Claude would have been slow and inconsistent. The archetype system with variation logic produced realistic, diverse products at scale with no manual writing and consistent quality across the catalog. ...

Building the AI Product Assistant: Context Injection, Multi-Turn Chat, and Cross-Product Retrieval

The previous posts focused on search. This one turns to the AI assistant — a floating chat widget that answers product questions, recommends complementary gear, and builds camping loadouts on request. Under the hood, it is a multi-turn conversation system with history, context injection when viewing a product, and dynamic retrieval when the query requires cross-product knowledge. What the assistant does ...

Keyword Search vs Semantic Search: Why Natural Language Queries Need Vector Embeddings

The previous post covered the architecture and indexing pipeline. This one is about the core value proposition: why semantic search matters and how to demonstrate it. The approach: build both keyword and AI search, run the same queries through each, and document where keyword search fails. The results make the case for semantic search more effectively than any architectural explanation could. What keyword search actually does Postgres full-text search works by tokenizing text into lexemes (normalized words), removing stop words, and matching query tokens against indexed documents. It’s fast, deterministic, and has been reliable for decades. ...

Building AI Search for a Retail Website: The Stack and Why

I built Ozark Ridge, a mock outdoor gear retail site with AI-powered product search and a Rufus-style product assistant. The project exists to demonstrate RAG (Retrieval-Augmented Generation) in a realistic e-commerce context. This is the first post in a series documenting the build. This one covers the architecture, the data and indexing pipeline, and the stack decisions behind the whole system. Later posts cover keyword vs semantic search, the AI assistant, and lessons learned. ...

AI-Powered QA Testing with playwright-cli and GitHub Copilot

Most AI-assisted QA workflows assume you have access to everything: Playwright MCP configured in VS Code, Copilot Vision enabled, the embedded browser panel working. In an enterprise environment, those assumptions often don’t hold. Security policies restrict which tools can connect to which services. Features get disabled. The standard setup isn’t available. This post documents a different approach that factors in certain constraints. The combination: playwright-cli for browser interaction, GitHub Copilot CLI for the agent loop, and a plain natural language prompt describing what to test. No MCP. No generated test files. No vision model. Just a coding agent running shell commands against a real browser. ...