
Encoding Expert Knowledge: What I Learned Extracting 279 Rules

February 2026 · 5 min read


The Knowledge That Lives in One Person's Head

Every business has someone who just knows how things work. They know that cases from County X always take longer because the clerk's office is slow. They know that when a client says "minor accident" but the police report mentions an ambulance, you treat it as serious. They know the fourteen exceptions to the standard process and which ones actually matter.

This knowledge is the most valuable asset in the operation, and it lives nowhere except that person's memory. When they're on vacation, things slip. When they leave, it disappears.

I built a system to extract it.

Critical Decision Method

The methodology matters more than the technology here. I used an adapted version of the Critical Decision Method — a technique from cognitive psychology for eliciting expert knowledge.

The key insight: don't ask experts for abstract rules. Ask them about specific decisions they actually made. "Tell me about a time you overrode the standard process" produces far richer data than "What are the exceptions to the standard process?"

People reconstruct their reasoning much better when anchored to real events. They remember the context, the pressure, the tradeoffs. Abstract questions get you platitudes. Specific questions get you operational truth.

The AI Interviewer

I built an AI interviewer that runs structured sessions with the expert. Each session focuses on a specific domain area — case intake, document processing, deadline management, client communication.

The interviewer follows the CDM structure: start with a specific incident, walk through the timeline, probe decision points, extract the reasoning. But it's not rigid. The AI adapts based on responses, follows threads that seem productive, and circles back to areas where the expert was vague.

As the expert talks, the system extracts rules in real-time. Each rule gets structured as:

  • Trigger: When does this rule apply?
  • Action: What do you do?
  • Reasoning: Why this approach?
  • Exceptions: When doesn't this rule apply?
  • Thresholds: Any specific numbers, deadlines, or cutoffs?
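The five fields above map naturally onto a typed record. Here's a minimal sketch in Python; the `Rule` class itself is my illustration of the shape, not the system's actual schema (later in the post I note that only trigger and action end up being required, which the defaults here reflect):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Rule:
    trigger: str                        # when does this rule apply?
    action: str                         # what do you do?
    reasoning: Optional[str] = None     # why this approach?
    exceptions: list[str] = field(default_factory=list)       # when doesn't it apply?
    thresholds: dict[str, str] = field(default_factory=dict)  # numbers, deadlines, cutoffs

# One of the one-line rules quoted later in the post:
rule = Rule(
    trigger="client unresponsive for 14 days",
    action="send the standard follow-up letter",
)
```

An empty `exceptions` list is stored explicitly rather than omitted, since the absence of exceptions is itself information worth recording.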

This structure forces precision. An expert might say "rush anything that looks urgent." The system pushes back: What makes something look urgent? Is there a dollar amount? A case type? A client tier? The structured format turns vague heuristics into actionable logic.

The Deduplication Problem

After a few sessions, I noticed something predictable: the same expert describes the same rule in different ways across different sessions. Session 2 yields "always check the statute of limitations first." Session 5 yields "before you do anything with a new case, verify the filing deadline hasn't passed." Same rule, different words.

Without deduplication, you end up with a knowledge base full of near-duplicates that confuse anyone trying to use it. I implemented Jaccard similarity matching on the structured rule components — comparing triggers, actions, and reasoning fields. When a new rule matches an existing one above a similarity threshold, the system flags it as a potential duplicate for review rather than silently adding it.
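A minimal sketch of that check, comparing word sets field by field and averaging the scores. The 0.6 threshold, the equal weighting of fields, and the helper names are my assumptions for illustration, not the production values:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two strings' word sets: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def looks_like_duplicate(new: dict, existing: dict, threshold: float = 0.6) -> bool:
    """Flag a candidate pair for human review; never silently merge or drop."""
    fields = ("trigger", "action", "reasoning")
    scores = [jaccard(new.get(f, ""), existing.get(f, "")) for f in fields]
    return sum(scores) / len(scores) >= threshold
```

Two phrasings of the same rule usually share a trigger and reasoning even when the action is worded differently, so averaging across fields catches pairs that a single-field comparison would miss.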

Across the 279 extracted rules, this caught 31 duplicates that would otherwise have polluted the knowledge base.

The Exception Detection Surprise

Here's something I didn't expect: 28 of the 279 rules had data in the wrong fields. An exception was written as a trigger. A threshold was stuffed into the reasoning field. A specific action was described in the exceptions section.

The system caught all 28 automatically. The check is straightforward — each field has an expected data type and pattern. Triggers should describe conditions ("when X happens"). Actions should describe behaviors ("do Y"). When a trigger reads like an action ("always call the client"), that's a field mismatch.
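That check can be sketched as a pair of pattern tests: conditions tend to open with a subordinating word, actions with an imperative verb. The word lists and the `field_mismatches` helper below are illustrative assumptions, not the system's actual validator:

```python
import re

# Conditions tend to start with a subordinator; actions with an imperative verb.
# Both word lists are illustrative, not exhaustive.
CONDITION_RE = re.compile(r"^(when|whenever|if|once|before|after|unless)\b", re.I)
IMPERATIVE_RE = re.compile(
    r"^(always\s+|never\s+)?(call|send|check|verify|flag|file|escalate|notify|treat)\b",
    re.I,
)

def field_mismatches(rule: dict) -> list[str]:
    """Return a list of fields whose content looks like it belongs elsewhere."""
    issues = []
    trigger = rule.get("trigger", "")
    action = rule.get("action", "")
    if trigger and IMPERATIVE_RE.match(trigger) and not CONDITION_RE.match(trigger):
        issues.append("trigger reads like an action")
    if action and CONDITION_RE.match(action) and not IMPERATIVE_RE.match(action):
        issues.append("action reads like a trigger")
    return issues
```

On the example from the text, `field_mismatches({"trigger": "always call the client", "action": "send the letter"})` flags the trigger, while a well-formed rule like "when the police report mentions an ambulance" / "treat the case as serious" passes clean.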

This kind of structural validation is something humans are terrible at catching in bulk. After reading 200+ rules, your eyes glaze over. The system never gets tired of checking.

What 279 Rules Actually Look Like

They're not uniform. Some rules are one line: "If the client is unresponsive for 14 days, send the standard follow-up letter." Others have six exceptions and apply only to specific case types in specific jurisdictions.

The structure has to accommodate this variance. I built it so that every field is optional except trigger and action. Some rules have no exceptions. Some have no thresholds. That's fine — the absence of exceptions is itself meaningful information.

The rules break down across 8 operational categories, with wildly different densities. Case intake has 61 rules. Client communication has 23. The distribution tells you something about where the complexity actually lives in the operation.

What I Learned

AI is better at eliciting rules than humans are. That's a provocative claim, so let me be specific about what I mean.

A human interviewer gets tired. They develop rapport with the expert and stop pushing on uncomfortable topics. They unconsciously lead the witness toward answers that match their own mental model. They skip edge cases because they feel awkward asking "but what if the client is lying?"

The AI interviewer doesn't have these problems. It asks the uncomfortable follow-up every time. It doesn't have a mental model to confirm. It treats the fifteenth session with the same rigor as the first.

The expert noticed this too. About four sessions in, they told me: "This thing asks better questions than my last three managers."

That's not because the AI is smarter. It's because it's relentless in the specific way that good knowledge extraction requires — patient, systematic, and immune to social pressure.