The Math Behind Email Classification

Snake SAT + Claude LLM — How and why it works

1. The Dana Theorem

Theorem (Dana, 2024)

Any indicator function over a finite discrete domain can be encoded as a SAT instance in polynomial time. Decision tree bucketing reduces this to linear time.

In plain terms: we don't solve SAT. We build SAT formulas directly from the data.

The NP-hardness of SAT applies to finding satisfying assignments for arbitrary formulas. Snake never does that. It constructs structured formulas where the data is the assignment. Construction is polynomial, evaluation is polynomial. The hard part (NP) is avoided entirely.

The argument

  1. Indicator → CNF. For each non-member of the target class, there exists at least one feature where it differs from some member. A literal captures that difference. An OR-clause over such literals excludes all non-members. A conjunction (AND) of such clauses yields a CNF formula φC that is TRUE iff the input belongs to class C.
  2. Polynomial construction. Each clause is built by oppose() (O(m) per literal) and minimize_clause (greedy removal). Total: O(|Fs|² × m) per class.
  3. Linear via bucketing. Partition into IF/ELIF/ELSE buckets of size b. Total work: O(L × n × b × m). With fixed b, linear in n.

2. Snake for Email Classification

Our email classifier uses 13 boolean, categorical, and integer features extracted from each email (by keyword/regex rules by default, or by Claude when enabled):

| Feature | Type | Signal |
| --- | --- | --- |
| has_quantity | bool | Commande (0.92), Devis (0.55) |
| has_product_ref | bool | Commande (0.75), Relance (0.60) |
| has_dimensions | bool | Commande (0.80), Devis (0.50) |
| has_price | bool | Devis (0.85), Spam (0.40) |
| has_attachment_pdf | bool | Commande (0.55), Autre (0.45) |
| has_delivery_date | bool | Commande (0.70), Relance (0.55) |
| ton | cat | formel → Commande/Devis, informel → Spam |
| has_greeting | bool | Professional emails (0.85-0.90) |
| nb_product_lines | int | Commande (1-8), Spam (0) |
| has_urgency_words | bool | Relance (0.75), Spam (0.50) |
| has_complaint_words | bool | Autre (0.35), Relance (0.30) |
| has_question_mark | bool | Devis (0.80), Relance (0.70) |
| domain_known | bool | Pro domains (0.85), Spam (0.05) |

Snake's oppose() loop builds SAT clauses that separate each class pair. For example, to separate Commande from Spam:

Clause 1: has_quantity=True OR nb_product_lines>0 OR domain_known=True
  → excludes Spam (no qty, no lines, unknown domain)
  → keeps Commande (has all three)

Clause 2: has_dimensions=True OR has_delivery_date=True
  → excludes remaining Spam that slipped through
  → keeps Commande (has both)

Each clause is a human-readable rule. The conjunction of clauses forms the full separator. This is why the audit trail is readable: it's literally the SAT formula in plain language.
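The clause-as-audit-trail idea can be sketched as follows. The clause representation and the `classify` helper are illustrative assumptions; the two clauses mirror the Commande-vs-Spam example above:

```python
clauses = [
    # Clause 1: has_quantity=True OR nb_product_lines>0 OR domain_known=True
    [("has_quantity", lambda v: v is True),
     ("nb_product_lines", lambda v: v > 0),
     ("domain_known", lambda v: v is True)],
    # Clause 2: has_dimensions=True OR has_delivery_date=True
    [("has_dimensions", lambda v: v is True),
     ("has_delivery_date", lambda v: v is True)],
]

def classify(email, clauses):
    """Evaluate the conjunction of OR-clauses, recording why each passed."""
    trail = []
    for i, clause in enumerate(clauses, 1):
        hits = [f for f, test in clause if test(email[f])]
        trail.append(f"Clause {i}: " +
                     (f"PASS via {', '.join(hits)}" if hits else "FAIL"))
        if not hits:
            return False, trail
    return True, trail

commande = {"has_quantity": True, "nb_product_lines": 3, "domain_known": True,
            "has_dimensions": True, "has_delivery_date": True}
spam = {"has_quantity": False, "nb_product_lines": 0, "domain_known": False,
        "has_dimensions": False, "has_delivery_date": False}

ok, trail = classify(commande, clauses)   # True, with a PASS line per clause
ok2, _ = classify(spam, clauses)          # False: Clause 1 fails
```

The `trail` list is exactly the human-readable audit trail: each entry names the literals that satisfied (or failed) a clause of the SAT formula.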

3. Training Complexity

O(L × n × m × b)
= O(30 × 880 × 13 × 200)
= O(68,640,000)
≈ 68M operations
→ Trained in 3.3 seconds

Where: L=30 layers, n=880 training samples, m=13 features, b=200 bucket size.
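The stated figure follows directly from those parameters:

```python
# Training cost: L layers x n samples x m features x b bucket size.
L, n, m, b = 30, 880, 13, 200
ops = L * n * m * b
print(ops)  # 68640000 -> the ~68M operations quoted above
```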

Compare to gradient boosting on the same data: similar accuracy, but Snake produces explainable SAT rules, not a black-box ensemble. Every prediction comes with a human-readable audit trail.

4. Inference Complexity

O(L × clauses_in_bucket)
= O(30 × ~5)
= O(150) clause evaluations per prediction
→ Sub-millisecond

The Snake model evaluates a decision tree (bucket chain) and then checks clauses within the matched bucket. No matrix multiplication, no forward pass, no GPU. Pure boolean logic.
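A minimal sketch of that inference path, assuming each bucket is a `(guard, label, clauses)` triple and a clause is a list of `(feature, value)` literals. The bucket contents and the fall-through-to-default behaviour are illustrative assumptions, not Snake's actual data structures:

```python
def predict(email, buckets, default="Autre"):
    for guard, label, clauses in buckets:   # IF/ELIF chain: first match wins
        if guard(email):
            # Within the matched bucket, the class holds iff every OR-clause
            # is satisfied -- pure boolean evaluation, no arithmetic.
            if all(any(email[f] == v for f, v in clause) for clause in clauses):
                return label
            return default
    return default                          # ELSE bucket

buckets = [
    (lambda e: e["nb_product_lines"] > 0, "Commande",
     [[("has_quantity", True), ("domain_known", True)]]),
    (lambda e: e["has_price"], "Devis",
     [[("has_question_mark", True)]]),
]
email = {"nb_product_lines": 2, "has_quantity": True,
         "domain_known": False, "has_price": True, "has_question_mark": False}
print(predict(email, buckets))  # Commande
```

The total work is the guard chain plus the handful of clauses in the matched bucket, which is where the O(L × clauses_in_bucket) bound comes from.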

5. AUROC Breakdown

| Class | AUROC | Key separating features |
| --- | --- | --- |
| Commande | 0.999 | has_quantity + has_dimensions + has_delivery_date + nb_product_lines |
| Spam | 0.969 | domain_known=False + nb_product_lines=0 + ton=informel |
| Devis | 0.963 | has_price + has_question_mark + !has_delivery_date |
| Relance | 0.962 | has_urgency_words + has_question_mark + has_product_ref + !has_quantity |
| Autre | 0.899 | Catch-all: has_complaint_words + has_attachment_pdf + !has_quantity + !has_price |

Macro AUROC: 0.958 (Wilcoxon-Mann-Whitney, 80/20 split, 1100 samples)

Autre is the weakest class at 0.899 because it's a catch-all — admin emails, invoices, complaints, misc. It has no strong positive signal, only negative ones (not a commande, not a devis, not a relance, not spam). With real labeled data, this will improve as the "autre" category becomes more defined.
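As a quick sanity check, the macro AUROC is the unweighted mean of the per-class values in the table:

```python
per_class = {"Commande": 0.999, "Spam": 0.969, "Devis": 0.963,
             "Relance": 0.962, "Autre": 0.899}
macro = sum(per_class.values()) / len(per_class)
print(round(macro, 3))  # 0.958
```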

6. The LLM Proxy Pattern (and why it's optional)

Raw email (infinite text space)
  ↓
Extraction: keyword/regex (default) OR Claude Haiku (opt-in)
  ↓
13 structured features (finite Boolean/categorical space)
  ↓
Snake SAT classification (<1ms, pure logic)
  ↓
Prediction + Probability + Audit trail

The key insight: Snake is the classifier, not the LLM.

The LLM (Claude Haiku) is an optional encoder that improves feature extraction quality. But Snake does all the classification work regardless. With anthropic=false (the default), keyword/regex extraction builds the 13 features directly — no API key, no network, sub-10ms. Snake classifies the same way on either path.
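The default path can be sketched as plain keyword/regex rules that fill the feature vector with no API call. The patterns and thresholds below are illustrative assumptions, not the project's actual extraction rules:

```python
import re

def extract_features(email_text: str) -> dict:
    """Build a few of the 13 structured features from raw text.
    Keyword/regex only: no network, no model, sub-10ms."""
    text = email_text.lower()
    return {
        "has_quantity": bool(re.search(r"\b\d+\s*(pcs|unités|x)\b", text)),
        "has_price": bool(re.search(r"\d+\s*(€|eur)", text)),
        "has_question_mark": "?" in email_text,
        "has_greeting": any(g in text for g in ("bonjour", "madame", "monsieur")),
        "nb_product_lines": sum(1 for line in email_text.splitlines()
                                if re.match(r"\s*[-*]\s*\d+", line)),
        # ... remaining features follow the same pattern
    }

features = extract_features("Bonjour,\n- 3 x panneaux 200€\nMerci")
print(features["has_greeting"], features["nb_product_lines"])  # True 1
```

Whichever encoder produced the dictionary, Snake receives the same 13-feature input, so the classification logic is identical on both paths.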

Two extraction modes feed the same classifier, so swapping the encoder never changes the decision logic. And it is Snake, not the LLM, that classifies: the SAT formula is deterministic, auditable, and evaluated in sub-millisecond time, while the LLM's only job is to map raw text into the 13-feature space.

Link to AUMA thesis

The theoretical foundation (Finite Sum Theorem, universal MAXSAT representation, polynomial construction) is detailed in Charles Dana's MSc thesis at École Polytechnique, supervised by Erwan Le Pennec: sudoku.aws.monce.ai/V2/papers/