The Math Behind Email Classification

Snake SAT + Claude LLM — How and why it works

1. The Dana Theorem

Theorem (Dana, 2024)

Any indicator function over a finite discrete domain can be encoded as a SAT instance in polynomial time. Decision tree bucketing reduces this to linear time.

In plain terms: we don't solve SAT. We build SAT formulas directly from the data.

The NP-hardness of SAT applies to finding satisfying assignments for arbitrary formulas. Snake never does that. It constructs structured formulas where the data is the assignment. Construction is polynomial, evaluation is polynomial. The hard part (NP) is avoided entirely.

The argument

  1. Indicator → CNF. For each non-member of the target class, there exists at least one feature where it differs from some member. A literal captures that difference. An OR-clause over such literals excludes all non-members. A conjunction (AND) of such clauses yields a CNF formula φC that is TRUE iff the input belongs to class C.
  2. Polynomial construction. Each clause is built by oppose() (O(m) per literal) and minimize_clause (greedy removal). Total: O(|Fs|² × m) per class.
  3. Linear via bucketing. Partition into IF/ELIF/ELSE buckets of size b. Total work: O(L × n × b × m). With fixed b, linear in n.

2. Snake for Email Classification

Our email classifier uses 13 boolean, categorical, and integer features extracted from each email (by keyword/regex rules by default, or by Claude when enabled):

| Feature | Type | Signal |
| --- | --- | --- |
| has_quantity | bool | Commande (0.92), Devis (0.55) |
| has_product_ref | bool | Commande (0.75), Relance (0.60) |
| has_dimensions | bool | Commande (0.80), Devis (0.50) |
| has_price | bool | Devis (0.85), Spam (0.40) |
| has_attachment_pdf | bool | Commande (0.55), Autre (0.45) |
| has_delivery_date | bool | Commande (0.70), Relance (0.55) |
| ton | cat | formel → Commande/Devis, informel → Spam |
| has_greeting | bool | Professional emails (0.85-0.90) |
| nb_product_lines | int | Commande (1-8), Spam (0) |
| has_urgency_words | bool | Relance (0.75), Spam (0.50) |
| has_complaint_words | bool | Autre (0.35), Relance (0.30) |
| has_question_mark | bool | Devis (0.80), Relance (0.70) |
| domain_known | bool | Pro domains (0.85), Spam (0.05) |

Snake's oppose() loop builds SAT clauses that separate each class pair. For example, to separate Commande from Spam:

Clause 1: has_quantity=True OR nb_product_lines>0 OR domain_known=True
  → excludes Spam (no qty, no lines, unknown domain)
  → keeps Commande (has all three)

Clause 2: has_dimensions=True OR has_delivery_date=True
  → excludes remaining Spam that slipped through
  → keeps Commande (has both)

Each clause is a human-readable rule. The conjunction of clauses forms the full separator. This is why the audit trail is readable: it's literally the SAT formula in plain language.
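The clause-as-audit-trail idea can be sketched as follows. The clause representation and the `classify` helper are illustrative assumptions; the two clauses mirror the Commande-vs-Spam example above:

```python
clauses = [
    # Clause 1: has_quantity=True OR nb_product_lines>0 OR domain_known=True
    [("has_quantity", lambda v: v is True),
     ("nb_product_lines", lambda v: v > 0),
     ("domain_known", lambda v: v is True)],
    # Clause 2: has_dimensions=True OR has_delivery_date=True
    [("has_dimensions", lambda v: v is True),
     ("has_delivery_date", lambda v: v is True)],
]

def classify(email, clauses):
    """Evaluate the conjunction of OR-clauses, recording why each passed."""
    trail = []
    for i, clause in enumerate(clauses, 1):
        hits = [f for f, test in clause if test(email[f])]
        trail.append(f"Clause {i}: " +
                     (f"PASS via {', '.join(hits)}" if hits else "FAIL"))
        if not hits:
            return False, trail
    return True, trail

commande = {"has_quantity": True, "nb_product_lines": 3, "domain_known": True,
            "has_dimensions": True, "has_delivery_date": True}
spam = {"has_quantity": False, "nb_product_lines": 0, "domain_known": False,
        "has_dimensions": False, "has_delivery_date": False}

ok, trail = classify(commande, clauses)   # True, with a PASS line per clause
ok2, _ = classify(spam, clauses)          # False: Clause 1 fails
```

The `trail` list is exactly the human-readable audit trail: each entry names the literals that satisfied (or failed) a clause of the SAT formula.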

3. Training Complexity

O(L × n × m × b)
= O(30 × 880 × 13 × 200)
= O(68,640,000)
≈ 68M operations
→ Trained in 3.3 seconds

Where: L=30 layers, n=880 training samples, m=13 features, b=200 bucket size.
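The stated figure follows directly from those parameters:

```python
# Training cost: L layers x n samples x m features x b bucket size.
L, n, m, b = 30, 880, 13, 200
ops = L * n * m * b
print(ops)  # 68640000 -> the ~68M operations quoted above
```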

Compare to gradient boosting on the same data: similar accuracy, but Snake produces explainable SAT rules, not a black-box ensemble. Every prediction comes with a human-readable audit trail.

4. Inference Complexity

O(L × clauses_in_bucket)
= O(30 × ~5)
= O(150) clause evaluations per prediction
→ Sub-millisecond

The Snake model evaluates a decision tree (bucket chain) and then checks clauses within the matched bucket. No matrix multiplication, no forward pass, no GPU. Pure boolean logic.
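A minimal sketch of that inference path, assuming each bucket is a `(guard, label, clauses)` triple and a clause is a list of `(feature, value)` literals. The bucket contents and the fall-through-to-default behaviour are illustrative assumptions, not Snake's actual data structures:

```python
def predict(email, buckets, default="Autre"):
    for guard, label, clauses in buckets:   # IF/ELIF chain: first match wins
        if guard(email):
            # Within the matched bucket, the class holds iff every OR-clause
            # is satisfied -- pure boolean evaluation, no arithmetic.
            if all(any(email[f] == v for f, v in clause) for clause in clauses):
                return label
            return default
    return default                          # ELSE bucket

buckets = [
    (lambda e: e["nb_product_lines"] > 0, "Commande",
     [[("has_quantity", True), ("domain_known", True)]]),
    (lambda e: e["has_price"], "Devis",
     [[("has_question_mark", True)]]),
]
email = {"nb_product_lines": 2, "has_quantity": True,
         "domain_known": False, "has_price": True, "has_question_mark": False}
print(predict(email, buckets))  # Commande
```

The total work is the guard chain plus the handful of clauses in the matched bucket, which is where the O(L × clauses_in_bucket) bound comes from.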

5. AUROC Breakdown

| Class | AUROC | Key separating features |
| --- | --- | --- |
| Commande | 0.999 | has_quantity + has_dimensions + has_delivery_date + nb_product_lines |
| Spam | 0.969 | domain_known=False + nb_product_lines=0 + ton=informel |
| Devis | 0.963 | has_price + has_question_mark + !has_delivery_date |
| Relance | 0.962 | has_urgency_words + has_question_mark + has_product_ref + !has_quantity |
| Autre | 0.899 | Catch-all: has_complaint_words + has_attachment_pdf + !has_quantity + !has_price |

Macro AUROC: 0.958 (Wilcoxon-Mann-Whitney, 80/20 split, 1100 samples)

Autre is the weakest class at 0.899 because it's a catch-all — admin emails, invoices, complaints, misc. It has no strong positive signal, only negative ones (not a commande, not a devis, not a relance, not spam). With real labeled data, this will improve as the "autre" category becomes more defined.
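As a quick sanity check, the macro AUROC is the unweighted mean of the per-class values in the table:

```python
per_class = {"Commande": 0.999, "Spam": 0.969, "Devis": 0.963,
             "Relance": 0.962, "Autre": 0.899}
macro = sum(per_class.values()) / len(per_class)
print(round(macro, 3))  # 0.958
```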

6. The LLM Proxy Pattern (and why it's optional)

Raw email (infinite text space)
  ↓
Extraction: keyword/regex (default) OR Claude Haiku (opt-in)
  ↓
13 structured features (finite Boolean/categorical space)
  ↓
Snake SAT classification (<1ms, pure logic)
  ↓
Prediction + Probability + Audit trail

The key insight: Snake is the classifier, not the LLM.

The LLM (Claude Haiku) is an optional encoder that improves feature extraction quality. But Snake does all the classification work regardless. With anthropic=false (the default), keyword/regex extraction builds the 13 features directly — no API key, no network, sub-10ms. Snake classifies the same way on either path.
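The default path can be sketched as plain keyword/regex rules that fill the feature vector with no API call. The patterns and thresholds below are illustrative assumptions, not the project's actual extraction rules:

```python
import re

def extract_features(email_text: str) -> dict:
    """Build a few of the 13 structured features from raw text.
    Keyword/regex only: no network, no model, sub-10ms."""
    text = email_text.lower()
    return {
        "has_quantity": bool(re.search(r"\b\d+\s*(pcs|unités|x)\b", text)),
        "has_price": bool(re.search(r"\d+\s*(€|eur)", text)),
        "has_question_mark": "?" in email_text,
        "has_greeting": any(g in text for g in ("bonjour", "madame", "monsieur")),
        "nb_product_lines": sum(1 for line in email_text.splitlines()
                                if re.match(r"\s*[-*]\s*\d+", line)),
        # ... remaining features follow the same pattern
    }

features = extract_features("Bonjour,\n- 3 x panneaux 200€\nMerci")
print(features["has_greeting"], features["nb_product_lines"])  # True 1
```

Whichever encoder produced the dictionary, Snake receives the same 13-feature input, so the classification logic is identical on both paths.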

Two extraction modes feed the same classifier, so swapping the encoder never changes the decision logic. And it is Snake, not the LLM, that classifies: the SAT formula is deterministic, auditable, and evaluated in sub-millisecond time, while the LLM's only job is to map raw text into the 13-feature space.

Link to AUMA thesis

The theoretical foundation (Finite Sum Theorem, universal MAXSAT representation, polynomial construction) is detailed in Charles Dana's MSc thesis at École Polytechnique, supervised by Erwan Le Pennec: sudoku.aws.monce.ai/V2/papers/