BERT CPU Benchmark
CPU inference profiler for any BERT-family encoder model on Hugging Face.
Measures parameters, MACs, FLOPs, mean latency, and p95 latency, all on CPU; no GPU required.
Compatible with any AutoModel-loadable encoder model: BERT, RoBERTa, DeBERTa, ELECTRA, DistilBERT, and custom distilled models.
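The complexity metrics above can be sketched in plain PyTorch. This is a minimal illustration, not the profiler's actual implementation: it uses a small `nn.TransformerEncoder` as a hypothetical stand-in for an `AutoModel`-loaded encoder, and estimates MACs only for the dense (`nn.Linear`) layers, which dominate encoder cost.

```python
import torch
from torch import nn

# Hypothetical stand-in for an AutoModel-loaded encoder; in practice you
# would call transformers.AutoModel.from_pretrained(...) instead.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# Parameter count: sum of element counts over all weight tensors.
n_params = sum(p.numel() for p in model.parameters())

# Rough MAC estimate for dense layers only: each nn.Linear performs
# in_features * out_features multiply-accumulates per token.
seq_len = 128
macs = sum(
    m.in_features * m.out_features * seq_len
    for m in model.modules()
    if isinstance(m, nn.Linear)
)
flops = 2 * macs  # one multiply + one add per MAC

print(f"params: {n_params:,}  MACs: {macs:,}  FLOPs: {flops:,}")
```

Note the convention assumed here: 1 MAC = 2 FLOPs. Some profilers report MACs and FLOPs interchangeably, so check which convention a given tool uses before comparing numbers.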
📐 Model Complexity
⏱️ CPU Latency
FLOPs are hardware-agnostic: they measure the model's computational cost, not the machine's speed. Latency is measured with `torch.inference_mode()` after 20 warm-up passes to avoid cold-start bias.
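The measurement loop above can be sketched as follows. This is a simplified illustration, assuming a toy `nn.Sequential` stand-in for the encoder and 50 timed runs (the run count is an assumption, not taken from the tool):

```python
import statistics
import time

import torch
from torch import nn

# Toy stand-in model; the benchmark itself runs the Hugging Face encoder.
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
model.eval()
x = torch.randn(1, 128, 768)  # (batch, seq_len, hidden)

with torch.inference_mode():
    # 20 warm-up passes so allocator and kernel caches are hot.
    for _ in range(20):
        model(x)

    # Timed passes.
    times_ms = []
    for _ in range(50):
        t0 = time.perf_counter()
        model(x)
        times_ms.append((time.perf_counter() - t0) * 1000)

mean_ms = statistics.mean(times_ms)
p95_ms = statistics.quantiles(times_ms, n=100)[94]  # 95th percentile
print(f"mean: {mean_ms:.2f} ms  p95: {p95_ms:.2f} ms")
```

Reporting p95 alongside the mean matters on CPU: OS scheduling noise produces occasional slow runs that the mean hides but a tail percentile exposes.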