BERT CPU Benchmark

A CPU inference profiler for any BERT-family encoder model on the Hugging Face Hub.

Measures parameters, MACs, FLOPs, mean latency, and p95 latency — all on CPU, no GPU required.

Compatible with any AutoModel-loadable encoder model: BERT, RoBERTa, DeBERTa, ELECTRA, DistilBERT, and custom distilled models.


📐 Model Complexity
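As a rough guide to what this panel reports, the parameter count and MACs of a BERT-style encoder can be estimated in closed form from its config. The sketch below is a back-of-the-envelope estimate, not the profiler's actual counting code; the default sizes (hidden 768, 12 layers, vocab 30522) are BERT-base values assumed for illustration.

```python
def encoder_params(hidden=768, layers=12, vocab=30522, max_pos=512, ffn_mult=4):
    """Approximate parameter count for a BERT-style encoder."""
    emb = (vocab + max_pos + 2) * hidden + 2 * hidden   # token/position/type embeddings + LayerNorm
    attn = 4 * hidden * hidden + 4 * hidden             # Q, K, V, and output projections (+ biases)
    ffn = 2 * ffn_mult * hidden * hidden + (ffn_mult + 1) * hidden
    norms = 4 * hidden                                  # two LayerNorms per layer
    pooler = hidden * hidden + hidden
    return emb + layers * (attn + ffn + norms) + pooler

def encoder_macs(seq_len=128, hidden=768, layers=12, ffn_mult=4):
    """Approximate multiply-accumulates for one forward pass at batch size 1."""
    proj = 4 * seq_len * hidden * hidden                # QKV + output projections
    scores = 2 * seq_len * seq_len * hidden             # QK^T and attention @ V
    ffn = 2 * ffn_mult * seq_len * hidden * hidden
    return layers * (proj + scores + ffn)

macs = encoder_macs()
print(f"params ~ {encoder_params() / 1e6:.1f} M")       # ~109.5 M for BERT-base
print(f"MACs ~ {macs / 1e9:.2f} G, FLOPs ~ {2 * macs / 1e9:.2f} G")
```

The FLOPs figure follows the common convention FLOPs ≈ 2 × MACs (one multiply plus one add per MAC).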

⏱️ CPU Latency


FLOPs are hardware-agnostic — they measure the model's computational cost, not the machine's speed. Latency is measured with torch.inference_mode() after 20 warm-up passes to avoid cold-start bias.
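The timing loop can be sketched as follows. This is a minimal harness under the stated methodology (discard warm-up passes, then report mean and p95); in the real app the callable would be the model's forward pass, e.g. `lambda: model(**inputs)` run inside `torch.inference_mode()` — that wiring is assumed here, and a cheap stand-in workload is used instead so the sketch stays self-contained.

```python
import time
import statistics

def benchmark(fn, warmup=20, runs=100):
    """Time fn() on CPU: discard `warmup` passes, then report (mean_ms, p95_ms)."""
    for _ in range(warmup):                 # warm-up: caches, lazy init, allocator
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    mean = statistics.fmean(samples)
    p95 = statistics.quantiles(samples, n=20)[-1]  # last of 19 cut points ~ 95th percentile
    return mean, p95

# Stand-in workload; replace with the model forward pass in practice.
mean, p95 = benchmark(lambda: sum(range(10_000)))
print(f"mean {mean:.3f} ms, p95 {p95:.3f} ms")
```

Reporting p95 alongside the mean matters on CPU, where OS scheduling noise makes tail latency noticeably worse than the average.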
