BERT CPU Benchmark
CPU inference profiler for any BERT-family encoder model on Hugging Face.
Measures parameters, MACs, FLOPs, mean latency, and p95 latency, all on CPU; no GPU required.
Compatible with any AutoModel-loadable encoder model: BERT, RoBERTa, DeBERTa, ELECTRA, DistilBERT, and custom distilled models.
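The complexity metrics above can be sketched in plain PyTorch. This is a minimal illustration, not the profiler's actual implementation: it uses a small `nn.TransformerEncoder` as a hypothetical stand-in for an `AutoModel`-loaded encoder, and estimates MACs only for the dense (`nn.Linear`) layers, which dominate encoder cost.

```python
import torch
from torch import nn

# Hypothetical stand-in for an AutoModel-loaded encoder; in practice you
# would call transformers.AutoModel.from_pretrained(...) instead.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# Parameter count: sum of element counts over all weight tensors.
n_params = sum(p.numel() for p in model.parameters())

# Rough MAC estimate for dense layers only: each nn.Linear performs
# in_features * out_features multiply-accumulates per token.
seq_len = 128
macs = sum(
    m.in_features * m.out_features * seq_len
    for m in model.modules()
    if isinstance(m, nn.Linear)
)
flops = 2 * macs  # one multiply + one add per MAC

print(f"params: {n_params:,}  MACs: {macs:,}  FLOPs: {flops:,}")
```

Note the convention assumed here: 1 MAC = 2 FLOPs. Some profilers report MACs and FLOPs interchangeably, so check which convention a given tool uses before comparing numbers.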
📐 Model Complexity
⏱️ CPU Latency
FLOPs are hardware-agnostic: they measure the model's computational cost, not the machine's speed. Latency is measured with `torch.inference_mode()` after 20 warm-up passes to avoid cold-start bias.
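The measurement loop above can be sketched as follows. This is a simplified illustration, assuming a toy `nn.Sequential` stand-in for the encoder and 50 timed runs (the run count is an assumption, not taken from the tool):

```python
import statistics
import time

import torch
from torch import nn

# Toy stand-in model; the benchmark itself runs the Hugging Face encoder.
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
model.eval()
x = torch.randn(1, 128, 768)  # (batch, seq_len, hidden)

with torch.inference_mode():
    # 20 warm-up passes so allocator and kernel caches are hot.
    for _ in range(20):
        model(x)

    # Timed passes.
    times_ms = []
    for _ in range(50):
        t0 = time.perf_counter()
        model(x)
        times_ms.append((time.perf_counter() - t0) * 1000)

mean_ms = statistics.mean(times_ms)
p95_ms = statistics.quantiles(times_ms, n=100)[94]  # 95th percentile
print(f"mean: {mean_ms:.2f} ms  p95: {p95_ms:.2f} ms")
```

Reporting p95 alongside the mean matters on CPU: OS scheduling noise produces occasional slow runs that the mean hides but a tail percentile exposes.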