Performance
Benchmarks and optimization strategies for BLS12-381 operations.
Native Benchmarks (BLST)
Measured on Apple M1 Pro (ARM64) and Intel i9-12900K (x86_64):
G1 Operations
| Operation | M1 Pro | i9-12900K | Notes |
|---|---|---|---|
| G1 Add | 12 μs | 15 μs | Point addition |
| G1 Double | 8 μs | 10 μs | Point doubling |
| G1 Mul | 65 μs | 80 μs | Scalar multiplication |
| G1 MSM (10) | 0.4 ms | 0.5 ms | Multi-scalar mult |
| G1 MSM (100) | 2.5 ms | 3.2 ms | Pippenger’s algorithm |
| G1 MSM (1000) | 18 ms | 22 ms | Batch verification |
G2 Operations
| Operation | M1 Pro | i9-12900K | Notes |
|---|---|---|---|
| G2 Add | 35 μs | 45 μs | Extension field |
| G2 Double | 25 μs | 32 μs | Extension field |
| G2 Mul | 160 μs | 200 μs | Scalar multiplication |
| G2 MSM (10) | 1.2 ms | 1.5 ms | Multi-scalar mult |
| G2 MSM (100) | 8 ms | 10 ms | Pippenger’s algorithm |
Pairing Operations
| Operation | M1 Pro | i9-12900K | Notes |
|---|---|---|---|
| Single Pairing | 0.9 ms | 1.2 ms | e(P, Q) |
| Pairing Check (2) | 1.5 ms | 2.0 ms | Signature verification |
| Pairing Check (4) | 2.2 ms | 3.0 ms | Batch check |
| Final Exponentiation | 0.4 ms | 0.5 ms | Part of pairing |
| Miller Loop | 0.5 ms | 0.6 ms | Part of pairing |
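The Miller loop and final exponentiation rows suggest a simple cost model for pairing checks: each pair needs its own Miller loop, but the final exponentiation can be shared across all pairs. The sketch below is only that model, fed with the figures from the table; the function names are illustrative and not part of any library.

```python
# Per-operation timings in milliseconds, copied from the Pairing Operations table.
MILLER_LOOP_MS = {"M1 Pro": 0.5, "i9-12900K": 0.6}
FINAL_EXP_MS = {"M1 Pro": 0.4, "i9-12900K": 0.5}


def separate_pairings_ms(pairs: int, machine: str) -> float:
    """Cost of evaluating every pairing on its own: n * (Miller loop + final exp)."""
    return pairs * (MILLER_LOOP_MS[machine] + FINAL_EXP_MS[machine])


def multi_pairing_ms(pairs: int, machine: str) -> float:
    """Assumed model for a pairing check: n Miller loops plus one shared final exp."""
    return pairs * MILLER_LOOP_MS[machine] + FINAL_EXP_MS[machine]


if __name__ == "__main__":
    for n in (2, 4):
        for machine in MILLER_LOOP_MS:
            print(n, machine, separate_pairings_ms(n, machine), multi_pairing_ms(n, machine))
```

On the M1 Pro the model predicts about 1.4 ms for 2 pairs and 2.4 ms for 4 pairs, against measured values of 1.5 ms and 2.2 ms, so it is a rough guide rather than an exact predictor.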
Hash-to-Curve
| Operation | M1 Pro | i9-12900K | Notes |
|---|---|---|---|
| Hash to G1 | 120 μs | 150 μs | RFC 9380 |
| Hash to G2 | 280 μs | 350 μs | RFC 9380 |
Signature Operations
Single Signature
| Operation | Time | Throughput |
|---|---|---|
| Sign | 180 μs | 5,500/sec |
| Verify | 1.2 ms | 830/sec |
Aggregated Signatures
| Signers | Aggregate | Verify | vs Individual |
|---|---|---|---|
| 10 | 0.1 ms | 1.3 ms | 9x faster |
| 100 | 0.8 ms | 1.5 ms | 80x faster |
| 1000 | 7 ms | 3 ms | 400x faster |
| 10000 | 70 ms | 20 ms | 600x faster |
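The "vs Individual" column appears to compare one aggregated verification against verifying every signature separately at the 1.2 ms single-verify cost listed above. Under that assumption, the ratios can be reproduced with a few lines of arithmetic (the helper name is illustrative):

```python
# Single-signature verify time (ms), from the Single Signature table.
SINGLE_VERIFY_MS = 1.2

# Aggregated verify times (ms) keyed by signer count, from the table above.
AGGREGATED_VERIFY_MS = {10: 1.3, 100: 1.5, 1000: 3.0, 10000: 20.0}


def speedup_vs_individual(signers: int, aggregated_verify_ms: float) -> float:
    """Ratio of n separate verifications to one aggregated verification."""
    return signers * SINGLE_VERIFY_MS / aggregated_verify_ms


if __name__ == "__main__":
    for signers, agg_ms in AGGREGATED_VERIFY_MS.items():
        print(signers, round(speedup_vs_individual(signers, agg_ms), 1))
```

The aggregation cost itself (the "Aggregate" column) is not included in this ratio.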
Batch Verification
Random linear combination batch verification:
| Signatures | Naive | Batched | Speedup |
|---|---|---|---|
| 10 | 12 ms | 3 ms | 4x |
| 100 | 120 ms | 12 ms | 10x |
| 1000 | 1.2 s | 50 ms | 24x |
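To show the shape of random linear combination batching, here is a minimal sketch over a deliberately insecure toy model: all groups are the integers modulo a small prime, and the "pairing" is plain modular multiplication. None of this is BLS12-381 or the BLST API; it only mirrors the algebra. On the real curve the batched check needs one Miller loop per signature plus one for the combined signature, with a single final exponentiation, instead of two pairings per signature.

```python
import hashlib
import secrets

R = 2**61 - 1  # toy prime group order (NOT the BLS12-381 scalar field)
G1 = 1         # toy generator


def pairing(a: int, b: int) -> int:
    """Toy bilinear map e(a, b) = a*b mod R. Insecure; for structure only."""
    return (a * b) % R


def hash_to_group(msg: bytes) -> int:
    """Stand-in for hash-to-curve: SHA-256 reduced mod R."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % R


def keygen() -> tuple[int, int]:
    sk = secrets.randbelow(R - 1) + 1
    return sk, (sk * G1) % R


def sign(sk: int, msg: bytes) -> int:
    return (sk * hash_to_group(msg)) % R


def verify(pk: int, msg: bytes, sig: int) -> bool:
    """Single check: e(g1, sig) == e(pk, H(msg))."""
    return pairing(G1, sig) == pairing(pk, hash_to_group(msg))


def batch_verify(items: list[tuple[int, bytes, int]]) -> bool:
    """Check e(g1, sum c_i*sig_i) == combine_i e(c_i*pk_i, H(m_i)) for random c_i."""
    combined_sig = 0
    rhs = 0  # the toy target group is additive, so the product of pairings is a sum
    for pk, msg, sig in items:
        c = secrets.randbelow(R - 1) + 1  # fresh nonzero random coefficient
        combined_sig = (combined_sig + c * sig) % R
        rhs = (rhs + pairing((c * pk) % R, hash_to_group(msg))) % R
    return pairing(G1, combined_sig) == rhs
```

The random coefficients are what stop an attacker from submitting invalid signatures whose errors cancel each other out in the sum.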
Comparison with Other Curves
vs BN254
| Operation | BLS12-381 | BN254 | Ratio |
|---|---|---|---|
| G1 Mul | 80 μs | 45 μs | 1.8x slower |
| G2 Mul | 200 μs | 120 μs | 1.7x slower |
| Pairing | 1.2 ms | 0.6 ms | 2x slower |
| Security | 128-bit | ~100-bit | Higher |
vs secp256k1
| Operation | BLS12-381 | secp256k1 | Ratio |
|---|---|---|---|
| Sign | 180 μs | 50 μs | 3.6x slower |
| Verify | 1.2 ms | 80 μs | 15x slower |
| Aggregate (1000) | 7 ms | N/A | Unique feature |
Optimization Strategies
Multi-Scalar Multiplication (MSM)
Pippenger’s algorithm for large MSMs:
| Points | Naive | Pippenger | Speedup |
|---|---|---|---|
| 100 | 8 ms | 2.5 ms | 3.2x |
| 1000 | 80 ms | 18 ms | 4.4x |
| 10000 | 800 ms | 120 ms | 6.7x |
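The bucket method behind Pippenger's algorithm can be sketched in a few lines. This version runs over a toy additive group (integers modulo a prime standing in for curve points) so that it is self-contained; the group, the fixed 4-bit window, and the function names are assumptions for illustration and do not reflect how BLST implements it.

```python
MOD = 2**61 - 1  # toy group: integers mod a prime, standing in for curve points


def toy_add(a: int, b: int) -> int:
    """Stand-in for elliptic-curve point addition."""
    return (a + b) % MOD


def msm_naive(scalars, points):
    """Reference result: sum of s_i * P_i, computed directly in the toy group."""
    return sum(s * p for s, p in zip(scalars, points)) % MOD


def msm_pippenger(scalars, points, add=toy_add, zero=0, window_bits=4):
    """Bucket-method MSM using only group additions (scalars are non-negative ints)."""
    if len(scalars) != len(points):
        raise ValueError("scalars and points must have the same length")
    max_bits = max((s.bit_length() for s in scalars), default=0)
    num_windows = (max_bits + window_bits - 1) // window_bits
    mask = (1 << window_bits) - 1
    result = zero
    for w in reversed(range(num_windows)):
        # Shift the accumulated result up by one window (window_bits doublings).
        for _ in range(window_bits):
            result = add(result, result)
        # Sort points into buckets by their digit in this window.
        buckets = [zero] * (mask + 1)
        for s, p in zip(scalars, points):
            digit = (s >> (w * window_bits)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], p)
        # Compute sum(d * buckets[d]) with a running sum, using additions only.
        running = zero
        window_sum = zero
        for d in range(mask, 0, -1):
            running = add(running, buckets[d])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result
```

The saving comes from the bucket step: each point costs one addition per window regardless of its digit, and the per-window bucket reduction is shared by all points. Production implementations typically pick the window size based on the number of points rather than fixing it.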
Batch Pairing
Multi-pairing is more efficient than individual pairings: the Miller loops are accumulated and a single final exponentiation is shared across all pairs.
Precomputation Tables
For fixed-base multiplication (e.g., by the generator), multiples of the base point can be computed once and reused for every scalar.
Memory Requirements
| Structure | Size | Notes |
|---|---|---|
| G1 Point (compressed) | 48 bytes | |
| G1 Point (uncompressed) | 96 bytes | |
| G2 Point (compressed) | 96 bytes | |
| G2 Point (uncompressed) | 192 bytes | |
| Scalar (Fr) | 32 bytes | |
| Public Key | 48 bytes | G1 compressed |
| Signature | 96 bytes | G2 compressed |
| Aggregated Signature | 96 bytes | Same as single |
Ethereum Beacon Chain
| Data | Per Epoch | Storage |
|---|---|---|
| Attestations (naive) | ~100 MB | N/A |
| Attestations (aggregated) | ~1 MB | 99% reduction |
| Sync committee sigs | 96 bytes | Fixed |
Profiling Tips
Hotspots
Typical time distribution in signature verification:
| Component | Time |
|---|---|
| Hash to G2 | 25% |
| Miller loop | 45% |
| Final exponentiation | 30% |
Optimization Priorities
- Batch operations - Use MSM and multi-pairing
- Precomputation - Cache generator multiples
- Aggregation - Combine signatures before verification
- Parallelization - Miller loops are independent
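The precomputation item above can be illustrated with a fixed-base window table: every multiple `d * 2^(4i) * G` is stored once, after which a scalar multiplication needs only one table lookup and one addition per window, with no doublings. The sketch uses a toy additive group (integers modulo a prime) in place of curve points, and all names and the 4-bit window are assumptions for illustration.

```python
MOD = 2**61 - 1   # toy group: integers mod a prime, standing in for curve points
WINDOW_BITS = 4   # digits are 4 bits wide, so each table row has 16 entries
G = 123456789     # toy "generator"


def toy_add(a: int, b: int) -> int:
    """Stand-in for elliptic-curve point addition."""
    return (a + b) % MOD


def build_fixed_base_table(base: int, scalar_bits: int, add=toy_add, zero=0):
    """table[i][d] holds d * 2^(WINDOW_BITS * i) * base, built with additions only."""
    num_windows = -(-scalar_bits // WINDOW_BITS)  # ceiling division
    table = []
    window_base = base
    for _ in range(num_windows):
        row = [zero]
        for _ in range((1 << WINDOW_BITS) - 1):
            row.append(add(row[-1], window_base))
        table.append(row)
        # Advance to the next window: multiply window_base by 2^WINDOW_BITS.
        for _ in range(WINDOW_BITS):
            window_base = add(window_base, window_base)
    return table


def fixed_base_mul(scalar: int, table, add=toy_add, zero=0):
    """Multiply the table's base by scalar using one lookup and addition per window."""
    if scalar < 0 or scalar.bit_length() > len(table) * WINDOW_BITS:
        raise ValueError("scalar does not fit the precomputed table")
    mask = (1 << WINDOW_BITS) - 1
    acc = zero
    for i, row in enumerate(table):
        digit = (scalar >> (i * WINDOW_BITS)) & mask
        if digit:
            acc = add(acc, row[digit])
    return acc
```

The trade-off is memory: the table grows with both the number of windows and the window width, which is why this approach suits a handful of fixed bases such as the generator rather than arbitrary points.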
Hardware Acceleration
x86_64 (ADX/BMI2)
BLST uses:
- MULX for multiplication that leaves the flags untouched
- ADCX/ADOX for parallel add-with-carry
- ~30% speedup over generic implementation
ARM64 (NEON)
BLST uses:
- Vector operations for field arithmetic
- ~25% speedup over generic
GPU Acceleration
For large MSMs (>10,000 points):
- CUDA implementations available
- ~100x speedup for MSM operations
- Not suitable for latency-sensitive signing
Related
- BLS12-381 Overview - Curve fundamentals
- Security - Security considerations
- Usage Patterns - Implementation patterns

