Performance
Benchmarks and optimization strategies for BLS12-381 operations.
Native Benchmarks (BLST)
Measured on Apple M1 Pro (ARM64) and Intel i9-12900K (x86_64):
G1 Operations
| Operation | M1 Pro | i9-12900K | Notes |
|---|---|---|---|
| G1 Add | 12 μs | 15 μs | Point addition |
| G1 Double | 8 μs | 10 μs | Point doubling |
| G1 Mul | 65 μs | 80 μs | Scalar multiplication |
| G1 MSM (10) | 0.4 ms | 0.5 ms | Multi-scalar mult |
| G1 MSM (100) | 2.5 ms | 3.2 ms | Pippenger’s algorithm |
| G1 MSM (1000) | 18 ms | 22 ms | Batch verification |
G2 Operations
| Operation | M1 Pro | i9-12900K | Notes |
|---|---|---|---|
| G2 Add | 35 μs | 45 μs | Extension field |
| G2 Double | 25 μs | 32 μs | Extension field |
| G2 Mul | 160 μs | 200 μs | Scalar multiplication |
| G2 MSM (10) | 1.2 ms | 1.5 ms | Multi-scalar mult |
| G2 MSM (100) | 8 ms | 10 ms | Pippenger’s algorithm |
Pairing Operations
| Operation | M1 Pro | i9-12900K | Notes |
|---|---|---|---|
| Single Pairing | 0.9 ms | 1.2 ms | e(P, Q) |
| Pairing Check (2) | 1.5 ms | 2.0 ms | Signature verification |
| Pairing Check (4) | 2.2 ms | 3.0 ms | Batch check |
| Final Exponentiation | 0.4 ms | 0.5 ms | Part of pairing |
| Miller Loop | 0.5 ms | 0.6 ms | Part of pairing |
Hash-to-Curve
| Operation | M1 Pro | i9-12900K | Notes |
|---|---|---|---|
| Hash to G1 | 120 μs | 150 μs | RFC 9380 |
| Hash to G2 | 280 μs | 350 μs | RFC 9380 |
Signature Operations
Single Signature
| Operation | Time | Throughput |
|---|---|---|
| Sign | 180 μs | 5,500/sec |
| Verify | 1.2 ms | 830/sec |
Aggregated Signatures
| Signers | Aggregate | Verify | vs Individual |
|---|---|---|---|
| 10 | 0.1 ms | 1.3 ms | 9x faster |
| 100 | 0.8 ms | 1.5 ms | 80x faster |
| 1000 | 7 ms | 3 ms | 400x faster |
| 10000 | 70 ms | 20 ms | 600x faster |
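The "vs Individual" column can be reproduced from the other figures in this section. The sketch below assumes the comparison is against `n` separate verifications at the 1.2 ms single-signature verify time listed above; the function name is illustrative, not part of any library.

```python
# Figures copied from the tables above (times in milliseconds).
SINGLE_VERIFY_MS = 1.2
AGGREGATE_VERIFY_MS = {10: 1.3, 100: 1.5, 1000: 3.0, 10000: 20.0}


def speedup_vs_individual(signers: int) -> float:
    """Cost of `signers` separate verifications over one aggregate verification."""
    individual_ms = signers * SINGLE_VERIFY_MS
    return individual_ms / AGGREGATE_VERIFY_MS[signers]


for n in AGGREGATE_VERIFY_MS:
    print(f"{n:>6} signers: about {speedup_vs_individual(n):.0f}x faster")
```

Rounded to whole numbers, the ratios match the table's last column.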
Batch Verification
Random linear combination batch verification:
| Signatures | Naive | Batched | Speedup |
|---|---|---|---|
| 10 | 12 ms | 3 ms | 4x |
| 100 | 120 ms | 12 ms | 10x |
| 1000 | 1.2 s | 50 ms | 24x |
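The algebra behind random linear combination batching can be sketched with a toy model. This is an insecure illustration, not BLS12-381 and not this library's API: a group element `k*G` is stored as the integer `k` modulo a prime, and the "pairing" multiplies those integers. All names, and the choice of prime, are assumptions made for the example.

```python
import hashlib
import secrets

# Toy model (insecure, illustration only): the element k*G is stored as k mod R,
# and the pairing e(a*G1, b*G2) is stored as a*b mod R.
R = 2**127 - 1  # a Mersenne prime standing in for the group order


def pairing(a: int, b: int) -> int:
    return (a * b) % R


def hash_to_g2(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % R


def sign(sk: int, msg: bytes) -> int:
    return (sk * hash_to_g2(msg)) % R  # sigma = sk * H(m)


def verify(pk: int, msg: bytes, sig: int) -> bool:
    # e(G1, sigma) == e(pk, H(m)); the G1 generator is 1 in this model.
    return pairing(1, sig) == pairing(pk, hash_to_g2(msg))


def batch_verify(items) -> bool:
    """items: list of (pk, msg, sig). One combined check with random weights."""
    combined_sig = 0  # sum of r_i * sigma_i
    rhs = 0           # product of e(r_i * pk_i, H(m_i)), additive in this model
    for pk, msg, sig in items:
        r = secrets.randbelow(R - 1) + 1  # nonzero random weight
        combined_sig = (combined_sig + r * sig) % R
        rhs = (rhs + pairing(r * pk % R, hash_to_g2(msg))) % R
    return pairing(1, combined_sig) == rhs


# In the toy model the public key equals the secret key (pk = sk * 1).
items = [(sk, m, sign(sk, m)) for sk, m in [(11, b"a"), (22, b"b"), (33, b"c")]]
print(batch_verify(items))
```

With real pairings, the combined check needs n + 1 Miller loops and one final exponentiation instead of 2n full pairings. The random weights stop an attacker from submitting invalid signatures whose errors cancel each other out.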
Comparison with Other Curves
vs BN254
| Operation | BLS12-381 | BN254 | Ratio |
|---|---|---|---|
| G1 Mul | 80 μs | 45 μs | 1.8x slower |
| G2 Mul | 200 μs | 120 μs | 1.7x slower |
| Pairing | 1.2 ms | 0.6 ms | 2x slower |
| Security | 128-bit | ~100-bit | Higher |
vs secp256k1
| Operation | BLS12-381 | secp256k1 | Ratio |
|---|---|---|---|
| Sign | 180 μs | 50 μs | 3.6x slower |
| Verify | 1.2 ms | 80 μs | 15x slower |
| Aggregate (1000) | 7 ms | N/A | Unique feature |
Optimization Strategies
Multi-Scalar Multiplication (MSM)
Pippenger’s algorithm for large MSMs:
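Complexity: O(n / log n) group operations per scalar bit for n points, versus O(n) per scalar bit for independent double-and-add multiplications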
| Points | Naive | Pippenger | Speedup |
|---|---|---|---|
| 100 | 8 ms | 2.5 ms | 3.2x |
| 1000 | 80 ms | 18 ms | 4.4x |
| 10000 | 800 ms | 120 ms | 6.7x |
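The bucket method behind Pippenger's algorithm can be sketched over a toy group. Here "points" are integers modulo a prime under addition, which makes the result easy to check; a real implementation would use G1 or G2 point addition instead. The function names and the window size `c` are assumptions for illustration, not BLST's actual parameters.

```python
Q = 2**61 - 1  # toy group: "points" are integers mod Q under addition


def add(a: int, b: int) -> int:
    return (a + b) % Q


def msm_pippenger(scalars, points, scalar_bits=64, c=4):
    """Bucket-method MSM: computes sum(s_i * P_i) using only group additions."""
    result = 0  # group identity
    windows = (scalar_bits + c - 1) // c
    for w in reversed(range(windows)):
        # Shift the running result left by c bits (c doublings).
        for _ in range(c):
            result = add(result, result)
        # Sort points into buckets by the c-bit digit of their scalar.
        buckets = [0] * (1 << c)
        for s, p in zip(scalars, points):
            digit = (s >> (w * c)) & ((1 << c) - 1)
            if digit:
                buckets[digit] = add(buckets[digit], p)
        # Compute sum_j j * buckets[j] with a running suffix sum.
        running, window_sum = 0, 0
        for j in range((1 << c) - 1, 0, -1):
            running = add(running, buckets[j])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result


scalars = [3, 10, 2**63 + 5]
points = [7, 11, 13]
print(msm_pippenger(scalars, points))
```

Each window costs about n bucket additions plus 2^(c+1) additions for the suffix sum, shared across all points. Choosing `c` close to log2(n) is what produces the n / log n behavior.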
Batch Pairing
Multi-pairing is more efficient than individual pairings:
// Naive check: two full pairings
e(P1, Q1) == e(P2, Q2)
// Rewritten as a multi-pairing product
e(P1, Q1) * e(-P2, Q2) == 1
// Further optimized with a shared final exponentiation
final_exp(miller(P1, Q1) * miller(-P2, Q2)) == 1
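The saving from a shared final exponentiation can be estimated from the M1 Pro figures in the Pairing Operations table. This is a rough cost model, not a measurement: it assumes Miller loop costs add linearly, and the function names are illustrative.

```python
# M1 Pro figures from the Pairing Operations table (milliseconds).
MILLER_LOOP_MS = 0.5
FINAL_EXP_MS = 0.4


def separate_pairings_ms(n: int) -> float:
    """n full pairings: each one pays for its own final exponentiation."""
    return n * (MILLER_LOOP_MS + FINAL_EXP_MS)


def multi_pairing_ms(n: int) -> float:
    """n Miller loops multiplied together, then a single final exponentiation."""
    return n * MILLER_LOOP_MS + FINAL_EXP_MS


for n in (2, 4):
    print(f"n={n}: separate {separate_pairings_ms(n):.1f} ms, "
          f"multi-pairing {multi_pairing_ms(n):.1f} ms")
```

For two pairs the model gives 1.8 ms for separate pairings and 1.4 ms for a multi-pairing, close to the 1.5 ms measured for Pairing Check (2).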
Precomputation Tables
For fixed-base multiplication (e.g., generator):
// Precompute multiples of generator
const TABLE_SIZE = 256;
var precomputed: [TABLE_SIZE]G1Point = undefined;
precomputed[0] = G1.identity();
precomputed[1] = G1.generator();
for (2..TABLE_SIZE) |i| {
    // precomputed[i] = i * G
    precomputed[i] = G1.add(precomputed[i - 1], precomputed[1]);
}
// Fast multiplication using table
fn mulGenerator(scalar: Fr) G1Point {
    // Use precomputed table for significant speedup
    // ~4x faster than naive double-and-add
}
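One way the elided `mulGenerator` body could use such a table is a fixed 8-bit window: consume the scalar one byte at a time and add the matching table entry. The sketch below mirrors the 256-entry table layout above, but works over a toy additive group of integers modulo a prime so it can be checked directly. The group, the generator value, and the function names are assumptions, and the source does not say which windowing scheme it uses.

```python
Q = 2**61 - 1      # toy group: integers mod Q under addition
GENERATOR = 5      # stand-in for the G1 generator
TABLE_SIZE = 256


def add(a: int, b: int) -> int:
    return (a + b) % Q


def build_table():
    """table[i] = i * GENERATOR, built with additions only."""
    table = [0] * TABLE_SIZE  # table[0] is the identity
    table[1] = GENERATOR
    for i in range(2, TABLE_SIZE):
        table[i] = add(table[i - 1], table[1])
    return table


def mul_generator(scalar: int, table) -> int:
    """Fixed-window multiplication for a 256-bit scalar, most significant byte first."""
    acc = 0
    for byte in scalar.to_bytes(32, "big"):
        for _ in range(8):             # shift the accumulator by one window
            acc = add(acc, acc)
        acc = add(acc, table[byte])    # one table lookup replaces up to 8 additions
    return acc


table = build_table()
print(mul_generator(123456789, table))
```

This variant still performs every doubling and only reduces the number of additions. Removing the doublings as well requires one table per window position, at the cost of more memory.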
Memory Requirements
| Structure | Size | Notes |
|---|---|---|
| G1 Point (compressed) | 48 bytes | |
| G1 Point (uncompressed) | 96 bytes | |
| G2 Point (compressed) | 96 bytes | |
| G2 Point (uncompressed) | 192 bytes | |
| Scalar (Fr) | 32 bytes | |
| Public Key | 48 bytes | G1 compressed |
| Signature | 96 bytes | G2 compressed |
| Aggregated Signature | 96 bytes | Same as single |
Ethereum Beacon Chain
| Data | Size per Epoch | Notes |
|---|---|---|
| Attestations (naive) | ~100 MB | N/A |
| Attestations (aggregated) | ~1 MB | 99% reduction |
| Sync committee sigs | 96 bytes | Fixed |
Profiling Tips
Hotspots
Typical time distribution in signature verification:
| Component | Time |
|---|---|
| Hash to G2 | 25% |
| Miller loop | 45% |
| Final exponentiation | 30% |
Optimization Priorities
- Batch operations - Use MSM and multi-pairing
- Precomputation - Cache generator multiples
- Aggregation - Combine signatures before verification
- Parallelization - Miller loops are independent
Hardware Acceleration
x86_64 (ADX/BMI2)
BLST uses:
- MULX for flag-preserving (flagless) multiplication
- ADCX/ADOX for two independent add-with-carry chains
- ~30% speedup over generic implementation
ARM64 (NEON)
BLST uses:
- Vector operations for field arithmetic
- ~25% speedup over generic
GPU Acceleration
For large MSMs (>10,000 points):
- CUDA implementations available
- ~100x speedup for MSM operations
- Not suitable for latency-sensitive signing