Performance

Benchmarks and optimization strategies for BLS12-381 operations.

Native Benchmarks (BLST)

Measured on Apple M1 Pro (ARM64) and Intel i9-12900K (x86_64):

G1 Operations

| Operation | M1 Pro | i9-12900K | Notes |
| --- | --- | --- | --- |
| G1 Add | 12 μs | 15 μs | Point addition |
| G1 Double | 8 μs | 10 μs | Point doubling |
| G1 Mul | 65 μs | 80 μs | Scalar multiplication |
| G1 MSM (10) | 0.4 ms | 0.5 ms | Multi-scalar mult |
| G1 MSM (100) | 2.5 ms | 3.2 ms | Pippenger’s algorithm |
| G1 MSM (1000) | 18 ms | 22 ms | Batch verification |

G2 Operations

| Operation | M1 Pro | i9-12900K | Notes |
| --- | --- | --- | --- |
| G2 Add | 35 μs | 45 μs | Extension field |
| G2 Double | 25 μs | 32 μs | Extension field |
| G2 Mul | 160 μs | 200 μs | Scalar multiplication |
| G2 MSM (10) | 1.2 ms | 1.5 ms | Multi-scalar mult |
| G2 MSM (100) | 8 ms | 10 ms | Pippenger’s algorithm |

Pairing Operations

| Operation | M1 Pro | i9-12900K | Notes |
| --- | --- | --- | --- |
| Single Pairing | 0.9 ms | 1.2 ms | e(P, Q) |
| Pairing Check (2) | 1.5 ms | 2.0 ms | Signature verification |
| Pairing Check (4) | 2.2 ms | 3.0 ms | Batch check |
| Final Exponentiation | 0.4 ms | 0.5 ms | Part of pairing |
| Miller Loop | 0.5 ms | 0.6 ms | Part of pairing |

Hash-to-Curve

| Operation | M1 Pro | i9-12900K | Notes |
| --- | --- | --- | --- |
| Hash to G1 | 120 μs | 150 μs | RFC 9380 |
| Hash to G2 | 280 μs | 350 μs | RFC 9380 |

Signature Operations

Single Signature

| Operation | Time | Throughput |
| --- | --- | --- |
| Sign | 180 μs | 5,500/sec |
| Verify | 1.2 ms | 830/sec |

Aggregated Signatures

| Signers | Aggregate | Verify | vs Individual |
| --- | --- | --- | --- |
| 10 | 0.1 ms | 1.3 ms | 9x faster |
| 100 | 0.8 ms | 1.5 ms | 80x faster |
| 1000 | 7 ms | 3 ms | 400x faster |
| 10000 | 70 ms | 20 ms | 600x faster |
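
The "vs Individual" column appears to compare one aggregate verification against verifying every signature separately at the 1.2 ms single-verify cost, with aggregation time left out. The sketch below reproduces the column under that assumption; the function name is illustrative and the timings are copied from the tables above.

```python
# Assumption: "vs Individual" = (signers * single verify time) / aggregate verify time.
SINGLE_VERIFY_MS = 1.2  # single-signature verify time quoted above

# signers -> aggregate verify time in ms, copied from the table
AGGREGATE_VERIFY_MS = {10: 1.3, 100: 1.5, 1000: 3.0, 10000: 20.0}

def speedup_vs_individual(signers: int) -> float:
    """Ratio of n individual verifications to one aggregate verification."""
    individual_ms = signers * SINGLE_VERIFY_MS
    return individual_ms / AGGREGATE_VERIFY_MS[signers]

for n in AGGREGATE_VERIFY_MS:
    print(f"{n:>6} signers: {speedup_vs_individual(n):.0f}x")
```

Adding the aggregation time to the denominator gives a more conservative figure when the verifier also has to build the aggregate.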

Batch Verification

Random linear combination batch verification:

| Signatures | Naive | Batched | Speedup |
| --- | --- | --- | --- |
| 10 | 12 ms | 3 ms | 4x |
| 100 | 120 ms | 12 ms | 10x |
| 1000 | 1.2 s | 50 ms | 24x |
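
To make the random-linear-combination idea concrete, here is a minimal sketch on a toy bilinear map. It is not a real pairing and has no security: group elements are represented by their discrete logs modulo a small prime, and every name and parameter is hypothetical. It only shows how n signatures collapse from 2n pairings into n + 1.

```python
# Toy bilinear map, NOT a real pairing and NOT secure: group elements are
# represented by their discrete logs mod R, so e(a, b) = G_T^(a*b) is bilinear.
P, R, G_T = 607, 101, 64   # 64 has prime order 101 modulo the prime 607

def pairing(a: int, b: int) -> int:
    return pow(G_T, (a * b) % R, P)

def sign(sk: int, h: int) -> int:
    """BLS-style signature sk * H(m), with h standing in for H(m)."""
    return (sk * h) % R

def verify_naive(pks, hashes, sigs) -> bool:
    """Two pairings per signature: e(g1, sig) == e(pk, H(m)), with g1 = 1."""
    return all(pairing(1, s) == pairing(pk, h)
               for pk, h, s in zip(pks, hashes, sigs))

def verify_batched(pks, hashes, sigs, coeffs) -> bool:
    """n + 1 pairings: e(g1, sum r_i*sig_i) == prod e(r_i*pk_i, H(m_i))."""
    combined_sig = sum(r * s for r, s in zip(coeffs, sigs)) % R
    rhs = 1
    for r, pk, h in zip(coeffs, pks, hashes):
        rhs = (rhs * pairing(r * pk % R, h)) % P
    return pairing(1, combined_sig) == rhs

sks = [3, 7, 11]
pks = list(sks)                 # pk = sk * g1, and g1 is represented by 1
hashes = [5, 8, 13]             # stand-ins for H(m_i)
sigs = [sign(sk, h) for sk, h in zip(sks, hashes)]
coeffs = [17, 29, 41]           # fixed here for reproducibility; random in practice
print(verify_naive(pks, hashes, sigs), verify_batched(pks, hashes, sigs, coeffs))
```

In a real verifier the coefficients are drawn fresh and at random for every batch, which is what stops an attacker from crafting invalid signatures whose errors cancel in the sum.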

Comparison with Other Curves

vs BN254

| Operation | BLS12-381 | BN254 | Ratio |
| --- | --- | --- | --- |
| G1 Mul | 80 μs | 45 μs | 1.8x slower |
| G2 Mul | 200 μs | 120 μs | 1.7x slower |
| Pairing | 1.2 ms | 0.6 ms | 2x slower |
| Security | 128-bit | ~100-bit | Higher |

vs secp256k1

| Operation | BLS12-381 | secp256k1 | Ratio |
| --- | --- | --- | --- |
| Sign | 180 μs | 50 μs | 3.6x slower |
| Verify | 1.2 ms | 80 μs | 15x slower |
| Aggregate (1000) | 7 ms | N/A | Unique feature |

Optimization Strategies

Multi-Scalar Multiplication (MSM)

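Pippenger’s algorithm (the bucket method) for large MSMs:
Complexity: roughly O(b · n / log n) group operations for n points and b-bit scalars, versus O(b · n) for independent double-and-add

| Points | Naive | Pippenger | Speedup |
| --- | --- | --- | --- |
| 100 | 8 ms | 2.5 ms | 3.2x |
| 1000 | 80 ms | 18 ms | 4.4x |
| 10000 | 800 ms | 120 ms | 6.7x |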
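
The bucket method can be sketched on a toy group to show where the savings come from. Here "points" are integers modulo `M` and "point addition" is modular addition, which is an assumption made purely so the example runs without a curve library. The class and function names are hypothetical, and the window size of 8 is an arbitrary choice.

```python
import random

M = (1 << 61) - 1          # toy group: integers mod M under addition
SCALAR_BITS = 255          # BLS12-381 scalars fit in 255 bits

class CountingGroup:
    """Additive toy group that counts how many group additions are performed."""
    def __init__(self):
        self.adds = 0

    def add(self, a: int, b: int) -> int:
        self.adds += 1
        return (a + b) % M

def naive_msm(group, scalars, points):
    """Independent double-and-add for every (scalar, point) pair."""
    total = 0
    for s, p in zip(scalars, points):
        acc = 0
        for bit in range(SCALAR_BITS - 1, -1, -1):
            acc = group.add(acc, acc)              # doubling
            if (s >> bit) & 1:
                acc = group.add(acc, p)
        total = group.add(total, acc)
    return total

def pippenger_msm(group, scalars, points, window_bits=8):
    """Bucket method: process the scalars one window of bits at a time."""
    num_windows = -(-SCALAR_BITS // window_bits)   # ceiling division
    mask = (1 << window_bits) - 1
    result = 0
    for w in range(num_windows - 1, -1, -1):
        for _ in range(window_bits):               # shift result by one window
            result = group.add(result, result)
        buckets = [0] * (mask + 1)
        for s, p in zip(scalars, points):          # drop each point in its bucket
            digit = (s >> (w * window_bits)) & mask
            if digit:
                buckets[digit] = group.add(buckets[digit], p)
        running = window_sum = 0
        for b in reversed(buckets[1:]):            # sum of j * bucket[j] via running sums
            running = group.add(running, b)
            window_sum = group.add(window_sum, running)
        result = group.add(result, window_sum)
    return result

rng = random.Random(0)
scalars = [rng.getrandbits(SCALAR_BITS) for _ in range(200)]
points = [rng.randrange(1, M) for _ in range(200)]
g_naive, g_pip = CountingGroup(), CountingGroup()
assert naive_msm(g_naive, scalars, points) == pippenger_msm(g_pip, scalars, points)
print("naive additions:", g_naive.adds, "bucket-method additions:", g_pip.adds)
```

The per-window cost is one addition per point plus a fixed cost for summing the buckets, so larger inputs justify larger windows. That trade-off is where the log n factor comes from.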

Batch Pairing

Multi-pairing is more efficient than individual pairings:
// Naive check: two full pairings
e(P1, Q1) == e(P2, Q2)

// Rewritten as a multi-pairing (product compared against the identity)
e(P1, Q1) * e(-P2, Q2) == 1

// Further optimized: multiply the Miller loop outputs, then apply one shared final exponentiation
final_exp(miller(P1, Q1) * miller(-P2, Q2)) == 1
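
The same rearrangement can be checked on a toy model. This is a sketch under heavy assumptions: "points" are discrete logs modulo a small prime, `miller` and `final_exp` are hypothetical stand-ins that only mimic the algebraic structure, and nothing here is a real or secure pairing. It shows that the multi-pairing form gives the same answer with a single final exponentiation.

```python
# Toy model, NOT a real pairing: "points" are discrete logs mod R.
P, R = 607, 101            # prime modulus and prime subgroup order; R divides P - 1
COFACTOR = (P - 1) // R    # 6; plays the role of the final-exponentiation exponent
CALLS = {"final_exp": 0}   # counts how many final exponentiations were performed

def miller(a: int, b: int) -> int:
    """Stand-in for the Miller loop: an unreduced value in the full group mod P."""
    return pow(2, a * b, P)

def final_exp(f: int) -> int:
    """Stand-in for final exponentiation: maps into the order-R subgroup."""
    CALLS["final_exp"] += 1
    return pow(f, COFACTOR, P)

def check_naive(p1, q1, p2, q2) -> bool:
    """Two full pairings, hence two final exponentiations."""
    return final_exp(miller(p1, q1)) == final_exp(miller(p2, q2))

def check_multi(p1, q1, p2, q2) -> bool:
    """Product of Miller loops with P2 negated, one shared final exponentiation."""
    f = (miller(p1, q1) * miller(-p2 % R, q2)) % P
    return final_exp(f) == 1

print(check_naive(3, 8, 4, 6), check_multi(3, 8, 4, 6))   # 3*8 == 4*6 mod 101
print(check_naive(3, 8, 4, 7), check_multi(3, 8, 4, 7))   # 3*8 != 4*7 mod 101
```

The benefit grows with the number of pairings in the product, since the final exponentiation is paid once regardless of how many Miller loops feed into it.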

Precomputation Tables

For fixed-base multiplication (e.g., generator):
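// Precompute the multiples 0*G .. 255*G of the generator (one 8-bit window).
// G1, G1Point, and Fr are the library types this snippet assumes.
const TABLE_SIZE = 256;

fn buildGeneratorTable() [TABLE_SIZE]G1Point {
    var table: [TABLE_SIZE]G1Point = undefined;
    table[0] = G1.identity();
    table[1] = G1.generator();
    for (2..TABLE_SIZE) |i| {
        table[i] = G1.add(table[i - 1], table[1]);
    }
    return table;
}

// Fast multiplication using the table, one scalar byte at a time (most significant first).
// Table lookups replace most point additions (quoted speedup: ~4x over naive double-and-add).
// Assumption: G1.double() and scalar.toBytesBe() (returning [32]u8) are hypothetical helpers.
fn mulGenerator(table: *const [TABLE_SIZE]G1Point, scalar: Fr) G1Point {
    var acc = G1.identity();
    for (scalar.toBytesBe()) |byte| {
        for (0..8) |_| acc = G1.double(acc); // shift the accumulator by one window
        acc = G1.add(acc, table[byte]);
    }
    return acc;
}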
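
A runnable model of the table-based multiplication, using a toy additive group in place of G1 (integers modulo `M`, with `G = 7` as an arbitrary stand-in for the generator). These are assumptions made only so the sketch executes without a curve library. It shows the 256-entry table and the byte-at-a-time loop, and it counts how many table additions are needed.

```python
M = (1 << 61) - 1      # toy group: integers mod M under addition
G = 7                  # stand-in for the generator
WINDOW_BITS = 8

# Build 0*G .. 255*G by repeated addition, mirroring the table above.
TABLE = [0, G]
for i in range(2, 1 << WINDOW_BITS):
    TABLE.append((TABLE[i - 1] + TABLE[1]) % M)

def mul_generator(scalar: int, scalar_bits: int = 256):
    """Return (scalar * G, number of table additions) using 8-bit windows."""
    acc, additions = 0, 0
    for shift in range(scalar_bits - WINDOW_BITS, -1, -WINDOW_BITS):
        for _ in range(WINDOW_BITS):
            acc = (acc + acc) % M               # doubling
        digit = (scalar >> shift) & 0xFF
        if digit:
            acc = (acc + TABLE[digit]) % M      # one lookup replaces up to 8 additions
            additions += 1
    return acc, additions

scalar = 0x1F2E3D4C5B6A79880123456789ABCDEF0FEDCBA9876543210011223344556677
product, additions = mul_generator(scalar)
print(product == (scalar * G) % M, additions)
```

This variant still performs every doubling. Keeping one table per window position removes the doublings as well, at the cost of more memory.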

Memory Requirements

| Structure | Size | Notes |
| --- | --- | --- |
| G1 Point (compressed) | 48 bytes | |
| G1 Point (uncompressed) | 96 bytes | |
| G2 Point (compressed) | 96 bytes | |
| G2 Point (uncompressed) | 192 bytes | |
| Scalar (Fr) | 32 bytes | |
| Public Key | 48 bytes | G1 compressed |
| Signature | 96 bytes | G2 compressed |
| Aggregated Signature | 96 bytes | Same as single |

Ethereum Beacon Chain

| Data | Per Epoch | Storage |
| --- | --- | --- |
| Attestations (naive) | ~100 MB | N/A |
| Attestations (aggregated) | ~1 MB | 99% reduction |
| Sync committee sigs | 96 bytes | Fixed |

Profiling Tips

Hotspots

Typical time distribution in signature verification:

| Component | Time |
| --- | --- |
| Hash to G2 | 25% |
| Miller loop | 45% |
| Final exponentiation | 30% |

Optimization Priorities

  1. Batch operations - Use MSM and multi-pairing
  2. Precomputation - Cache generator multiples
  3. Aggregation - Combine signatures before verification
  4. Parallelization - Miller loops are independent
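
A rough way to gauge the fourth priority is Amdahl's law applied to the hotspot percentages above. The sketch assumes the Miller loops split perfectly across workers with no coordination overhead, and that hashing and the final exponentiation stay serial. It is an upper-bound estimate, not a measurement, and the function name is illustrative.

```python
# Fractions of signature-verification time, taken from the hotspot breakdown above.
HASH_TO_G2, MILLER_LOOP, FINAL_EXP = 0.25, 0.45, 0.30

def parallel_speedup(workers: int) -> float:
    """Amdahl's-law estimate when only the Miller loops are split across workers."""
    serial = HASH_TO_G2 + FINAL_EXP
    return 1.0 / (serial + MILLER_LOOP / workers)

for k in (1, 2, 4, 8):
    print(f"{k} workers: {parallel_speedup(k):.2f}x")
```

Under these assumptions the speedup can never exceed 1 / 0.55, about 1.8x. A single verification has only two Miller loops to distribute, so parallelism pays off mainly when many pairings are batched together.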

Hardware Acceleration

x86_64 (ADX/BMI2)

BLST uses:
  • MULX for 64×64→128-bit multiplication that leaves the flags untouched
  • ADCX/ADOX for two independent add-with-carry chains (one on the carry flag, one on the overflow flag)
  • ~30% speedup over generic implementation
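
To show the arithmetic these instructions accelerate, here is a sketch of schoolbook multi-limb multiplication. A 381-bit field element fits in six 64-bit limbs, and each inner step is a 64×64→128-bit multiply followed by additions with carry. This is an illustration in Python with hypothetical helper names, not BLST's code, and it omits the modular reduction.

```python
MASK64 = (1 << 64) - 1
LIMBS = 6   # a 381-bit field element fits in six 64-bit limbs

def to_limbs(x: int) -> list:
    """Split an integer into six little-endian 64-bit limbs."""
    return [(x >> (64 * i)) & MASK64 for i in range(LIMBS)]

def from_limbs(limbs: list) -> int:
    """Recombine little-endian 64-bit limbs into one integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

def mul_limbs(a: list, b: list) -> list:
    """Schoolbook 6x6 limb product, returning twelve 64-bit limbs."""
    out = [0] * (2 * LIMBS)
    for i in range(LIMBS):
        carry = 0
        for j in range(LIMBS):
            # 64x64 -> 128-bit product plus two carry-ins; never exceeds 128 bits.
            t = a[i] * b[j] + out[i + j] + carry
            out[i + j] = t & MASK64      # low half stays in this column
            carry = t >> 64              # high half propagates to the next column
        out[i + LIMBS] = carry
    return out

x = (1 << 381) - 12345
y = (1 << 380) + 67890
print(from_limbs(mul_limbs(to_limbs(x), to_limbs(y))) == x * y)
```

Each inner step has two additions that produce carries. On hardware with ADX, those can be kept in two separate flag-based chains instead of being serialized through a single carry flag.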

ARM64 (NEON)

BLST uses:
  • Vector operations for field arithmetic
  • ~25% speedup over generic

GPU Acceleration

For large MSMs (>10,000 points):
  • CUDA implementations available
  • ~100x speedup for MSM operations
  • Not suitable for latency-sensitive signing