Matching Engine Order Throughput
Sept 2025
Overview
In this experiment we analyze Ember throughput: specifically, we measure the maximum sustained rate of order requests the system can handle before it exhibits signs of overload, and we identify the current bottleneck.
Result summary: in matching engine scenarios, a single Ember OEMS instance is limited to around 430K order messages per second of sustained load, which corresponds to roughly 215K inbound order requests per second plus the matching outbound acknowledgements.
Experiment design
In this experiment, we repurpose our Ember latency test framework, which auto-provisions Linux servers on AWS. The main server hosts Ember with the FIX gateway and matching engine, while additional servers run bots simulating FIX clients that send order requests.
By gradually increasing the number of client servers, we drive higher order rates. The framework then measures order acknowledgement latency at each request rate.
In this realistic test, order requests are simulated by N bots. Each bot submits a mix of BUY and SELL orders, consisting of MARKET and LIMIT types, with prices drawn from a normal distribution to ensure a natural balance of aggressive and passive orders. As a result, the system produces a realistic combination of ACKs, FILLS, and REJECTS.
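For illustration, a single bot's request generation could look roughly like the sketch below; the class, method names, and price parameters are our own assumptions, not the test framework's actual API.

```java
import java.util.Random;

// Minimal sketch of one load-generating bot, assuming a sendOrder(...) hook that
// serializes the request as a FIX NewOrderSingle (35=D). Not the framework's real code.
public class OrderBot {
    private final Random rnd = new Random();
    private final double midPrice = 100.0;   // illustrative reference price
    private final double priceStdDev = 0.5;  // spread of limit prices around the mid

    public void submitOne(long clOrdId) {
        boolean buy = rnd.nextBoolean();        // mix of BUY and SELL
        boolean market = rnd.nextInt(10) == 0;  // e.g. ~10% MARKET, the rest LIMIT

        if (market) {
            sendOrder(clOrdId, buy, "MARKET", Double.NaN);
        } else {
            // Prices drawn from a normal distribution around the mid: limits far on the
            // "wrong" side rest passively, the rest cross the book and fill or reject.
            double price = midPrice + rnd.nextGaussian() * priceStdDev;
            sendOrder(clOrdId, buy, "LIMIT", price);
        }
    }

    private void sendOrder(long clOrdId, boolean buy, String type, double price) {
        // placeholder: in the real test this becomes a FIX 35=D message on the wire
    }
}
```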
During this test, we perform a series of runs, each consisting of a warm-up phase followed by a 10-minute measurement period during which latency data is collected. For subsequent runs, either the per-bot rate or the number of bots (N) is increased from 10 to 20, 30, and 40, producing a higher overall request rate. The full order request load is split across multiple parallel bots and multiple client servers to rule out the client side as a bottleneck.
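As a sanity check on the load plan, the aggregate request rate is simply the product of client servers, connections per server, and per-connection rate. The numbers in this toy calculation are illustrative and do not exactly match the Ansible settings in the appendix.

```java
// Back-of-the-envelope aggregate-rate check; all inputs are illustrative assumptions.
public class LoadPlan {
    public static void main(String[] args) {
        int clientServers = 4;          // client EC2 instances
        int connectionsPerServer = 10;  // FIX connections (bots) per client server
        int ratePerConnection = 6_250;  // orders/s per connection

        long totalRate = (long) clientServers * connectionsPerServer * ratePerConnection;
        System.out.println("Target aggregate rate: " + totalRate + " orders/s"); // 250000
    }
}
```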
Latency is measured using the LIBPCAP method, which captures packets directly at the network interface, as described in the Matching Engine Latency Test article.
It represents the packet-to-packet time difference between the inbound network packet containing a FIX order request (35=D or 35=F) and the outbound packet containing the corresponding order acknowledgment (typically 35=8).
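Conceptually, the measurement pairs each captured request with its acknowledgement by ClOrdID (FIX tag 11) and subtracts the packet timestamps. The sketch below shows that pairing step only, assuming the capture tooling already yields decoded (timestamp, FIX payload) pairs in arrival order; it is not the framework's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Pairs FIX requests (35=D / 35=F) with their acknowledgements (35=8) by ClOrdID (tag 11)
// and reports the packet-to-packet latency. Assumes packets are already decoded upstream.
public class LatencyPairer {
    private final Map<String, Long> requestTimeByClOrdId = new HashMap<>();

    /** Feed one captured packet; returns ack latency in nanoseconds, or -1 if no match yet. */
    public long onPacket(long timestampNanos, String fixPayload) {
        String msgType = tag(fixPayload, "35");
        String clOrdId = tag(fixPayload, "11");
        if (msgType == null || clOrdId == null) return -1;

        if (msgType.equals("D") || msgType.equals("F")) {      // inbound order request
            requestTimeByClOrdId.putIfAbsent(clOrdId, timestampNanos);
        } else if (msgType.equals("8")) {                      // outbound execution report
            Long sent = requestTimeByClOrdId.remove(clOrdId);
            if (sent != null) return timestampNanos - sent;
        }
        return -1;
    }

    private static String tag(String fix, String tag) {
        for (String field : fix.split("\u0001")) {             // FIX fields are SOH-delimited
            if (field.startsWith(tag + "=")) return field.substring(tag.length() + 1);
        }
        return null;
    }
}
```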
Environment
- AWS Cloud
- Ember running on c5d.12xlarge (with Intel Xeon Platinum 8275CL CPU @ 3.00GHz)
- Tuned version of Amazon Linux AMI
- OpenJDK 17.0.12 (Amazon Corretto)
- Ember 1.14.212 running under docker-compose
- Ember JVM options:
-Xmx8G
-Xms8G
-XX:-TieredCompilation
-XX:+UnlockDiagnosticVMOptions
-XX:+UseSerialGC
-XX:AutoBoxCacheMax=10000
-XX:GuaranteedSafepointInterval=0
-Ddeltix.disable.bounds.check=true
-Dlatency.test=false
-Daeron.socket.so_sndbuf=2m
-Daeron.socket.so_rcvbuf=2m
-Dagrona.disable.bounds.checks=true
The NM matching engine was used as the test target.
Results
Request Rate (orders/s) | P50 (µs) | P90 (µs) | P99 (µs) | P999 (µs) | P9999 (µs) | P99999 (µs) |
---|---|---|---|---|---|---|
50K | 15 | 17 | 22 | 31 | 268 | 623 |
125K | 20 | 27 | 35 | 50 | 314 | 677 |
165K | 23 | 32 | 42 | 66 | 428 | 1033 |
191K | 27 | 40 | 54 | 118 | 509 | 2260 |
198K | 30 | 44 | 60 | 143 | 572 | 2300 |
216K | 181 | 235 | 308 | 391 | 1873 | 4130 |
241K ⚠ | 178 | 249 | 312 | 431 | 2064 | 4393 |
250K ⚠ | 783 | 1660 | 2210 | 2503 | 3198 | 3700 |
⚠ 241K = first signs of intermittent overload
⚠ 250K = frequent overloads
Analysis
After exceeding a rate of roughly 215K requests per second, we observed OMS overload, visible as intermittent "Service Proxy overload" errors in the Gateway algorithm and a rise in the Inbound Failed Offers counter metric. This effect became much more pronounced at 250K.
The latency profile remains stable up to roughly 215K requests per second, with P50–P99 latencies staying in the tens of microseconds. At around 240K RPS, the system enters overload: the FIX Gateway backpressure mechanism begins rejecting requests rather than enqueueing them in the OMS. This indicates a practical capacity ceiling between 216K and 240K RPS. Beyond this point, order processing falls behind, resulting in queue buildup and sharp latency degradation at higher percentiles.
In this specific case, total OMS throughput was twice the order request rate: for each order request, the matching engine issued some kind of ACK event, which was recorded by the OMS and sent back to the origin of the order request (the FIX client bots in this case).
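To make the backpressure behavior concrete, the generic sketch below shows a bounded inbound queue where a full queue causes the caller to reject the request and a failed-offers counter to grow. This illustrates the general mechanism only; it is not Ember's actual implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Generic bounded-inbox pattern: reject instead of blocking when the consumer falls behind.
public class BoundedInbox<T> {
    private final ArrayBlockingQueue<T> queue;
    private final AtomicLong failedOffers = new AtomicLong();

    public BoundedInbox(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns true if accepted; false means the gateway should reject the order request. */
    public boolean offer(T orderRequest) {
        if (queue.offer(orderRequest)) return true;
        failedOffers.incrementAndGet();   // analogous to an "Inbound Failed Offers" metric
        return false;
    }

    public long failedOffers() { return failedOffers.get(); }

    public T poll() { return queue.poll(); }   // drained by the downstream OMS thread
}
```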
We profiled the order flow under overload conditions and generated a flamegraph. The analysis showed that while Ember's system-level components (journaling and the FIX gateway pipelines) are capable of handling higher throughput, the business logic in the central OEMS component is the current bottleneck. At the time of writing, Ember has been in production for over five years and, like any mature OEMS, has accumulated some feature bloat. These bottlenecks are being targeted for optimization in upcoming releases.
Bottom line: this test shows that the system is capable of sustaining up to 430,000 messages per second (order requests plus the corresponding acknowledgements), as confirmed by long-duration stability tests. In addition, like any event-processing system, Ember benefits from the elasticity provided by multiple internal queues. By tuning queue sizes, the system can absorb higher short-term rate spikes (lasting seconds to minutes) without degradation.
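As a back-of-the-envelope illustration of that queue-sizing trade-off: a burst arriving at rate R_in while the OMS drains at rate R_service for T seconds needs roughly (R_in - R_service) x T queue slots. All numbers below are assumptions except the roughly 215K/s sustained rate observed in this test.

```java
// Rough queue sizing for riding out a short spike without rejects.
public class BurstSizing {
    public static void main(String[] args) {
        long burstRate   = 300_000; // incoming orders/s during the spike (assumed)
        long serviceRate = 215_000; // sustained processing rate (observed in this test)
        long burstSecs   = 5;       // spike duration in seconds (assumed)

        long requiredCapacity = (burstRate - serviceRate) * burstSecs;
        System.out.println("Queue slots needed to absorb the spike: " + requiredCapacity);
        // (300,000 - 215,000) * 5 = 425,000 entries
    }
}
```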
Appendix: raw test results
This appendix lists the raw latency histograms collected during the 10-minute measurement period of each run as the request rate was gradually increased.
For example, at 50K order requests per second we observed the following order acknowledgement latency statistics:
Percentile : Microseconds : Event count
MIN : 12.0 : 263
50.0% : 15.0 : 2499193
90.0% : 17.0 : 3435138
99.0% : 22.0 : 3619706
99.9% : 31.0 : 3644606
99.99% : 268.0 : 3647696
99.999% : 623.0 : 3648021
99.9999% : 906.0 : 3648053
99.99999% : 1023.0 : 3648057
MAX, TOTAL : 1023.0 : 3648057
The left column is the percentile and the center column is the latency at that percentile in microseconds; the third column (event count) is not used in the summary.
The table in the Results section aggregates a subset of these percentiles (P50, P90, P99, P999, and higher) so that their dependence on request rate can be compared; the histogram above is the 50K data set.
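For reference, a small parser along the following lines can fold each run's dump into the comparison table; the file name is hypothetical and the format handling is based only on the output shown in this appendix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Extracts selected percentiles from one run's histogram dump
// (lines such as "50.0% : 15.0 : 2499193").
public class HistogramAggregator {
    public static void main(String[] args) throws IOException {
        Map<String, Double> picked = extract(Path.of("latency-50k.txt"),   // hypothetical file
                "50.0%", "90.0%", "99.0%", "99.9%");
        System.out.println(picked); // e.g. {50.0%=15.0, 90.0%=17.0, 99.0%=22.0, 99.9%=31.0}
    }

    static Map<String, Double> extract(Path file, String... percentiles) throws IOException {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String line : Files.readAllLines(file)) {
            String[] parts = line.trim().split("\\s*:\\s*");
            if (parts.length < 3) continue;                 // skip headers and separators
            for (String p : percentiles) {
                if (parts[0].equals(p)) result.put(p, Double.parseDouble(parts[1]));
            }
        }
        return result;
    }
}
```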
Latency at 125K order requests per second:
==========================================
Percentile : Microseconds : Event count
MIN : 12.0 : 438
50.0% : 20.0 : 2959876
90.0% : 27.0 : 4706027
99.0% : 35.0 : 5072137
99.9% : 50.0 : 5110284
99.99% : 314.0 : 5114737
99.999% : 677.0 : 5115198
99.9999% : 824.0 : 5115242
99.99999% : 885.0 : 5115246
99.999999% : 923.0 : 5115247
MAX, TOTAL : 923.0 : 5115247
Latency at 165K order requests per second:
==========================================
Percentile : Microseconds : Event count
MIN : 12.0 : 7
50.0% : 23.0 : 2450009
90.0% : 32.0 : 4071573
99.0% : 42.0 : 4441421
99.9% : 66.0 : 4478764
99.99% : 428.0 : 4482749
99.999% : 1033.0 : 4483152
99.9999% : 2403.0 : 4483193
99.99999% : 2962.0 : 4483197
99.999999% : 2962.0 : 4483197
MAX, TOTAL : 2962.0 : 4483197
Latency at 191K order requests per second:
==========================================
Percentile : Microseconds : Event count
MIN : 13.0 : 150
50.0% : 27.0 : 2036083
90.0% : 40.0 : 3537017
99.0% : 54.0 : 3862152
99.9% : 118.0 : 3895391
99.99% : 509.0 : 3898896
99.999% : 2260.0 : 3899243
99.9999% : 3459.0 : 3899278
99.99999% : 3836.0 : 3899282
99.999999% : 3836.0 : 3899282
MAX, TOTAL : 3836.0 : 3899282
Latency at 198K order requests per second:
==========================================
Percentile : Microseconds : Event count
MIN : 13.0 : 18
50.0% : 30.0 : 2029909
90.0% : 44.0 : 3633137
99.0% : 60.0 : 3983205
99.9% : 143.0 : 4013985
99.99% : 572.0 : 4017592
99.999% : 2300.0 : 4017955
99.9999% : 3110.0 : 4017990
99.99999% : 3353.0 : 4017994
99.999999% : 3353.0 : 4017994
MAX, TOTAL : 3353.0 : 4017994
Latency at 216K order requests per second:
==========================================
Percentile : Microseconds : Event count
MIN : 14.0 : 1
50.0% : 181.0 : 2212798
90.0% : 235.0 : 3978073
99.0% : 308.0 : 4365819
99.9% : 391.0 : 4405314
99.99% : 1873.0 : 4409281
99.999% : 4130.0 : 4409678
99.9999% : 4538.0 : 4409718
99.99999% : 5055.0 : 4409722
99.999999% : 5055.0 : 4409722
MAX, TOTAL : 5055.0 : 4409722
Latency at 230K order requests per second:
==========================================
Percentile : Microseconds : Event count
MIN : 14.0 : 15
50.0% : 46.0 : 2357424
90.0% : 72.0 : 4241452
99.0% : 96.0 : 4646043
99.9% : 216.0 : 4684706
99.99% : 678.0 : 4688925
99.999% : 4290.0 : 4689347
99.9999% : 6484.0 : 4689389
99.99999% : 6781.0 : 4689394
99.999999% : 6781.0 : 4689394
MAX, TOTAL : 6781.0 : 4689394
Latency at 241K order requests per second
(Warning: first signs of intermittent overload)
=============================================
Percentile : Microseconds : Event count
MIN : 16.0 : 3
50.0% : 178.0 : 2459486
90.0% : 249.0 : 4420143
99.0% : 312.0 : 4857058
99.9% : 431.0 : 4900213
99.99% : 2064.0 : 4904626
99.999% : 4393.0 : 4905068
99.9999% : 6834.0 : 4905112
99.99999% : 8191.0 : 4905117
99.999999% : 8191.0 : 4905117
MAX, TOTAL : 8191.0 : 4905117
Latency at 250K order requests per second
=========================================
Frequent overloads
Percentile : Microseconds : Event count
MIN : 12.0 : 1
50.0% : 783.0 : 442776
90.0% : 1660.0 : 795971
99.0% : 2210.0 : 875545
99.9% : 2503.0 : 883471
99.99% : 3198.0 : 884267
99.999% : 3700.0 : 884346
99.9999% : 4152.0 : 884354
99.99999% : 4346.0 : 884355
99.999999% : 4346.0 : 884355
MAX, TOTAL : 4346.0 : 884355
At the overload point, the Ansible latency-new test used the following settings:
# 10xClientServerCount 2750x1 = desired rate produces around 25K/sec
ember_client_connections: 10 #how many connections to server from each client (make sure ember.conf configures enough free FIX sessions)
ember_client_trade_extra_args: -rate 6750 -threads 1 -affinity 4,5 -activeOrdersWindow 2000 # Assuming we run with kernel ISOLCPU: 4-7,12-15
ember_client_market_data_extra_args: -threads 2 -affinity 6,7
variable "instance_count" {
type = map(number)
default = {
"ember_server" = 1
"ember_client" = 4
}
}