Over the past few years, financial exchange operators and electronic marketplace providers have increasingly embraced the cloud. There is definite momentum toward migrating these systems to the cloud, with several exchanges already running there and more migrations planned.
As exchanges move to the cloud, the interfacing trading platforms face a choice. Active market participants currently enjoy low and predictable latency to exchanges through co-location and DMA solutions. Sophisticated firms deploy a wide range of performant solutions, ranging from custom ASICs to proprietary operating systems on x86 architectures. Is it even possible to achieve the performance profiles of these systems using cloud-native technologies and infrastructure?
This is what we set out to benchmark.
The Goal
Taking the use case of Tick-to-Trade, our experiments provide answers to the following questions:
- Using specific instance types, how long does it take to process a market data message, make a trade decision, and produce an order message, using a “simplistic” implementation – latency and throughput?
- Is it possible to improve the “simplistic” implementation using advanced networking and programming techniques – if so, by how much?
Tick-To-Trade
This use case simulates the workflow from receipt of a market signal to initiating a trade based on the signal. This workflow involves the following technical steps:
- Decode an incoming binary market data feed (SBE FIX) into a domain object
- Inspect the domain object for the feasibility of a trade
  - Various complex strategies and computations could be used here
  - To eliminate the timing variability of trade decision algorithms, we applied a modulus operator to the message's sequence number
- If the trade rule evaluates to positive, construct an outbound order message in SBE FIX format
- Send the outbound order message
- No-op if the trade rule evaluates to negative
This effectively captures the time taken to receive and process an inbound market signal, apply a simplistic trade decision algorithm, and send an order to an external exchange.
All benchmarking leveraged PCAP files representing actual production market data, sourced from CME Group DataMine for CBOT Globex Equity Futures for the August 28, 2024 trading date. These files contained the 6,840,868 packets used in our analysis.
Test Runs
For our tick-to-trade benchmarking, we conducted a series of tests across different dimensions to evaluate performance on Google Cloud Platform's (GCP) C3 machine series. Our test matrix focused on several key factors: the GCP instance type, the packet ingress mechanism (kernel bypass vs. kernel space), the level of parallelism, and the replay speed of the market data. We also distinguished between in-process packet processing time and the total round-trip duration from the network interface card (NIC) on one instance to the NIC on another.
The benchmarking architecture was straightforward, consisting of two instances: a packet replay instance and a test instance. We ran two distinct modes of operation: kernel-space packet processing, which uses standard POSIX socket interfaces, and kernel-bypass packet processing. For the kernel-bypass tests, we used ring buffers and CPU pinning to enable parallel processing and to better understand the performance gains from an increasing number of processor cores.
Each test run was repeated three times, with outliers removed, and the results presented as an average of the remaining data. The in-process measurements recorded the time from when a packet was read into user space for decoding until the FIX order message was placed on the transmit queue. The round-trip duration was measured from the moment the market data packet left the replay instance’s NIC until the order message packet completed its round trip back to the same NIC. This dual approach allowed us to separate the fixed cost of packet transmission from the in-process software optimizations.
The Results
We had set out to determine whether standard cloud compute instances could support demanding financial services data workflows with an acceptable performance profile, using no specialized compute infrastructure beyond what the stock instance type provides. We confirmed that a GCP-hosted market data processing pipeline can achieve baseline in-process latencies in the range of 0.3 to 2.5 µs, with round-trip latencies of 24 to 26 µs.
These results demonstrate that high-performance, low-latency data processing is feasible on stock cloud compute instances, lowering the barrier to entry and leveling the playing field for a wide variety of firms while eliminating the need for specialized hardware and proprietary processing toolkits.

