Data Warehouse Pipelines

Overview

Ember stores all trading messages, including order requests and order events, in a highly efficient storage system called Ember Journal. Ember can process more than 100,000 orders per second. At this rate, about one gigabyte of trading history accumulates every minute.
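The two figures above imply a rough per-order journal footprint. The following back-of-the-envelope sketch works it out; it assumes a decimal gigabyte (10^9 bytes), and the 24-hour projection is a hypothetical worst case in which the peak rate never drops, not a measured value.

```python
# Back-of-the-envelope sizing from the figures quoted above.
ORDERS_PER_SECOND = 100_000        # rate quoted in the text
BYTES_PER_MINUTE = 1_000_000_000   # "one gigabyte", assuming decimal GB

orders_per_minute = ORDERS_PER_SECOND * 60
# Average journal bytes per order (the request plus all of its events).
bytes_per_order = BYTES_PER_MINUTE / orders_per_minute

# Hypothetical worst case: the peak rate sustained for a full 24 hours.
bytes_per_day = BYTES_PER_MINUTE * 60 * 24

print(f"journal bytes per order: {bytes_per_order:.0f}")        # 167
print(f"journal growth per day:  {bytes_per_day / 1e12:.2f} TB")  # 1.44 TB
```

Even under far lighter load, growth of this order explains why the journal cannot serve as the permanent store.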

While Ember is running, data warehouse pipelines are responsible for streaming or batching trading history to various destinations suitable for open data analytics and permanent storage. Ember supports the following data warehouses:

TimeBase
ClickHouse
Kafka
S3
RedShift
RDS SQL

Data warehouses work in coordination with the Ember Journal Compactor to ensure that the operational storage size remains compact and all trading history is preserved.
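One way to picture this coordination is a compaction watermark: the journal drops only those messages that every warehouse has already persisted. The sketch below is a toy model of that idea, not Ember's actual mechanism; the `Journal` class, `safe_compaction_point`, and the checkpoint values are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Toy append-only journal; sequence numbers start at 1."""
    messages: dict[int, str] = field(default_factory=dict)
    next_seq: int = 1

    def append(self, message: str) -> int:
        seq = self.next_seq
        self.messages[seq] = message
        self.next_seq += 1
        return seq

    def compact(self, up_to_seq: int) -> int:
        """Drop messages with sequence <= up_to_seq; return the number dropped."""
        doomed = [s for s in self.messages if s <= up_to_seq]
        for s in doomed:
            del self.messages[s]
        return len(doomed)


def safe_compaction_point(checkpoints: dict[str, int]) -> int:
    """Highest sequence that every warehouse has already stored (0 if none)."""
    return min(checkpoints.values(), default=0)


journal = Journal()
for i in range(10):
    journal.append(f"order-event-{i}")

# Hypothetical checkpoints: last sequence each warehouse has persisted.
checkpoints = {"timebase": 10, "clickhouse": 7, "s3": 4}

watermark = safe_compaction_point(checkpoints)
dropped = journal.compact(watermark)
print(watermark, dropped, sorted(journal.messages))
# 4 4 [5, 6, 7, 8, 9, 10]
```

The slowest warehouse sets the watermark, so the journal stays compact without discarding history that some destination has not yet received.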

Data warehouse conceptual diagram

Ember retains an operational subset of data in memory, stores recent trading data in the journal, and streams all data to data warehouses, where it can be stored indefinitely. From this perspective, neither the Ember API nor the Ember Journal is the optimal place to answer queries such as "show me all trades for today." Instead, Ember delegates this task to data warehouses.

This design offers several advantages:

The Ember Journal can be optimized for rapid sequential data insertion.
The operational dataset can stay small.
Reporting queries don't overwhelm Ember RPC channels.

Different warehouses can be set up to run in parallel.
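Parallel operation can be illustrated with independent consumers that each read the same journal through their own cursor. This is a minimal sketch under stated assumptions: `run_pipeline`, the in-memory `JOURNAL`, and the batch sizes are hypothetical, and a plain list stands in for a real warehouse client.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy journal: an immutable sequence of trading messages.
JOURNAL = tuple(f"msg-{seq}" for seq in range(1, 1001))


def run_pipeline(name: str, batch_size: int) -> tuple[str, int, int]:
    """Copy the whole journal to one destination, batch_size messages at a time.

    Each pipeline keeps its own cursor, so a slow destination does not
    hold back a fast one.
    """
    cursor = 0
    batches = 0
    destination = []  # stand-in for a real warehouse client
    while cursor < len(JOURNAL):
        batch = JOURNAL[cursor:cursor + batch_size]
        destination.extend(batch)
        cursor += len(batch)
        batches += 1
    return name, len(destination), batches


# Hypothetical settings: batch size 1 behaves like streaming, larger sizes like batching.
settings = {"timebase": 1, "clickhouse": 100, "s3": 500}

with ThreadPoolExecutor(max_workers=len(settings)) as pool:
    futures = [pool.submit(run_pipeline, n, b) for n, b in settings.items()]
    results = {name: (count, batches)
               for name, count, batches in (f.result() for f in futures)}

for name, (count, batches) in results.items():
    print(f"{name}: {count} messages in {batches} batches")
```

Every destination ends up with the full history regardless of how it batches, which is the property that lets streaming and batch warehouses coexist.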

Comparing Data Warehouses

Here's a comparison of the various data warehouses:

| | TimeBase | ClickHouse | Kafka | S3 | RedShift | RDS SQL |
|---|---|---|---|---|---|---|
| Max Sustained Rate (orders/sec) | 500K+ | 200K | TBD | 15K | 250 | 50 |
| Reports Performance | | Very Good | | Adequate | | |
| Query Language | QQL (Limited) | SQL subset Very Good | KQL | Athena uses Presto SQL | SQL subset | |
| Maintenance Effort | Medium-High | High | Medium | Very Low | Low | Low |
| Storage Cost | High | High | High | Low | High | High |
| GUI Client | TimeBase Admin | Tabix | KafkaTool, etc. | AWS Athena Console | Any SQL client | Any SQL client |
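The rate row of the comparison can be used as a first filter when choosing a destination. The sketch below copies the maximum sustained rates from the table and treats them as lower bounds; `warehouses_for` is a hypothetical helper, and Kafka is excluded because its rate is listed as TBD.

```python
# Max sustained rate (orders/sec) copied from the comparison table.
# None marks the value listed as TBD.
MAX_SUSTAINED_RATE = {
    "TimeBase": 500_000,
    "ClickHouse": 200_000,
    "Kafka": None,
    "S3": 15_000,
    "RedShift": 250,
    "RDS SQL": 50,
}


def warehouses_for(required_rate: int) -> list[str]:
    """Warehouses whose listed rate is at least required_rate orders/sec."""
    return [name for name, rate in MAX_SUSTAINED_RATE.items()
            if rate is not None and rate >= required_rate]


print(warehouses_for(100_000))  # ['TimeBase', 'ClickHouse']
print(warehouses_for(10_000))   # ['TimeBase', 'ClickHouse', 'S3']
```

Rate is only one criterion; maintenance effort, storage cost, and query language from the same table would narrow the choice further.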

For a more detailed description of data warehouse configuration, visit the Ember Configuration Guide.
