Streaming Orders into AWS RedShift
Overview
This document explains how to configure and use AWS RedShift, Amazon’s fully managed petabyte scale data warehouse, for the Execution Server (ES).
Deltix recommends this warehouse to clients who have a relatively small order rate (few thousands per second) and want the simplicity of an AWS managed database with the analytical power of RedShift.
Please be warned that RedShift is not very good at storing elevated rates of trading activity. See the Performance section for more information.
RedShift Setup
To launch a RedShift cluster, follow the description in the AWS RedShift tutorials.
Practical Considerations
Disk Space
Storage for one order takes approximately 700 - 1000 bytes. Multiple cancellation attempts or an unusually high amount of trades per order can increase the order footprint.
Please make sure you select the appropriate storage sizes. We recommend starting with 1TB.
VPC Network
For improved performance, consider running your RedShift cluster in the same VPC as Ember.
If you plan to use the RedShift Query Editor, you need to use an Advanced VPC configuration.
Make sure AWS VPC settings would allow the Ember data warehouse to connect to your RedShift cluster.
Create a Cluster
To create a cluster in the AWS RedShift console, follow these steps:
Use Create Cluster.
On the Cluster Details tab, fill out the fields.
On the Node Configuration tab, select a node type.
On the Additional Configuration tab, select Enhanced VPC and the correct VPC.
You can edit the VPC configuration to enable access to the RedShift port later on.
If you set up a test database and test it over the Internet, in the network settings, enable public access.
Once the database cluster starts up, on the RedShift Cluster Configuration tab, look up "JDBC URL" and copy the cluster hostname.
You will need to enter it into Ember’s data warehouse streaming configuration.
Configure the Execution Server
The Execution Server can export data into RedShift in two modes:
- Live Mode: A special daemon service exports data in near real-time.
- Batch Mode: A periodic process exports all recently accumulated data in batches.
In the configuration fragment below, the highlighted portions reflect the RedShift cluster created in the previous section. Here we also show how to control read and write batch sizes to improve tool responsiveness.
If you want to limit streaming to either the Orders block or the Messages block, only include the one you want to use.
To configure the Execution Server, add the following sections to $EMBER_HOME/ember.conf.
warehouse {
redshift { # unit id, you will use it when you run the app, it might be any
live = true # keep checking for new messages when end of journal is reached
readBatchLimit = 10 # reduce batch size - redshift is slow
messages = [
${template.warehouse.redshift.messages} { # loader which loads order messages
loader.settings {
host = "ember-redshift-cluster-1.cau6u2ckbpk1.us-east-2.redshift.amazonaws.com"
port = 5439
username = "awsuser"
password = "data#WAREZ#1"
databaseName = "ember"
tableName = "messages"
createDatabase = true # create database if not exists
createTable = true # create table if not exists
dropTable = false # drop table if exists
batchLimit = 10
}
}
]
orders = [
${template.warehouse.redshift.orders} { # loader which loads closed orders
loader.settings {
host = "ember-redshift-cluster-1.cau6u2ckbpk1.us-east-2.redshift.amazonaws.com"
port = 5439
username = "awsuser"
password = "data#WAREZ#1"
databaseName = "ember"
tableName = "orders"
createDatabase = true
createTable = true
dropTable = false
batchLimit = 500
}
}
]
}
}
For production setups, store connection passwords in a Hashicorp Vault or hash them using the Mangle tool. For more information on storing passwords, refer to the Ember Configuration Guide.
Start Exporting Data
To begin streaming Execution Server data into RedShift, use the data-warehouse
service. This service reads the Ember journal and converts all trading messages and completed orders into RedShift Messages and Orders tables.
Active orders are not exported until they are complete (completely filled, cancelled, or rejected). However, messages concerning active orders are exported immediately.
Run Ember's data-warehouse
script with single argument that specifies the RedShift data warehouse:
export EMBER_HOME=/deltix/emberhome
/deltix/ember/bin/data-warehouse redshift
In batch mode, this script exits as soon as all recent data exports. If you re-run the script, it appends any new data that the Execution Server accumulated since the last export.
Verify Installation
To verify your setup, use the AWS RedShift Query Editor:
You should see some content in the Messages table.
Message Identity
Records in Orders and Messages can be identified using the composite key {Term
, Sequence
}.
Sequence
comes from the upstream Ember Order Management System (OMS).- Each message processed by the OMS is assigned a unique sequence number.
- Not all messages are stored into the data warehouse. For example, some internal system control messages are skipped. Hence, you may see gaps in sequence numbers which otherwise increase monotonously.
- System operators may periodically clear Ember’s journal. This action restarts message sequence (and increases
Term
). When this happens, new messages start at a different term.
Term
identifies journal creation time.Term
remains the same for all messages written since Ember journal creation.- Each time the journal is cleared and re-created, the value specified as
Term
increases.
Example
Term | Sequence | Data |
---|---|---|
1546300800 | 1 | OrderNewRequest |
1546300800 | 3 | OrderNewEvent |
... | ... | ... |
1546300800 | 413891 | The last message before journal is reset |
Operator resets Ember’s journal (will result in new term and message sequence reset) | ||
1561939200 | 1 | OrderCancelRequest |
1561939200 | 2 | OrderCancelRequest |
... | ... | ... |
You can rely on the fact that {Term, Sequence} are always increasing to implement the before-after ordering of messages.
Performance
Amazon RedShift is designed for analytics queries, rather than transaction processing.
The cost of COMMIT is relatively high. Deltix benchmarking shows that RedShift data loading is limited to about 500 messages per second. This was observed with default settings of a two-node dc2.large cluster.
On the data loading side, Deltix uses configurable batched inserts. Aggressive batching using the batchLimit
parameter helps.
The following graph shows data from the experiment where we loaded trade report events into RedShift.
The Deltix data warehouse can handle trading activity spikes, but if your sustained load exceeds 10 orders per second, you may want to consider using another data warehouse. For example, ClickHouse can handle more than 100,000 orders per second.
Appendix: Data Format
This section describes the format of the two tables used to warehouse Ember's trading history.
Orders Table
The Orders table captures the final state of each order.
Column | Type | Example | Description |
---|---|---|---|
Term | INT8 | 1546300800 | Identifies sequence term. See “Message Identity” section above. Since: Ember 1.4 |
OpenSequence | INT8 | 312321304 | Identifies sequence number of the message that created the order (usually OrderNewRequest). See “Message Identity” section above. Since: Ember 1.4 |
CloseSequence | INT8 | 312321312 | Identifies sequence number of the message that completed (closed) the order (usually OrderCancelEvent, OrderRejectEvent, OrderTradeReportEvent, etc.). See “Message Identity” section above. Since: Ember 1.4 |
SourceId | CHAR(10) | CLIENT52 | Order source, ALPHANUMERIC(10) |
DestinationId | CHAR(10) | TWAP | Order destination, ALPHANUMERIC(10) |
OrderId | VARCHAR | ICAP1983EE | Identifies each order for Execution Server, unique per-source. OrderID is assigned by order source. |
ParentSourceId | CHAR(10) | CONTROL | Identifies source of parent order (optional) |
ParentOrderId | VARCHAR | ICAP321XX1 | Identifies parent order (optional) |
ExternalOrderId | VARCHAR | ZZ132131 | Optional order identifier assigned to the order by execution venue. For example, if we send order to execution venue, like CME they assign their own order identifier. This identifier can be subsequently used to locate this order on CME. |
Account | VARCHAR | Gold | Identifies order account |
Trader | VARCHAR | jdoe | Identifies trader who submitted this order |
Symbol | VARCHAR | EUR/USD | Order symbol (in symbology configured inside Deltix system) |
InstrumentType | VARCHAR | FX | Instrument type |
Exchange | CHAR(10) | HOTSPOT | Destination exchange (if available) |
Currency | CHAR(10) | USD | Order currency. Optional. Usually used only for orders that use term currency (rather than base currency). |
Side | VARCHAR | BUY | Order side |
TimeInForce | VARCHAR | DAY | Order time In force |
ExpireTime | TIMESTAMP | 2019-02-27 17:00:00.000 | Order expiration time (only for GOOD_TILL_DATE orders) |
OrderStatus | VARCHAR | CANCELLED | Final state of the order |
OpenTime | TIMESTAMP | 2019-02-27 16:51:48.002 | Order submission time |
CloseTime | TIMESTAMP | 2019-02-27 16:51:48.120 | Order completion time |
OrderType | VARCHAR | PEG_TO_MIDPOINT | Order type |
LimitPrice | FLOAT8 | 1.33 | Limit price. Can be specified for LIMIT, STOP_LIMIT, PEGGED, or CUSTOM order types |
StopPrice | FLOAT8 | 1.20 | Stop price. Can be specified for STOP and STOP_LIMIT order types. |
Quantity | FLOAT8 | 10000 | Order quantity |
DisplayQuantity | FLOAT8 | 1000 | Order display quantity (some time described as “max show quantity” or “max floor quantity”), where applicable. |
MinQuantity | FLOAT8 | 1000 | Minimum fill quantity (where applicable). |
CumulativeQuantity | FLOAT8 | 1500.50 | Cumulative filled quantity |
AveragePrice | FLOAT8 | 132.56 | Average fill price |
VendorRejectCode | INT4 | 1003 | Vendor specific reject code. For example CME’s. Since Ember 1.4. |
DeltixRejectCode | INT4 | 120 | Reject code in Deltix classification. Since Ember 1.4. |
Reason | VARCHAR(max) | “Cancelled by user request” | For cancelled or rejected orders this field contains textual reason. |
Messages Table
The Messages table records all order-related activity in real time.
More specifically this table records order requests (original submission, cancellation, and order modification requests) and order events (for example, order acknowledgement, cancellation confirmation, or traders).
To get a better understanding of trading workflows in the Execution Server, refer to the Trading Data Model document.
Column | Type | Example | Description |
---|---|---|---|
Type | VARCHAR | OrderTradeReportEvent | Identities type of message. See Trading Data Model for list of event types. |
Term | INT8 | 1546300800 | Identifies sequence term. See “Message Identity” section above. Since: Ember 1.4 |
Sequence | INT8 | 312321312 | Unique number that represents ES message sequence, can be used as unique synthetic timestamp. See “Message Identity” section above. |
Timestamp | TIMESTAMP | 2019-02-27 16:51:48.123 | Message timestamp |
SourceId | CHAR(10) | CLIENT52 | Order source, ALPHANUMERIC(10) |
DestinationId | CHAR(10) | TWAP | Order destination, ALPHANUMERIC(10) |
OrderId | VARCHAR | ICAP1983EE23 | Identifies order for Execution Server, unique per-source. |
OriginalOrderId | VARCHAR | ICAP1983EE22 | For order replacement request, as well as events that relate to cancel replace workflow (such as PendingReplace, ReplaceReject, and Replace ACK) this field identifies original order in cancel-replace chain. |
CorrelationOrderId | VARCHAR | ICAP1983EE00 | Identity of the first order in cancel-replace chain. Same as OrderID for orders that do not (yet) participate in cancel-replace workflow. |
ParentSourceId | CHAR(10) | CONTROL | Identifies source of parent order (optional) |
ParentOrderId | VARCHAR | ICAP321XX1 | Identifies parent order (optional) |
RequestId | VARCHAR | XCL#554 | For order cancel request, as well as cancel ACK and cancel NACK events this field identifies specific cancel request. |
ExternalOrderId | VARCHAR | ZZ132131 | Optional order identifier assigned to the order by execution venue |
EventId | VARCHAR | AAAT31231 | Optional attribute available for events coming from some venues. Allow identifying duplicate events. May have different uniqueness scope, but must be unique at least in the context of single order. NOTE: OMS is responsible for filtering out duplicate events before they reach data warehouse or other downstream consumers. |
ReferenceEventId | VARCHAR | Used by trade correction and cancellation events to identify previously communicated event that has to be corrected or cancelled. | |
OrderStatus | VARCHAR | PARTIALLY_FILLED | Order status (available for order events only). |
Symbol | VARCHAR | EUR/USD | Order symbol (in symbology configured inside Deltix system) |
InstrumentType | VARCHAR | FX | Instrument type |
Currency | CHAR(10) | USD | Order currency. Optional. Usually used only for orders that use term currency (rather than base currency). |
Exchange | CHAR(10) | HOTSPOT | Destination exchange (if available) for outbound messages and source exchange for inbound messages. For example, fills will report their exchange in this field. |
Account | VARCHAR | Gold | Identifies order account |
Trader | VARCHAR | jdoe | Identifies trader who submitted this order |
Side | VARCHAR | BUY | Order side |
TimeInForce | VARCHAR | GOOD_TILL_CANCEL | Order time in force condition |
ExpireTime | TIMESTAMP | 2019-02-27 17:00:00.000 | Order expiration time (only for GOOD_TILL_DATE orders) |
Quantity | FLOAT8 | 100.50 | Order quantity |
MinQuantity | FLOAT8 | 10 | Minimum quantity to execute (optional order request attribute) |
DisplayQuantity | FLOAT8 | 5 | Minimum quantity to display on exchange floor (optional order request attribute) |
OrderType | VARCHAR | LIMIT | Order Type |
LimitPrice | FLOAT8 | 1.33 | Limit price. Can be specified for LIMIT, STOP_LIMIT, PEGGED, or CUSTOM order types |
StopPrice | FLOAT8 | 1.20 | Stop price. Can be specified for STOP and STOP_LIMIT order types. |
PegDifference | FLOAT8 | 0.03 | Peg offset, in order money. Optional attribute for PEGGED order types. |
AveragePrice | FLOAT8 | 1.325 | Average execution price (order events only). |
CumulativeQuantity | FLOAT8 | 25 | Cumulative executed quantity (order events only). |
RemainingQuantity | FLOAT8 | 75.50 | Remaining order quantity (part of original order quantity that is still working on the market) |
TradePrice | FLOAT8 | 1.321 | Trade events only: price of individual trade described by this event. Not to be confused with average price of all trade events reported so far for an order (AveragePrice field). |
TradeQuantity | FLOAT8 | 5 | Trade events only: size of individual trade described by this event. Not to be confused with total executed size reported so far by all trade events of an order (CumulativeQuantity field). |
Commission | FLOAT8 | 0.0001 | Trade commission (when known) |
CommissionCurrency | CHAR(10) | USD | Trade commission currency (when known, by default assume order currency) |
CounterPartySourceId | CHAR(10) | JOHN | Identifies source of other side of the trade (when reported) |
CounterPartyOrderId | VARCHAR | FED76123155 | Identifies other side of the trade (when reported) |
SettlementDate | TIMESTAMP | Trade settlement date (when reported) | |
TradeDate | TIMESTAMP | Trade date (when reported) | |
Reason | VARCHAR(max) | Market is closed | Reason communicated for cancel or reject events. |
VendorRejectCode | INT4 | 1003 | Vendor specific reject code. For example CME’s. |
DeltixRejectCode | INT4 | 120 | Reject code in Deltix classification |
MultiLegReportingType | VARCHAR | Used for trade reports when order instrument is exchange traded-synthetic. Identifies single-leg trade or whole contract trade of multi-legged security. | |
AggressorSide | VARCHAR | Reports our side as passive or aggressive role in this trade. | |
OrderUnknown | BOOLEAN | Flag used by order Cancel Reject events. | |
CancelType | VARCHAR | Enum used by Cancel events. | |
ExecRestatementReason | VARCHAR | Used by Order Restate Events to classify restate type | |
Flags | INT4 | 3 | Order flags. Bitmask containing various order flags. For example, bit 0 marks manual order. |
UserData | Foo152 | User-provided order tag | |
Attributes | VARCHAR(max) | [{"key":6001,"value":"4h"},{"key":6002,"value":"FAST"}] | Custom order attribute key (numeric, corresponds to custom FIX tags specified during order submission) |
Changelog
Ember Version 1.4
- Added the TERM (Int64) column to Orders and Messages tables.
- Added fields OPENSEQUENCE (Int64), DeltixRejectCode (Int32), VendorRejectCode (Int32) columns into Orders table.
- Renamed field Text to Reason in Orders table.
- Rename field Sequence to CloseSequence in Orders table.
Ember Version 1.1
- Table Orders had datatype of CURRENCY field changed from VARCHAR to CHAR(10).
- Table Messages had field CounterpartyId split into CounterPartySourceId and CounterPartyOrderId fields.