Streaming Orders into AWS RedShift

Overview

This document explains how to configure and use AWS RedShift, Amazon’s fully managed petabyte scale data warehouse, for the Execution Server (ES).

Deltix recommends this warehouse to clients who have a relatively small order rate (a few thousand per second) and want the simplicity of an AWS-managed database with the analytical power of RedShift.

Be aware that RedShift is not well suited to storing sustained high rates of trading activity. See the Performance section for more information.

RedShift Setup

To launch a RedShift cluster, follow the description in the AWS RedShift tutorials.

Practical Considerations

Disk Space

Storage for one order takes approximately 700 to 1000 bytes. Multiple cancellation attempts or an unusually high number of trades per order can increase the order footprint.

Please make sure you select the appropriate storage sizes. We recommend starting with 1TB.
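
The per-order footprint above can be turned into a rough sizing estimate. The following is a minimal sketch, not a Deltix tool: the function name, the workload figures, and the choice of the 1000-byte upper bound are assumptions made for illustration.

```python
def estimate_storage_bytes(orders_per_day: int,
                           retention_days: int,
                           bytes_per_order: int = 1000) -> int:
    """Rough storage estimate in bytes.

    bytes_per_order defaults to the upper end of the 700 to 1000 byte
    range quoted above. Orders with many cancel attempts or trades
    will need more.
    """
    return orders_per_day * retention_days * bytes_per_order


# Hypothetical workload: 1,000,000 orders per day, kept for one year.
needed = estimate_storage_bytes(1_000_000, 365)
print(f"{needed / 1e12:.3f} TB")  # prints "0.365 TB"
```

Under these assumed numbers the estimate stays well below the recommended 1TB starting size, which leaves headroom for orders with a larger footprint.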

VPC Network

For improved performance, consider running your RedShift cluster in the same VPC as Ember.

If you plan to use the RedShift Query Editor, you need to use an Enhanced VPC configuration.

Make sure your AWS VPC settings allow the Ember data warehouse to connect to your RedShift cluster.
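
One way to check network reachability from the Ember host is a plain TCP connection test. This is a minimal sketch using only the Python standard library. The function name is hypothetical, the default port 5439 is taken from the configuration example in this document, and the check confirms only that the port is reachable, not that the credentials are valid.

```python
import socket


def can_connect(host: str, port: int = 5439, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Call it with the cluster hostname copied from the JDBC URL, for example can_connect("<your cluster hostname>"). A False result usually points at security group or routing settings in the VPC.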

Create a Cluster

To create a cluster in the AWS RedShift console, follow these steps:

  1. Click Create Cluster.

  2. On the Cluster Details tab, fill out the fields.

    RedShift wizard 1

  3. On the Node Configuration tab, select a node type.

    RedShift wizard 2

  4. On the Additional Configuration tab, select Enhanced VPC and the correct VPC.
    You can edit the VPC configuration to enable access to the RedShift port later on.

note

If you set up a test database and plan to access it over the Internet, enable public access in the network settings.

  5. Once the database cluster starts up, on the RedShift Cluster Configuration tab, look up "JDBC URL" and copy the cluster hostname.
    You will need to enter it into Ember’s data warehouse streaming configuration.

    RedShift wizard 3

Configure the Execution Server

The Execution Server can export data into RedShift in two modes:

  • Live Mode: A special daemon service exports data in near real-time.
  • Batch Mode: A periodic process exports all recently accumulated data in batches.

In the configuration fragment below, the connection settings (host, port, username, and password) reflect the RedShift cluster created in the previous section. The fragment also shows how to control read and write batch sizes (readBatchLimit and batchLimit) to improve tool responsiveness.

If you want to limit streaming to either the Orders block or the Messages block, only include the one you want to use.

To configure the Execution Server, add the following sections to $EMBER_HOME/ember.conf.

warehouse {
  redshift { # unit ID; you pass it when you run the app, and it can be any name
    live = true # keep checking for new messages when end of journal is reached
    readBatchLimit = 10 # reduce batch size - redshift is slow

    messages = [
      ${template.warehouse.redshift.messages} { # loader which loads order messages
        loader.settings {
          host = "ember-redshift-cluster-1.cau6u2ckbpk1.us-east-2.redshift.amazonaws.com"
          port = 5439
          username = "awsuser"
          password = "data#WAREZ#1"

          databaseName = "ember"
          tableName = "messages"

          createDatabase = true # create database if not exists
          createTable = true # create table if not exists
          dropTable = false # drop table if exists
          batchLimit = 10
        }
      }
    ]

    orders = [
      ${template.warehouse.redshift.orders} { # loader which loads closed orders
        loader.settings {
          host = "ember-redshift-cluster-1.cau6u2ckbpk1.us-east-2.redshift.amazonaws.com"
          port = 5439

          username = "awsuser"
          password = "data#WAREZ#1"

          databaseName = "ember"
          tableName = "orders"

          createDatabase = true
          createTable = true
          dropTable = false
          batchLimit = 500
        }
      }
    ]
  }
}

danger

For production setups, store connection passwords in HashiCorp Vault or hash them using the Mangle tool. For more information on storing passwords, refer to the Ember Configuration Guide.

Start Exporting Data

To begin streaming Execution Server data into RedShift, use the data-warehouse service. This service reads the Ember journal and exports all trading messages and completed orders into the RedShift Messages and Orders tables.

note

Active orders are not exported until they are complete (completely filled, cancelled, or rejected). However, messages concerning active orders are exported immediately.

Run Ember's data-warehouse script with a single argument that names the RedShift data warehouse unit (redshift in the configuration above):

export EMBER_HOME=/deltix/emberhome
/deltix/ember/bin/data-warehouse redshift

In batch mode, this script exits as soon as all recent data has been exported. If you re-run the script, it appends any new data that the Execution Server has accumulated since the last export.

Verify Installation

To verify your setup, use the AWS RedShift Query Editor:

RedShift wizard 4

You should see some content in the Messages table.

Message Identity

Records in Orders and Messages can be identified using the composite key {Term, Sequence}.

  • Sequence comes from the upstream Ember Order Management System (OMS).
    • Each message processed by the OMS is assigned a unique sequence number.
    • Not all messages are stored in the data warehouse. For example, some internal system control messages are skipped. Hence, you may see gaps in sequence numbers, which otherwise increase monotonically.
    • System operators may periodically clear Ember’s journal. This action restarts message sequence (and increases Term). When this happens, new messages start at a different term.
  • Term identifies journal creation time.
    • Term remains the same for all messages written since Ember journal creation.
    • Each time the journal is cleared and re-created, the value specified as Term increases.

Example

Term | Sequence | Data
1546300800 | 1 | OrderNewRequest
1546300800 | 3 | OrderNewEvent
... | ... | ...
1546300800 | 413891 | The last message before journal is reset
Operator resets Ember’s journal (will result in new term and message sequence reset)
1561939200 | 1 | OrderCancelRequest
1561939200 | 2 | OrderCancelRequest
... | ... | ...

note

You can rely on the fact that {Term, Sequence} are always increasing to implement the before-after ordering of messages.
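
The before-after rule can be illustrated with a short sketch. The MessageKey type and the happened_before helper are hypothetical names introduced here. The sample keys are taken from the Example table. The sketch relies on tuple comparison, which compares Term first and falls back to Sequence only when the terms are equal.

```python
from typing import NamedTuple


class MessageKey(NamedTuple):
    """Composite key {Term, Sequence}; field order matters for comparison."""
    term: int
    sequence: int


def happened_before(a: MessageKey, b: MessageKey) -> bool:
    """True if message a precedes message b."""
    return a < b  # tuple comparison: term first, then sequence


keys = [
    MessageKey(1561939200, 2),
    MessageKey(1546300800, 413891),
    MessageKey(1561939200, 1),
    MessageKey(1546300800, 1),
]
ordered = sorted(keys)
print(ordered[0], ordered[-1])
# The oldest key is (1546300800, 1); the newest is (1561939200, 2).
```

Note that sequence 413891 in the older term sorts before sequence 1 in the newer term, which is why Sequence alone is not enough. In a query, the same ordering is obtained by sorting on Term and then Sequence.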

Performance

Amazon RedShift is designed for analytics queries, rather than transaction processing.

The cost of COMMIT is relatively high. Deltix benchmarking shows that RedShift data loading is limited to about 500 messages per second. This was observed with default settings of a two-node dc2.large cluster.

On the data loading side, Deltix uses configurable batched inserts. Aggressive batching using the batchLimit parameter helps.
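
To see why a larger batchLimit helps, consider how many commits a given number of rows requires. The sketch below is a generic batching helper, not the loader that Ember uses. The function name and the row count of 1200 are assumptions for illustration, and the batch sizes 10 and 500 are the batchLimit values from the configuration example.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(rows: Iterable[T], batch_limit: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_limit rows."""
    if batch_limit < 1:
        raise ValueError("batch_limit must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_limit))
        if not chunk:
            return
        yield chunk  # in a real loader, one insert and one COMMIT per chunk


rows = range(1200)  # hypothetical backlog of 1200 rows
commits_small = sum(1 for _ in batches(rows, 10))
commits_large = sum(1 for _ in batches(rows, 500))
print(commits_small, commits_large)  # prints "120 3"
```

With one commit per batch, the same backlog needs 120 commits at a batch size of 10 but only 3 at a batch size of 500, so the fixed COMMIT cost is spread over many more rows.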

The following graph shows data from the experiment where we loaded trade report events into RedShift.

RedShift performance chart

The Deltix data warehouse can handle trading activity spikes, but if your sustained load exceeds 10 orders per second, you may want to consider using another data warehouse. For example, ClickHouse can handle more than 100,000 orders per second.

Appendix: Data Format

This section describes the format of the two tables used to warehouse Ember's trading history.

Orders Table

The Orders table captures the final state of each order.

Column | Type | Example | Description
Term | INT8 | 1546300800 | Identifies sequence term. See “Message Identity” section above. Since: Ember 1.4
OpenSequence | INT8 | 312321304 | Identifies sequence number of the message that created the order (usually OrderNewRequest). See “Message Identity” section above. Since: Ember 1.4
CloseSequence | INT8 | 312321312 | Identifies sequence number of the message that completed (closed) the order (usually OrderCancelEvent, OrderRejectEvent, OrderTradeReportEvent, etc.). See “Message Identity” section above. Since: Ember 1.4
SourceId | CHAR(10) | CLIENT52 | Order source, ALPHANUMERIC(10)
DestinationId | CHAR(10) | TWAP | Order destination, ALPHANUMERIC(10)
OrderId | VARCHAR | ICAP1983EE | Identifies each order for Execution Server, unique per source. OrderId is assigned by the order source.
ParentSourceId | CHAR(10) | CONTROL | Identifies source of parent order (optional)
ParentOrderId | VARCHAR | ICAP321XX1 | Identifies parent order (optional)
ExternalOrderId | VARCHAR | ZZ132131 | Optional order identifier assigned to the order by the execution venue. For example, if we send an order to an execution venue such as CME, the venue assigns its own order identifier. This identifier can subsequently be used to locate the order on CME.
Account | VARCHAR | Gold | Identifies order account
Trader | VARCHAR | jdoe | Identifies trader who submitted this order
Symbol | VARCHAR | EUR/USD | Order symbol (in symbology configured inside Deltix system)
InstrumentType | VARCHAR | FX | Instrument type
Exchange | CHAR(10) | HOTSPOT | Destination exchange (if available)
Currency | CHAR(10) | USD | Order currency. Optional. Usually used only for orders that use term currency (rather than base currency).
Side | VARCHAR | BUY | Order side
TimeInForce | VARCHAR | DAY | Order time in force
ExpireTime | TIMESTAMP | 2019-02-27 17:00:00.000 | Order expiration time (only for GOOD_TILL_DATE orders)
OrderStatus | VARCHAR | CANCELLED | Final state of the order
OpenTime | TIMESTAMP | 2019-02-27 16:51:48.002 | Order submission time
CloseTime | TIMESTAMP | 2019-02-27 16:51:48.120 | Order completion time
OrderType | VARCHAR | PEG_TO_MIDPOINT | Order type
LimitPrice | FLOAT8 | 1.33 | Limit price. Can be specified for LIMIT, STOP_LIMIT, PEGGED, or CUSTOM order types
StopPrice | FLOAT8 | 1.20 | Stop price. Can be specified for STOP and STOP_LIMIT order types.
Quantity | FLOAT8 | 10000 | Order quantity
DisplayQuantity | FLOAT8 | 1000 | Order display quantity (sometimes described as “max show quantity” or “max floor quantity”), where applicable.
MinQuantity | FLOAT8 | 1000 | Minimum fill quantity (where applicable).
CumulativeQuantity | FLOAT8 | 1500.50 | Cumulative filled quantity
AveragePrice | FLOAT8 | 132.56 | Average fill price
VendorRejectCode | INT4 | 1003 | Vendor-specific reject code (for example, CME’s). Since: Ember 1.4
DeltixRejectCode | INT4 | 120 | Reject code in Deltix classification. Since: Ember 1.4
Reason | VARCHAR(max) | “Cancelled by user request” | For cancelled or rejected orders, this field contains the textual reason.

Messages Table

The Messages table records all order-related activity in real time.

More specifically, this table records order requests (original submission, cancellation, and order modification requests) and order events (for example, order acknowledgement, cancellation confirmation, or trades).

To get a better understanding of trading workflows in the Execution Server, refer to the Trading Data Model document.

Column | Type | Example | Description
Type | VARCHAR | OrderTradeReportEvent | Identifies the type of message. See Trading Data Model for the list of event types.
Term | INT8 | 1546300800 | Identifies sequence term. See “Message Identity” section above. Since: Ember 1.4
Sequence | INT8 | 312321312 | Unique number that represents the ES message sequence; can be used as a unique synthetic timestamp. See “Message Identity” section above.
Timestamp | TIMESTAMP | 2019-02-27 16:51:48.123 | Message timestamp
SourceId | CHAR(10) | CLIENT52 | Order source, ALPHANUMERIC(10)
DestinationId | CHAR(10) | TWAP | Order destination, ALPHANUMERIC(10)
OrderId | VARCHAR | ICAP1983EE23 | Identifies order for Execution Server, unique per source.
OriginalOrderId | VARCHAR | ICAP1983EE22 | For order replacement requests, as well as events that relate to the cancel-replace workflow (such as PendingReplace, ReplaceReject, and Replace ACK), this field identifies the original order in the cancel-replace chain.
CorrelationOrderId | VARCHAR | ICAP1983EE00 | Identity of the first order in the cancel-replace chain. Same as OrderId for orders that do not (yet) participate in the cancel-replace workflow.
ParentSourceId | CHAR(10) | CONTROL | Identifies source of parent order (optional)
ParentOrderId | VARCHAR | ICAP321XX1 | Identifies parent order (optional)
RequestId | VARCHAR | XCL#554 | For order cancel requests, as well as cancel ACK and cancel NACK events, this field identifies the specific cancel request.
ExternalOrderId | VARCHAR | ZZ132131 | Optional order identifier assigned to the order by the execution venue
EventId | VARCHAR | AAAT31231 | Optional attribute available for events coming from some venues. Allows identifying duplicate events. May have a different uniqueness scope, but must be unique at least in the context of a single order. NOTE: OMS is responsible for filtering out duplicate events before they reach the data warehouse or other downstream consumers.
ReferenceEventId | VARCHAR | | Used by trade correction and cancellation events to identify a previously communicated event that has to be corrected or cancelled.
OrderStatus | VARCHAR | PARTIALLY_FILLED | Order status (available for order events only).
Symbol | VARCHAR | EUR/USD | Order symbol (in symbology configured inside Deltix system)
InstrumentType | VARCHAR | FX | Instrument type
Currency | CHAR(10) | USD | Order currency. Optional. Usually used only for orders that use term currency (rather than base currency).
Exchange | CHAR(10) | HOTSPOT | Destination exchange (if available) for outbound messages and source exchange for inbound messages. For example, fills will report their exchange in this field.
Account | VARCHAR | Gold | Identifies order account
Trader | VARCHAR | jdoe | Identifies trader who submitted this order
Side | VARCHAR | BUY | Order side
TimeInForce | VARCHAR | GOOD_TILL_CANCEL | Order time in force condition
ExpireTime | TIMESTAMP | 2019-02-27 17:00:00.000 | Order expiration time (only for GOOD_TILL_DATE orders)
Quantity | FLOAT8 | 100.50 | Order quantity
MinQuantity | FLOAT8 | 10 | Minimum quantity to execute (optional order request attribute)
DisplayQuantity | FLOAT8 | 5 | Maximum quantity to display on exchange floor (optional order request attribute)
OrderType | VARCHAR | LIMIT | Order type
LimitPrice | FLOAT8 | 1.33 | Limit price. Can be specified for LIMIT, STOP_LIMIT, PEGGED, or CUSTOM order types
StopPrice | FLOAT8 | 1.20 | Stop price. Can be specified for STOP and STOP_LIMIT order types.
PegDifference | FLOAT8 | 0.03 | Peg offset, in order money. Optional attribute for PEGGED order types.
AveragePrice | FLOAT8 | 1.325 | Average execution price (order events only).
CumulativeQuantity | FLOAT8 | 25 | Cumulative executed quantity (order events only).
RemainingQuantity | FLOAT8 | 75.50 | Remaining order quantity (part of original order quantity that is still working on the market)
TradePrice | FLOAT8 | 1.321 | Trade events only: price of the individual trade described by this event. Not to be confused with the average price of all trade events reported so far for an order (AveragePrice field).
TradeQuantity | FLOAT8 | 5 | Trade events only: size of the individual trade described by this event. Not to be confused with the total executed size reported so far by all trade events of an order (CumulativeQuantity field).
Commission | FLOAT8 | 0.0001 | Trade commission (when known)
CommissionCurrency | CHAR(10) | USD | Trade commission currency (when known, by default assume order currency)
CounterPartySourceId | CHAR(10) | JOHN | Identifies source of other side of the trade (when reported)
CounterPartyOrderId | VARCHAR | FED76123155 | Identifies other side of the trade (when reported)
SettlementDate | TIMESTAMP | | Trade settlement date (when reported)
TradeDate | TIMESTAMP | | Trade date (when reported)
Reason | VARCHAR(max) | Market is closed | Reason communicated for cancel or reject events.
VendorRejectCode | INT4 | 1003 | Vendor-specific reject code (for example, CME’s).
DeltixRejectCode | INT4 | 120 | Reject code in Deltix classification
MultiLegReportingType | VARCHAR | | Used for trade reports when the order instrument is an exchange-traded synthetic. Identifies a single-leg trade or a whole-contract trade of a multi-legged security.
AggressorSide | VARCHAR | | Reports our side as passive or aggressive role in this trade.
OrderUnknown | BOOLEAN | | Flag used by order Cancel Reject events.
CancelType | VARCHAR | | Enum used by Cancel events.
ExecRestatementReason | VARCHAR | | Used by Order Restate events to classify restate type
Flags | INT4 | 3 | Order flags. Bitmask containing various order flags. For example, bit 0 marks manual order.
UserData | | Foo152 | User-provided order tag
Attributes | VARCHAR(max) | [{"key":6001,"value":"4h"},{"key":6002,"value":"FAST"}] | Custom order attributes as key-value pairs (keys are numeric and correspond to custom FIX tags specified during order submission)
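
The Flags and Attributes columns need a little decoding on the client side. The following minimal sketch assumes the values have already been read from the Messages table. The helper names are hypothetical. The only flag bit used is bit 0 (manual order), as stated in the Flags description, and the Attributes sample is the example value from the table.

```python
import json
from typing import Dict

MANUAL_ORDER_BIT = 0  # per the Flags description: bit 0 marks a manual order


def is_manual_order(flags: int) -> bool:
    """Test bit 0 of the Flags bitmask."""
    return bool((flags >> MANUAL_ORDER_BIT) & 1)


def parse_attributes(raw: str) -> Dict[int, str]:
    """Turn the Attributes JSON array into a {key: value} dictionary."""
    return {item["key"]: item["value"] for item in json.loads(raw)}


attrs = parse_attributes('[{"key":6001,"value":"4h"},{"key":6002,"value":"FAST"}]')
print(is_manual_order(3), attrs)
# prints: True {6001: '4h', 6002: 'FAST'}
```

The meaning of flag bits other than bit 0 is not described in this document, so the sketch does not try to interpret them.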

Changelog

Ember Version 1.4

  • Added the TERM (Int64) column to the Orders and Messages tables.
  • Added the OPENSEQUENCE (Int64), DeltixRejectCode (Int32), and VendorRejectCode (Int32) columns to the Orders table.
  • Renamed field Text to Reason in the Orders table.
  • Renamed field Sequence to CloseSequence in the Orders table.

Ember Version 1.1

  • Table Orders had datatype of CURRENCY field changed from VARCHAR to CHAR(10).
  • Table Messages had field CounterpartyId split into CounterPartySourceId and CounterPartyOrderId fields.