Next gen of alt data
By Dr Elliot Banks (pictured), Chief Product Officer, BMLL Technologies
While the relationship between systematic hedge funds and alternative data sources is not new, quants have recently raised concerns that alt data is not giving the same alpha as it used to. This is because people are increasingly accessing the same data sets (web scraping, search analytics, satellite feeds etc.), making it difficult to find investing signals and high-frequency insight amongst the noise.
New data sets are needed to satisfy the demand for data-driven insights to spot long-term trends, improve trading decisions and ultimately drive performance. And while the industry understands the benefits of real-time (or near real-time) data, market participants are waking up to the predictive power of pricing data that comes from having access to vast amounts of historic data. A meaningfully predictive data set requires a minimum of 5 years’ worth of historic Level 3 order book data: in other words, the next generation of alternative data.
The next gen alt data: Five years of historic Level 3 order book data
Accessing and analysing this type of data to derive statistically relevant signals has up to now been the preserve of large systematic funds with in-house capabilities to generate, analyse and ingest these levels of data in a scalable environment. But with the advent of public cloud and increased processing power, is this all about to change?
Above and beyond
Historic Level 3 order book data includes every single order sent to an exchange, encompassing each amend, cancel and fill. This granularity gives users the transparency to calculate important analytics such as fill probability, the average resting time of an order, and how many price levels an order will move through before being completed. Combined with an analytics capability, Level 3 data allows you to derive the probability that an order is a large, real-money order that has to be filled, rather than part of a volume composed of numerous small orders that might not represent true liquidity. The primary use for this type of data is to find alternative sources of alpha generation.
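To make two of these analytics concrete, here is a minimal Python sketch of how fill probability and average resting time might be derived from order-level messages. The message schema (timestamp in nanoseconds, order ID, event type, quantity), the sample data and the function names are all assumptions made for illustration; they are not BMLL's format or API, and a production calculation would need to handle far more event types and edge cases.

```python
# Hypothetical Level 3 message schema: (timestamp_ns, order_id, event, quantity).
# event is one of "new", "amend", "cancel", "fill". Sample data is invented.
messages = [
    (1_000, "A", "new", 100),
    (1_500, "B", "new", 50),
    (2_000, "A", "fill", 100),
    (2_500, "C", "new", 200),
    (3_500, "B", "cancel", 0),
    (4_000, "C", "amend", 150),   # amend resets the open quantity to 150
    (6_500, "C", "fill", 150),
]


def order_lifecycles(msgs):
    """Replay messages in time order and record each order's entry, exit and outcome."""
    orders = {}
    for ts, order_id, event, qty in sorted(msgs, key=lambda m: m[0]):
        if event == "new":
            orders[order_id] = {"entered": ts, "exited": None,
                                "outcome": "open", "open_qty": qty}
            continue
        order = orders.get(order_id)
        if order is None or order["outcome"] != "open":
            continue  # ignore messages for unknown or already closed orders
        if event == "amend":
            order["open_qty"] = qty
        elif event == "fill":
            order["open_qty"] -= qty
            if order["open_qty"] <= 0:  # fully filled
                order["outcome"], order["exited"] = "filled", ts
        elif event == "cancel":
            order["outcome"], order["exited"] = "cancelled", ts
    return orders


def fill_probability(orders):
    """Share of closed orders that ended fully filled (None if nothing has closed)."""
    closed = [o for o in orders.values() if o["outcome"] != "open"]
    if not closed:
        return None
    return sum(o["outcome"] == "filled" for o in closed) / len(closed)


def average_resting_time_ns(orders):
    """Mean time between entry and exit across closed orders, in nanoseconds."""
    closed = [o for o in orders.values() if o["outcome"] != "open"]
    if not closed:
        return None
    return sum(o["exited"] - o["entered"] for o in closed) / len(closed)


lifecycles = order_lifecycles(messages)
print(fill_probability(lifecycles))
print(average_resting_time_ns(lifecycles))
```

Neither metric can be computed from top-of-book or aggregated depth data alone, because both depend on following individual order IDs from entry to exit.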
Having a whole 5 years’ worth of Level 3 data, moreover, provides new opportunities to analyse long-term cycles and trends, back test your strategies, and look at the decision-making processes that every market participant made to actually trade. This is absolutely critical if you want to find inferences and go beyond the top of the order book.
The primary use of historic Level 3 data is to find alternative sources of alpha generation
If, for instance, you are going to programme your algo to send an order in IBM, which trades on multiple venues, do you want to be sending it to the venue that has had the greatest number of failed trades over the last 5 years, or the one that has the largest market impact when your order is sent to it? The only way you can get that intelligence for your algo is from historic Level 3 data. These data sets enable participants to understand how markets behave and unlock the full potential of the predictive power of pricing data over a sufficiently long time horizon to capture a wide spectrum of market scenarios.
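As a rough illustration of the venue question, the sketch below ranks venues by historic fill rate. It assumes order outcomes have already been derived from Level 3 data, treats a "failed trade" as an order that did not fill, and uses invented venue names and sample records. Market impact, the other criterion mentioned above, is not modelled here.

```python
from collections import Counter

# Hypothetical per-order outcomes derived from historic Level 3 data: (venue, outcome).
order_outcomes = [
    ("VENUE_X", "filled"), ("VENUE_X", "filled"),
    ("VENUE_X", "filled"), ("VENUE_X", "cancelled"),
    ("VENUE_Y", "filled"), ("VENUE_Y", "cancelled"),
    ("VENUE_Y", "cancelled"), ("VENUE_Y", "cancelled"),
    ("VENUE_Z", "filled"), ("VENUE_Z", "filled"),
]


def fill_rate_by_venue(outcomes):
    """Fraction of orders on each venue that ended filled."""
    totals, fills = Counter(), Counter()
    for venue, outcome in outcomes:
        totals[venue] += 1
        if outcome == "filled":
            fills[venue] += 1
    return {venue: fills[venue] / totals[venue] for venue in totals}


def rank_venues(outcomes):
    """Venues ordered from highest to lowest historic fill rate."""
    rates = fill_rate_by_venue(outcomes)
    return sorted(rates, key=rates.get, reverse=True)


print(fill_rate_by_venue(order_outcomes))
print(rank_venues(order_outcomes))
```

A smart order router would weigh a ranking like this against other factors, and the figures would only be meaningful over a long enough history to cover varied market conditions.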
Insights delivered at scale
BMLL is the only firm that offers five years’ worth of granular, historic order book data in a completely harmonised, information-rich format and, crucially, in a cost-effective manner. The data is delivered alongside a suite of analytics and the cloud resources needed to calculate these insights, all accessible in daily workflows. It is made available through a public cloud, which is the only really viable way to ingest such huge quantities of historic data and provides the scalable compute power needed to map that data across venues.
BMLL takes 5 years’ worth of raw Level 3 data from venues across the US and harmonises this data as opposed to normalising it. This approach ensures that none of the attributes of the data are lost - nothing has been removed, meaning the intrinsic value that comes with the data remains. BMLL then makes this data and a comprehensive library of analytics available to hedge funds and quants directly in their trading systems, removing the operational complexities of data engineering - sourcing, cleansing and storing - while supercharging their research.
BMLL’s 5 years of data is available to market participants in two different formats. Python-native quants can access BMLL’s Data Lab, a data science platform that allows users to access Level 3 harmonised order book data, process it at scale, and find inferences by drilling down into every single message, nanosecond by nanosecond. Alternatively, funds can take a feed product, delivered via an API or FTP, and consume metrics such as the average resting time of an order or fill probability at a particular level of the order book straight into their systems.
Sating the demand for alpha
With access to 5 years’ worth of Level 3 order book data, alpha-hungry hedge funds can now test their strategies over long enough periods to be able to spot real trends, without the need for complex in-house data cleansing and engineering capabilities. They can sift through the noise and analyse the full depth of the order book, which allows them to understand decisions made by the traders throughout the order-making process. If you are an algorithmic trader, therefore, you can use the access to 5 years’ worth of Level 3 data to ultimately help drive your algorithms and feed into your smart order routers. If you are a systematic hedge fund looking for alpha, you can turn Level 3 data into data points that help your decision-making process go beyond patterns from the top of the order book.
Either way, the next generation of alt data has arrived.