Digital Assets Report



Finding and using unique datasets by hedge funds


By Gene Ekster – Some historians note that in the 1950s Sam Walton, the founder of Wal-Mart, would get into his silver 75hp two-seater airplane and take to the skies to count cars in parking lots, with the aim of making better decisions about his firm’s real estate investments. Today investors, with the help of companies like RS Metrics and its satellite imagery product, can scour parking lots across the globe from their computers and make trading decisions in the spirit of Sam Walton. 

Datasets such as those from RS Metrics are known in financial circles as alternative data, and they are becoming more commonplace in the research arsenals of institutional funds. Along with their potential, alternative data bring a whole host of new considerations, including data sourcing, evaluation of new datasets, compliance and a new set of trading strategies.  
Sourcing data

How does a fund acquire this sort of data? Alternative data can come from a handful of sources, including web harvesting, savvy vendors, and "exhaust" operational information from regular companies that aren't in the business of selling data, yet. Interestingly, attitudes in this latter category have recently shifted. In the past, management would regard the work of compiling data for sale as a drain on company resources and not worth the potential revenue. These days, however, many companies are extremely interested in selling their data, often seeking out buyers and asking for inflated prices. This shift reflects a growing awareness among potential data vendors of the value of their data assets. In fact, within the last six years, the rudimentary data economy has been transformed into a sophisticated marketplace populated with discerning vendors and intermediaries. Data brokers and independent alternative data research firms are changing the playing field by slowly turning data assets into an open-access commodity. While true commoditisation of alternative data is still years or even decades away, companies such as 1010Data, DISCERN, ITG, 7Park, and Quanton Data are making it easier than ever for an institutional investor to obtain non-conventional datasets. 1010Data takes things a step further by providing a best-in-class tool, along with support, that lets users accustomed to spreadsheets mine terabyte-sized datasets without special training.
While getting data has become easier, web harvesting (or “scraping”, as it’s sometimes called) has become more legally complicated. The technique has increasingly raised compliance questions, though most of the doubt stems from regulatory ambiguity. With only a handful of legal cases (such as Craigslist v 3Taps) from which to draw legal boundaries, the compliance picture for web harvesting is likely to change over the next few years. For instance, the courts will soon decide whether a phone’s unique ID constitutes personally identifiable information. Additionally, the Data Broker Accountability and Transparency Act (introduced 02/12/2014) is circulating in Congress. However, this act is primarily aimed at marketers, and it is not clear what its implications will be for financial analysts.  
A common challenge when evaluating a new dataset is how to estimate its overall impact on P&L without investing the significant time, money and other resources needed to explore it fully. A good start is to decompose the overall value of a dataset into a few key elements (see below). Experience shows that many of these elements are related to one another. For example, poorly structured datasets are less likely to have been fully utilized by other shops and therefore present a potentially greater opportunity.
Key elements of a dataset
Scarcity: how saturated is the Street with this information?
Granularity: what level of detail does the data go into? Is it at user or store level, and at what time frequency?
History: how much back data is there?
Structure: is the information structured or unstructured?
Coverage: how many stocks or geographies does the dataset touch?

Without a systematic approach to assessing each dataset's value, the entire R&D process can be undermined: every dataset seems promising, none is prioritized, and the overall pace of development suffers. A practical solution is to develop a quantitative report card and compile each element's scores into a visual matrix that is reassessed as the evaluation moves along.
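The report card described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: the 1 to 5 scale, the weights in WEIGHTS, the names ReportCard, rank and matrix, and the two example datasets are all hypothetical, and a fund would calibrate its own.

```python
from dataclasses import dataclass

# Hypothetical weights for the five key elements; they sum to 1.0 so the
# weighted total stays on the same 1-5 scale as the individual scores.
WEIGHTS = {
    "scarcity": 0.30,
    "granularity": 0.20,
    "history": 0.20,
    "structure": 0.10,  # judgement call: poor structure may also mean less competition
    "coverage": 0.20,
}


@dataclass
class ReportCard:
    name: str
    scores: dict  # element name -> integer score from 1 (weak) to 5 (strong)

    def total(self) -> float:
        """Weighted score across the five elements, on a 1-5 scale."""
        missing = set(WEIGHTS) - set(self.scores)
        if missing:
            raise ValueError(f"{self.name}: no score for {sorted(missing)}")
        for element, score in self.scores.items():
            if element not in WEIGHTS or not 1 <= score <= 5:
                raise ValueError(f"{self.name}: bad score {element}={score}")
        return sum(weight * self.scores[e] for e, weight in WEIGHTS.items())


def rank(cards):
    """Highest weighted score first, i.e. the order in which to spend R&D time."""
    return sorted(cards, key=lambda card: card.total(), reverse=True)


def matrix(cards):
    """Plain-text matrix with one row per dataset, re-run as scores are revised."""
    lines = ["dataset".ljust(10) + "".join(e[:6].rjust(8) for e in WEIGHTS) + "   total"]
    for card in rank(cards):
        cells = "".join(str(card.scores[e]).rjust(8) for e in WEIGHTS)
        lines.append(card.name.ljust(10) + cells + f"{card.total():8.2f}")
    return "\n".join(lines)


# Two hypothetical candidate datasets scored by an analyst.
cards = [
    ReportCard("panel_a", {"scarcity": 5, "granularity": 4, "history": 2, "structure": 2, "coverage": 3}),
    ReportCard("web_b", {"scarcity": 2, "granularity": 3, "history": 5, "structure": 4, "coverage": 5}),
]
print(matrix(cards))
```

Keeping the weights in one table makes it easy to revisit them as the evaluation progresses, which is the point of re-assessing the matrix rather than scoring each dataset once.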

As this field grows and develops, one of the biggest challenges will be to proactively and creatively form relationships with data sources that brokers have not yet discovered. Because much of a dataset's value comes from its rarity, these relationships will have to be kept strictly confidential. This will become harder over time, as it becomes more obvious that value-driving datasets come from only a handful of sources. Data sources ought to be treated as providers of a scarce good. One of the risks of doing business is spending months of R&D time developing a raw dataset, only to have the vendor later withdraw the rights. Of course, contract negotiations can eliminate some of this risk, but strong relationships with vendors can sometimes be equally or even more important. This means that many companies will need a strong, creative data acquisition department to develop and maintain high-quality relationships with vendors.
Which datasets are the most valuable? Many datasets offer insight into a company’s revenue streams and item purchases. However, topline estimates can be difficult to monetise because sell-side revenue forecasts keep getting more accurate. In general, returns are more sensitive to EPS surprises than to revenue surprises. Consequently, any dataset that gives an investor a view into a company’s margins is a treasured find. In addition, datasets that focus on emerging-market companies are worth investigating, since returns in emerging markets are generally more sensitive to revenue surprises than those of their Western counterparts.  
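The sensitivity comparisons above are empirical claims, and a fund can check them on its own event history. One minimal way, sketched here as an assumption rather than the author's method, is to regress announcement-window returns on standardized surprises and compare the slopes for EPS and revenue (or for an emerging-market sample versus a developed one). The function names and the choice of dispersion measure are hypothetical; conventions for scaling surprises vary.

```python
from statistics import mean


def standardized_surprise(actual, consensus, dispersion):
    """Surprise scaled by a dispersion measure (for example the standard
    deviation of analyst estimates) so that different companies are comparable."""
    if dispersion <= 0:
        raise ValueError("dispersion must be positive")
    return (actual - consensus) / dispersion


def sensitivity(surprises, returns):
    """OLS slope of announcement-window returns on standardized surprises,
    i.e. how much the price moves per unit of surprise."""
    if len(surprises) != len(returns) or len(surprises) < 2:
        raise ValueError("need two or more paired observations")
    mx, my = mean(surprises), mean(returns)
    sxx = sum((x - mx) ** 2 for x in surprises)
    if sxx == 0:
        raise ValueError("surprises have no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(surprises, returns))
    return sxy / sxx
```

In use, one would compute sensitivity(eps_surprises, returns) and sensitivity(revenue_surprises, returns) over the same announcements. Because EPS and revenue surprises tend to be correlated, a joint regression on both would separate their effects better than two single-variable slopes.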

Finally, there is a growing plethora of free or low-cost data available online, ranging from world-class weather information on Amazon Web Services (AWS) to well-organised government reports available through third-party vendors. While not truly free, since a substantial R&D cost is required to monetise the information, free online data is a real opportunity that has long been ignored in the field because of a perceived lack of value.

Ultimately, owning the rights to explore a unique, compliant dataset is just the beginning. A typical raw dataset lacks structure, so the choice of which metrics to extract is less than obvious. It’s akin to holding a lease on a mineral-rich plot with many kinds of resources in the ground: depending on the choice of mining equipment, oil, gas or gold could be extracted.  

How, then, is alternative data typically monetised? Today it is predominantly done by generating revenue estimates, with the ultimate aim of outperforming sell-side consensus. Depending on the dataset, other operating metrics such as ARPU, churn and sales volume can also be derived. The estimates are then used as inputs to the decision-making process of fundamental, usually equity-focused, portfolio managers' teams. A recent and growing trend, however, is for funds to take this flow a step further by fully or partially automating trading strategies fed by alternative data. Several quant funds are overcoming the challenges of unstructured information, limited back history and a smaller investable universe, and are using alternative data insights as input signals for fully quantitative trading strategies. 
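A minimal sketch of the revenue-estimate workflow, assuming the dataset yields one spend total per quarter from a consumer panel: fit a linear scaling from panel spend to reported revenue on past quarters, apply it to the quarter not yet reported, and compare the result with consensus. The linear model, the 2% threshold and the function names are illustrative assumptions; real panels need adjustments for panel drift, seasonality and coverage, all omitted here. The signal function hints at how such an estimate might feed an automated strategy.

```python
from statistics import mean


def fit_scaling(panel, reported):
    """OLS fit of reported revenue on panel-observed spend over past quarters:
    reported ~= intercept + slope * panel. Returns (intercept, slope)."""
    if len(panel) != len(reported) or len(panel) < 3:
        raise ValueError("need three or more matched quarters")
    mx, my = mean(panel), mean(reported)
    sxx = sum((x - mx) ** 2 for x in panel)
    if sxx == 0:
        raise ValueError("panel spend has no variance")
    slope = sum((x - mx) * (y - my) for x, y in zip(panel, reported)) / sxx
    return my - slope * mx, slope


def nowcast_revenue(panel_history, revenue_history, current_panel):
    """Estimate the not-yet-reported quarter from its panel spend."""
    intercept, slope = fit_scaling(panel_history, revenue_history)
    return intercept + slope * current_panel


def signal(estimate, consensus, threshold=0.02):
    """+1 (long) if the estimate beats consensus by more than `threshold` in
    relative terms, -1 (short) if it misses by more, else 0 (no position)."""
    gap = (estimate - consensus) / consensus
    if gap > threshold:
        return 1
    if gap < -threshold:
        return -1
    return 0


# Hypothetical quarters ($m): panel spend tracks reported revenue exactly.
estimate = nowcast_revenue([10, 12, 14, 16], [105, 125, 145, 165], 18)
print(estimate, signal(estimate, consensus=180))  # 185.0 1
```

In practice the same comparison would run across a universe of tickers before each reporting date, with position sizing and risk controls layered on top.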

Concluding remarks

In the past decade, the use of alternative data has been transforming the investment process of institutional equity investors in active management. As with any new advantage, once a portion of the market adopts it, the rest of the market catches up in order to stay competitive. Recent trends to automate trading within the alternative data paradigm are likely to further fuel demand for new datasets and innovative mining techniques. Is this a flash in the pan? Time will tell, but with a growing number of funds across the Street investing heavily in data sources and analyst talent, the use of unique data is poised not only to grow but, in the next few years, to become the de facto standard of investment research. Funds that don’t yet have an alternative data strategy would be well served to create one now and reap the benefits in the near future.

Gene Ekster, CFA, is a Director of Data Product Development at 1010Data, where he and his team use raw alternative data to create research products for the investment industry. Their guiding philosophy is data transparency: quantitative analyses are delivered alongside the underlying data, with support from analysts to help users interpret it. Previously, he directed the R&D team at SAC Capital (now Point72 Asset Management) and served on the R&D team at Majestic Research (acquired by ITG Investment Research). He has a degree in Artificial Intelligence and Cognitive Science from UC Berkeley and an MBA from Cornell University. Gene is passionate about championing the growing alternative data field; he recently delivered a talk on the subject that, among other things, covers the topics of this post.

The views and opinions expressed in this article are those of the author and do not necessarily reflect the views and opinions of any other entity.

To learn more about alternative data in finance, see the presentation slides and video highlights from a talk on alternative data given in NYC on 10/23/2014.
