Models that aim to predict daily stock returns perform better when they combine text data with quantitative financial data, according to the Capital Markets Cooperative Research Centre (CMCRC).
A study by Zhendong (Tony) Zhao, Nataliya Sokolovska and Professor Mark Johnson finds that combining quantitative and text data, rather than treating them separately, reduces prediction errors by almost three per cent compared with results when only quantitative data is examined.
Johnson says: “An almost three per cent improvement doesn’t seem large but it has a significant impact in finance because the new method’s prediction is closer to the real value. This is of particular interest to traders, brokers and investors who want to make a decision on whether to hold, sell or buy after a certain event. For example, Company AAA issues an earnings statement, ASX issues an announcement on the earnings then the algorithm looks for key words in the announcement and compares these to current quantitative data and predicts the return at the end of next day’s trading.”
Zhao says: “By analysing the announcement and financial quantitative data the combination of these two different types of data gives the research far more variables to analyse, which seems to have led to more accurate predictions.”
To examine the performance of these combinations, the research uses 19,282 ASX announcements from the first half of 2010. The research uses 80 per cent of the announcements to train the algorithm and 20 per cent to test the different combinations.
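The 80/20 division can be pictured with a short Python sketch. It assumes a random shuffle with a fixed seed; the article does not say whether the study split the announcements randomly or by date, and the function name `split_announcements` is hypothetical.

```python
import random


def split_announcements(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 19,282 is the announcement count reported in the article;
# integer IDs stand in for the announcements themselves.
announcement_ids = range(19282)
train, test = split_announcements(announcement_ids)
print(len(train), len(test))
```

Keeping the test portion out of training is what allows the reported error reduction to be read as a measure of performance on announcements the model has not seen.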
The study compares the predictive performance of four different combinations of text and quantitative features, applying various weighting schemes to the quantitative and text variables and using advanced statistical techniques. The best performance came from weighting the quantitative data and the text data differently, as this prevented the prediction model from overreacting to minor or random fluctuations in the data.
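One way to picture weighting the two kinds of data differently is a linear model that penalises text coefficients more heavily than quantitative ones. The sketch below is an illustration only: the article does not name the study's statistical technique, so the per-feature ridge penalty, the function `fit_weighted_ridge` and the synthetic data are all assumptions.

```python
import random


def fit_weighted_ridge(rows, targets, penalties, lr=0.05, epochs=500):
    """Gradient descent on mean squared error plus sum_j penalties[j] * w[j]**2."""
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    for _ in range(epochs):
        # Prediction error of the current weights on every row.
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - t
                     for row, t in zip(rows, targets)]
        # Gradient of the squared-error term plus the per-feature penalty.
        grads = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, rows))
                 + 2.0 * penalties[j] * w[j]
                 for j in range(d)]
        w = [wj - lr * g for wj, g in zip(w, grads)]
    return w


def text_norm(w):
    """Size of the coefficients on the text columns (columns 2 and 3)."""
    return sum(v * v for v in w[2:]) ** 0.5


# Synthetic data, not from the study: columns 0-1 stand in for quantitative
# features (e.g. past return, volatility), columns 2-3 for scaled keyword scores.
rng = random.Random(42)
rows = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
targets = [0.5 * r[0] - 0.3 * r[1] + 0.4 * r[2] + rng.gauss(0, 0.1) for r in rows]

equal = fit_weighted_ridge(rows, targets, penalties=[0.01, 0.01, 0.01, 0.01])
text_heavy = fit_weighted_ridge(rows, targets, penalties=[0.01, 0.01, 1.0, 1.0])
print(text_norm(equal), text_norm(text_heavy))
```

A larger penalty on the text columns shrinks their coefficients, so the model reacts less to noise in those features. Which group should carry the larger penalty, and by how much, would normally be chosen on held-out data.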
Very little academic work has looked at combining quantitative and text data to predict daily stock returns. This research applies state-of-the-art data mining techniques based on keyword weightings to the ASX announcements. Future research will include other advanced data mining techniques developed by CMCRC to analyse text data. Text data includes company announcements, media news and social media, while quantitative financial data includes factors such as past daily returns, capital size, volatility and the stock price.
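Keyword weighting can take many forms. The sketch below uses TF-IDF, a common scheme that scores a word highly when it is frequent in one announcement but rare across the rest. Whether the study used TF-IDF is not stated in the article, so this is an assumption, and the three sample announcements are invented for illustration.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in documents]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n_docs = len(tokenised)
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented sample announcements, not taken from the ASX data set.
announcements = [
    "profit upgrade announced",
    "profit downgrade announced",
    "dividend announced",
]
weights = tfidf(announcements)
print(weights[0])
```

In this toy example "announced" appears in every announcement and receives zero weight, while "upgrade" appears in only one and receives the highest weight in the first announcement.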
Zhao will back-test his research results on CMCRC’s Alluvial Backtesting Platform, which was launched to its PhD students in early 2013. If the back-testing proves successful, his research will play a role in putting new science behind the methods of predicting daily stock returns, particularly as text-based information is growing in quantity and importance.