Leveraging Machine Learning to Create a Better Investment Process

Updated: Apr 21

Article by

Christian Gloor & Steven Van Winkel

This article charts our progress from today's approach, based on strict modeling guidelines, to sophisticated but fully transparent machine-learning-based models. Without going into technical details, we show how we designed an investment process that we expect to deliver attractive returns uncorrelated with traditional markets.

At Tom Capital, we forecast financial markets using publicly available data. We process a comprehensive set of relevant data covering economic, sentiment and technical developments to anticipate the investment decisions of other market participants. Our current models rely on principles and techniques such as scoring and linear regression, as well as on the judgment of our specialists. We think it is important to include our economic knowledge, but not through discretionary trading decisions. Once our models have been rigorously validated statistically, they run fully systematically in our model portfolio. We encode our view of how the world interprets data so that the machines can later generate forecasts free from emotion.

We see ourselves at the forefront of forecasting technology. Our approach integrates proven investment knowledge and advanced data science into a unique and, in our view, cutting-edge investment process. Because our forecasting and trading infrastructure is fully automated and needs our attention only for daily data checks and order sign-off, we have more time to focus on model building and validation.

In recent years, we have also increasingly systematized this model-building process to gain full transparency into what drives our decisions. Our prediction models are built on selected and transformed indicators. Every indicator must undergo tests that check for historical evidence: a potential candidate is measured by its performance on historical market data, and out of multiple promising candidates, only the best are selected and added to the forecast model. This research process carries the two biggest risks in model creation: overfitting and forward-looking bias. It is rather easy to fit several indicators to past market returns. In fact, even when researchers try hard to avoid it, the data still gets fitted. The past simply does not contain enough independent events to make a statistically significant forecast possible. In addition, even strict statistical tests fail to filter out false positives once too many candidates are tested. After a sufficient number of trials, a researcher is virtually guaranteed to find a misleadingly profitable strategy [1].
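
The selection effect described above is easy to reproduce. The following is a hypothetical simulation, not part of our process: it generates many random strategies with no true edge, keeps the best in-sample Sharpe ratio, and compares it with the closed-form approximation of the expected maximum Sharpe ratio used in the Deflated Sharpe Ratio paper [1]. The function names, the 10-year sample of daily returns and the 1% daily volatility are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
TRADING_DAYS = 252


def best_of_n_sharpe(n_strategies, n_days=2520, n_trials=50, seed=0):
    """Average, over n_trials repetitions, of the highest annualized in-sample
    Sharpe ratio among n_strategies strategies that have no true edge."""
    rng = np.random.default_rng(seed)
    best = np.empty(n_trials)
    for t in range(n_trials):
        # Zero-mean daily returns: every strategy is pure noise.
        returns = rng.normal(0.0, 0.01, size=(n_days, n_strategies))
        sharpe = returns.mean(axis=0) / returns.std(axis=0, ddof=1)
        best[t] = sharpe.max() * np.sqrt(TRADING_DAYS)
    return best.mean()


def expected_max_sharpe(n_strategies, sr_std):
    """Approximate expected maximum of n_strategies (n >= 2) independent
    Sharpe estimates with true value zero and standard deviation sr_std."""
    return sr_std * ((1 - EULER_GAMMA) * norm.ppf(1 - 1 / n_strategies)
                     + EULER_GAMMA * norm.ppf(1 - 1 / (n_strategies * np.e)))


# Approximate standard deviation of an annualized Sharpe estimate
# computed from 10 years (2520 days) of pure noise.
sr_std = np.sqrt(TRADING_DAYS / 2520)
results = {n: best_of_n_sharpe(n) for n in (1, 10, 100, 1000)}
for n, best in results.items():
    # With a single strategy the expected in-sample Sharpe ratio is simply zero.
    approx = expected_max_sharpe(n, sr_std) if n > 1 else 0.0
    print(f"{n:5d} strategies: best Sharpe {best:.2f} (approximation {approx:.2f})")
```

With these settings, the best of 1,000 worthless strategies typically shows an annualized Sharpe ratio of roughly 1, which is exactly the kind of misleadingly profitable result that strict tests struggle to reject once enough candidates have been tried.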

During the time-consuming process of building a model, the researcher develops a mental picture of how the indicators work. This is legitimate learning, but it often results in induced belief, also known as storytelling: once a connection is found, its meaning seems obvious, whether it is true or not. It is hard to let go and free the mind once a 'discovery' has settled in. Even if the connection is real, it is impossible to look at the data as of a past moment without bias. We could create a model using only data that was available in 2007. But because we know what happened during the 2008 financial crisis, we would inevitably favor indicator combinations that let our models survive it. We cannot program our minds to block out what we already know happened.

At Tom Capital, we aim to eliminate these shortcomings of manual forecasting models by applying modern machine learning techniques. As mentioned, our manually built models are rigorously tested and validated. Each indicator considered must pass certain tests before it can be included in the model. Over time, we added so many such checks to our process that our own influence on the actual model building became minimal.

Eventually, it became clear that automating the model-building process was practically the only way to uphold our high standards for model quality. For example, the combination and weighting of indicators can be done by a constrained linear regression. This mathematical process produces a result similar to that of our judgment-based approach. However, we are now in a position to analyze the result without any knowledge bias. We can cross-validate the model to demonstrate statistically that it was able to forecast the market. This is done by, say, letting the machine develop the model on 9 out of 10 years of data. The model is then tested on the remaining year, which the model-building process has never seen. We humans cannot do this, because we all remember the past; the machine is able to forget it. To be absolutely certain, the process is repeated by building the model on a different set of 9 years and validating it on the newly held-out year, until all 10 combinations have been tested. This gives us a measure not only of the model's performance but also of its stability. Only when these criteria are satisfied does the model enter the backtesting phase.
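
As a rough illustration of this leave-one-year-out cross-validation, the following sketch fits indicator weights on 9 of 10 years and scores each held-out year by the correlation between forecast and realized target. It uses non-negative least squares (SciPy's `nnls`), assumed here as one possible form of constrained linear regression. The indicators, target and years are synthetic, and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def fit_constrained_weights(X, y):
    """Least-squares indicator weights, constrained to be non-negative."""
    weights, _residual_norm = nnls(X, y)
    return weights


def leave_one_year_out(X, y, years):
    """Fit on all years but one, score the held-out year, repeat for each year."""
    scores = {}
    for held_out in np.unique(years):
        train = years != held_out
        test = ~train
        weights = fit_constrained_weights(X[train], y[train])
        forecast = X[test] @ weights
        # Out-of-sample skill: correlation of forecast with the realized target.
        scores[int(held_out)] = float(np.corrcoef(forecast, y[test])[0, 1])
    return scores


# Synthetic example: 10 years of daily data, five hypothetical indicators,
# of which only the first two carry signal.
rng = np.random.default_rng(42)
years = np.repeat(np.arange(2009, 2019), 252)
X = rng.normal(size=(years.size, 5))
y = X @ np.array([0.5, 0.3, 0.0, 0.0, 0.0]) + rng.normal(size=years.size)

scores = leave_one_year_out(X, y, years)
vals = np.array(list(scores.values()))
print(f"mean out-of-sample correlation {vals.mean():.2f}, "
      f"std across folds {vals.std():.2f}")
```

The spread of the per-fold scores serves as the stability measure: a model whose skill comes from only one or two years shows a much larger dispersion across folds than one that works throughout.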

Backtests are no longer used to fit data; they become a tool for model validation. Typically, a model is backtested over the last 20 years: its decisions are traded virtually to produce a simulated track record. This test is necessary, but it is clearly biased, because the model was developed with all the market knowledge available up to today. It will therefore greatly overestimate the model's quality.
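
Trading a model's decisions virtually can be sketched as follows, assuming one forecast per trading day, a position equal to the sign of the forecast, no transaction costs, and that a forecast made at the close of day t earns the return of day t+1. The function name and these conventions are illustrative assumptions.

```python
import numpy as np


def simulated_track_record(forecasts, returns, periods_per_year=252):
    """Go long on positive forecasts, short on negative ones, flat on zero,
    and return the equity curve and annualized Sharpe ratio."""
    forecasts = np.asarray(forecasts, dtype=float)
    returns = np.asarray(returns, dtype=float)
    # The position decided at the close of day t earns the return of day t+1.
    positions = np.sign(forecasts[:-1])
    pnl = positions * returns[1:]
    equity = np.cumprod(1.0 + pnl)
    sharpe = pnl.mean() / pnl.std(ddof=1) * np.sqrt(periods_per_year)
    return equity, sharpe


equity, sharpe = simulated_track_record([1, -1, 1, 0], [0.0, 0.01, -0.02, 0.03])
print(equity, round(sharpe, 2))
```

Fed with forecasts from a model that was fitted on the full history, this function returns exactly the in-sample figure that overstates the model's quality.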

Our walk-forward backtesting approach is a continuation of the last step described above. We force the machine to forget: we let it build the model using only data that was available up to, say, the year 2002. This model is then backtested on the markets of 2003, using data that was not used to build the model. Again, this step is repeated year by year until we reach the current year. This not only gives us a realistic estimate of the model's past performance, but also valuable insight into its performance and stability over time.
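
The walk-forward procedure can be sketched as an expanding-window loop: for every test year, the weights are fitted only on earlier years and then used to forecast the test year. The sketch reuses non-negative least squares as an assumed stand-in for the constrained regression; the synthetic data, the 1995 start year and the hit-rate metric are hypothetical choices for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def walk_forward(X, y, years, first_test_year):
    """Expanding-window walk-forward test: each test year is forecast with
    weights fitted only on the years before it."""
    forecasts = np.full(y.shape, np.nan)
    weights_by_year = {}
    for test_year in np.unique(years[years >= first_test_year]):
        train = years < test_year   # only data available before the test year
        test = years == test_year
        weights, _residual_norm = nnls(X[train], y[train])
        forecasts[test] = X[test] @ weights
        weights_by_year[int(test_year)] = weights
    return forecasts, weights_by_year


# Synthetic example: daily data from 1995 to 2019, three hypothetical indicators.
rng = np.random.default_rng(7)
years = np.repeat(np.arange(1995, 2020), 252)
X = rng.normal(size=(years.size, 3))
y = X @ np.array([0.4, 0.2, 0.0]) + rng.normal(size=years.size)

forecasts, weights_by_year = walk_forward(X, y, years, first_test_year=2003)
for year, weights in weights_by_year.items():
    in_year = years == year
    # Share of days on which the forecast got the direction right.
    hit_rate = np.mean(np.sign(forecasts[in_year]) == np.sign(y[in_year]))
    print(year, np.round(weights, 2), f"hit rate {hit_rate:.2f}")
```

Printing the weights for each year shows how the model would have evolved over time, which is where the insight into its stability comes from.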


[1] Bailey, D. H. and López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality. Journal of Portfolio Management, 40(5), pp. 94-107 (40th Anniversary Special Issue). Available at SSRN: https://ssrn.com/abstract=2460551 or http://dx.doi.org/10.2139/ssrn.2460551


[2] Dummy data. Visualization does not represent an actual model.

September 16, 2019

