Testing the accuracy of your model's predictions

Updated: Feb 18

Written by Sabrina Herold, Christian Gloor, and Steven Van Winkel at Tom Capital AG

In forecasting, a lot depends on the accuracy of the prediction you make. There are multiple ways to test the accuracy of your model. A common way to measure how well a strategy would have performed in the past is called a 'back-test'.

Most so-called 'back-tests' are engineered by people who have become very good at what they do over many years. Unfortunately, the better someone becomes at it, the more likely it is that errors such as selection bias and overfitting show up as what is called curve fitting: the expert fits the model to the particular historical curve under study, including its random market noise. As soon as the same model is exposed to new market data, it can fail drastically, in part because it mistook noise for a genuine pattern. See the graph below for an illustration of this behavior (1).
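To make curve fitting concrete, here is a minimal sketch under stated assumptions. The data is simulated pure noise (so there is no real pattern to find), and a hypothetical "memorizer" model (a 1-nearest-neighbour lookup, swapped in here as a stand-in for an over-fitted back-test) is compared with a naive model that always predicts the average return. None of these names come from the article.

```python
import random
import statistics

def simulate_noise(n, seed):
    """Hypothetical daily returns: pure noise, so there is no real pattern to learn."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(n)]

def to_pairs(returns):
    """Feature = yesterday's return, target = today's return."""
    return list(zip(returns[:-1], returns[1:]))

def fit_memorizer(train):
    """'Curve-fitted' model: predicts the target of the most similar past day.
    On its own training data it reproduces every point, noise included."""
    def predict(x):
        _, y = min(train, key=lambda pair: abs(pair[0] - x))
        return y
    return predict

def fit_mean(train):
    """Deliberately simple model: always predicts the average training return."""
    mu = statistics.fmean(y for _, y in train)
    return lambda x: mu

def mse(model, pairs):
    """Mean squared prediction error of `model` on (feature, target) pairs."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in pairs)

history = to_pairs(simulate_noise(500, seed=1))   # data the models are fitted to
new_data = to_pairs(simulate_noise(500, seed=2))  # stand-in for "new market data"

memorizer, simple = fit_memorizer(history), fit_mean(history)
in_sample = {"memorizer": mse(memorizer, history), "mean": mse(simple, history)}
out_of_sample = {"memorizer": mse(memorizer, new_data), "mean": mse(simple, new_data)}
print("in-sample:", in_sample)
print("out-of-sample:", out_of_sample)
```

The memorizer looks perfect on the history it was fitted to, yet on unseen data it does worse than the naive average, which is the failure mode described above.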

While human experts try to understand market mechanisms based on knowledge and experience, machines understand markets only in terms of data and patterns. Based on the input data, the algorithms find patterns that have occurred in the past and assume that they will repeat in the future. In this way, machine learning can identify connections that are unknown to the human expert. At the same time, this data-driven approach offers a significant advantage in verifying the accuracy of the algorithm's predictions. Unlike humans, who cannot voluntarily erase a certain part of their memory, we can keep machines uninformed about a certain part of history.

The Walk Forward works in the context of machine learning precisely because algorithms can only process the data you give them. In the simplest form of this method, you give an algorithm all available data up to and including 2018 and have it predict the market movement for 2019. You can then compare its prediction with the actual market movement in 2019 to see how accurate and robust the algorithm is. Accordingly, Marcos López de Prado describes The Walk Forward as “a historical simulation of how the strategy would have performed in (the) past” (2).
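The simplest abstraction above amounts to a split by date. The sketch below assumes simulated monthly returns (not market data) and a deliberately naive forecast (the average of all training months); the function name `split_at_year` is hypothetical. The point is only that nothing from 2019 reaches the model before it forecasts 2019.

```python
import random
import statistics
from datetime import date

# Hypothetical monthly returns for 2016-2019; simulated, not real market data.
rng = random.Random(42)
observations = [
    (date(year, month, 1), rng.gauss(0.005, 0.03))
    for year in range(2016, 2020)
    for month in range(1, 13)
]

def split_at_year(observations, last_train_year):
    """Everything up to last_train_year is visible to the model; the following
    year is held back so the model stays uninformed about it."""
    train = [(d, r) for d, r in observations if d.year <= last_train_year]
    test = [(d, r) for d, r in observations if d.year == last_train_year + 1]
    return train, test

train, test = split_at_year(observations, last_train_year=2018)

# Naive model: forecast every 2019 month as the 2016-2018 average return.
forecast = statistics.fmean(r for _, r in train)
errors = [abs(forecast - r) for _, r in test]
print(f"forecast per month: {forecast:.4f}")
print(f"mean absolute error on 2019: {statistics.fmean(errors):.4f}")
```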

A close-up on the methodology

The Walk Forward combines in-sample optimization with out-of-sample verification. For this thought experiment, consider the period from 2018 to 2021, with The Walk Forward running in annual intervals. Also imagine that the machine learning algorithm is given a number of input parameters and the objective of maximizing performance. Input parameters are factors selected by the algorithm that are assumed to affect performance, e.g. time series of GDP, interest rates, and daily prices of the S&P 500.

The first step is to provide the model with all available data from 2018 and train it to find the input parameter settings that maximize its fit to this initial data set. This is called the in-sample test. Once the model accurately fits 2018 with the optimized input parameters, it is given the task of predicting 2019 with exactly the parameters it was optimized for in 2018, thus moving out-of-sample. The model then refits its parameters using the data through 2019 and predicts 2020 without seeing any 2020 data. This process continues until a historical out-of-sample window has been created for each period under consideration; together, these windows form an out-of-sample curve, as illustrated below.
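The loop described above can be sketched in a few lines. This is an illustration under assumptions, not Tom Capital's implementation: returns are simulated, the single "input parameter" is the lookback of a hypothetical trend rule (long if recent returns averaged positive, short otherwise), the candidate grid is made up, and the in-sample window expands each year as in the text. Each test year is scored once with the parameter frozen, and the concatenated results form the out-of-sample curve.

```python
import random
from itertools import accumulate

# Hypothetical daily returns, 252 trading days per year; simulated, not market data.
rng = random.Random(7)
returns_by_year = {y: [rng.gauss(0.0003, 0.01) for _ in range(252)] for y in range(2018, 2022)}

CANDIDATE_LOOKBACKS = [5, 20, 60]  # hypothetical "input parameter" grid

def strategy_pnl(returns, lookback, start):
    """Daily P&L of the trend rule for days index >= start.
    Each day's signal uses strictly earlier returns, so there is no look-ahead."""
    pnl = []
    for t in range(max(start, lookback), len(returns)):
        signal = 1 if sum(returns[t - lookback:t]) > 0 else -1
        pnl.append(signal * returns[t])
    return pnl

def optimize(in_sample):
    """In-sample step: pick the lookback that maximizes total in-sample P&L."""
    return max(CANDIDATE_LOOKBACKS, key=lambda lb: sum(strategy_pnl(in_sample, lb, 0)))

def walk_forward(returns_by_year):
    years = sorted(returns_by_year)
    oos_pnl, chosen = [], {}
    for i in range(1, len(years)):
        # Expanding window: every year before the test year is in-sample.
        in_sample = [r for y in years[:i] for r in returns_by_year[y]]
        lookback = optimize(in_sample)
        chosen[years[i]] = lookback
        # Out-of-sample step: parameter frozen, test year scored exactly once.
        full = in_sample + returns_by_year[years[i]]
        oos_pnl.extend(strategy_pnl(full, lookback, start=len(in_sample)))
    return chosen, oos_pnl

chosen, oos_pnl = walk_forward(returns_by_year)
oos_curve = list(accumulate(oos_pnl))  # cumulative out-of-sample P&L
print(chosen)          # lookback selected before each test year
print(len(oos_curve))  # 3 test years x 252 days
```

An expanding window mirrors the text ("data through 2019"); a rolling window of fixed length is a common alternative when older data is thought to be less relevant.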

As useful as The Walk Forward is, it has flaws of its own, one of which is selection bias: the more trials an algorithm runs on any data set, the more likely it is to recognize irrelevant patterns. As Marcos López de Prado puts it, "After a sufficient number of trials, it is guaranteed that a researcher will always find a misleadingly profitable strategy, a false positive." (3, p. 4). In this context, however, overfitting can be measured and controlled much better than in a common back-test, because the machine learning practitioner controls both the data input and the construction of the algorithm used to assess its accuracy.
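The false-positive effect in the quote is easy to reproduce. In this hedged sketch, every hypothetical "strategy" is pure noise with a true Sharpe ratio of zero; the strategies are ranked only by their in-sample Sharpe ratio, and the top ten are then re-checked on fresh out-of-sample noise. The trial count, horizon and annualization with 252 days are assumptions for illustration, and this is a plain simulation, not the deflated Sharpe ratio method of reference (3).

```python
import math
import random
import statistics

def annualized_sharpe(daily_returns):
    """Sample Sharpe ratio (zero risk-free rate), annualized with 252 trading days."""
    return statistics.fmean(daily_returns) / statistics.stdev(daily_returns) * math.sqrt(252)

def run_trials(n_trials, n_days, seed):
    """Each trial is a hypothetical 'strategy' whose returns are pure noise with
    zero true mean, so any attractive Sharpe ratio is a false positive."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        out_of_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        trials.append((annualized_sharpe(in_sample), annualized_sharpe(out_of_sample)))
    return trials

trials = run_trials(n_trials=1000, n_days=252, seed=0)
# Select the "best" strategies purely on in-sample performance.
top10 = sorted(trials, reverse=True)[:10]
avg_is = statistics.fmean(s for s, _ in top10)
avg_oos = statistics.fmean(o for _, o in top10)
print(f"best in-sample Sharpe: {top10[0][0]:.2f}")
print(f"avg in-sample Sharpe, top 10: {avg_is:.2f}")
print(f"avg out-of-sample Sharpe, same 10: {avg_oos:.2f}")
```

With enough trials the winners look impressive in-sample, while their out-of-sample Sharpe ratios scatter around zero; this is also why the number of trials behind a result should be recorded.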

Overall, The Walk Forward is a very useful tool for estimating the accuracy of machine learning algorithms, in part because the algorithm can be kept uninformed about a given time window. No tool is perfect, however, so The Walk Forward must also be guided appropriately by a human to avoid distortions.

Tom Capital AG is an investment boutique that applies machine learning to the entire investment process, from data selection to portfolio construction. Do not hesitate to contact us at info@tomcapital.ch for more information or inquiries.


(1) López de Prado, Marcos, 2020, Overfitting: Causes and Solutions (seminar slides)

(2) López de Prado, Marcos, 2018, Advances in Financial Machine Learning, John Wiley & Sons, Inc.

(3) López de Prado, Marcos, 2014, The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality
