How to detect unwanted curve fitting during a backtest

Whenever you develop an algorithmic trading strategy, unwanted curve fitting is one of the most dangerous hazards. It can lead to substantial losses in real-time trading. This article shows you some ways to detect whether the performance of your algorithmic trading strategy is based on curve fitting.

Curve fitting – what is it?

Every algorithmic trading strategy will have some parameters. There is no way around it. You will have to decide what length your indicators have, and you will have to specify an amount for your stop loss or profit target. Besides the actual rules of your strategy, the chosen parameters will usually have a significant influence on its back-test performance. And with every parameter you add, the danger of curve fitting rises.

Novice algorithmic traders like to use an optimiser algorithm to find the best set of parameters. This opens up Pandora’s box. Have a look at this article to see a “great” example of the effects of curve fitting.

Avoid high dependency on parameters

The first thing to check when guarding against the curve fitting trap is the robustness of your parameters. Change the parameters of your strategy a little and observe the impact on the back-test performance. If a strategy gives nice results with a 14-bar RSI, it should show very similar results with a 7- or 21-bar RSI. If it does not, the performance of your strategy is most probably based on curve fitting, and the real-time results will not keep up with the results of the back-test.

[Figure: one-parameter stability test]

The strategy on the left shows a strong dependency of the profit on the chosen parameter. This is not good. Keep working on the rules of your strategy until the dependency looks similar to the right chart. Although the profit is not as high as with the strategy on the left, chances are better that it will perform in real-time trading.
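The comparison between the two charts can also be put into a number. The following is only a minimal sketch: it assumes you have already exported the net profit per parameter value from your own back-test, and both the helper name `neighbourhood_stability` and the profit figures are made up for illustration.

```python
from statistics import mean

def neighbourhood_stability(profit_by_param, chosen, width=1):
    """Mean profit of the neighbours of `chosen`, divided by the profit at `chosen`.

    profit_by_param: dict mapping a parameter value to the back-test net profit.
    width: number of neighbouring parameter values to include on each side.
    A result near 1.0 means the neighbours perform like the chosen value;
    a result near 0 (or below) points to an isolated peak.
    """
    params = sorted(profit_by_param)
    i = params.index(chosen)
    neighbours = params[max(0, i - width):i] + params[i + 1:i + 1 + width]
    if not neighbours or profit_by_param[chosen] <= 0:
        return 0.0
    return mean(profit_by_param[p] for p in neighbours) / profit_by_param[chosen]

# Hypothetical net profits per RSI length, for illustration only.
spiky = {7: -200, 10: 100, 14: 5000, 18: 50, 21: -300}
flat = {7: 1800, 10: 2100, 14: 2300, 18: 2200, 21: 1900}

spiky_score = neighbourhood_stability(spiky, 14)  # isolated peak, close to 0
flat_score = neighbourhood_stability(flat, 14)    # broad plateau, close to 1
```

The ratio is just one possible yardstick. Whatever measure you use, the point is to compare the chosen value with its neighbours instead of looking at the peak alone.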

If your strategy uses more than one parameter, run the test over two parameters at a time and select a stable area.

[Figure: two-parameter stability test]

On the chart above I ran a test on the dependency of the net profit on two parameters. The profit is colour-coded: green represents a high profit, red represents a loss.

Do not select the highest profit (as shown on the chart). This is just a lucky, singular solution, and the chances are high that the strategy will not keep up its performance in the future. Instead, select a parameter set in the middle of a green area. It might not be the best solution, but the chances that the strategy will perform in real-time trading are much higher.
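One way to automate the choice of "the middle of a green area" is to score every cell of the profit grid by the worst result in its 3x3 neighbourhood. This is a sketch under assumptions: the grid values are invented, the function names are hypothetical, and taking the worst neighbour is only one of several reasonable scoring rules.

```python
def highest_cell(grid):
    """(row, col) of the single highest profit, the choice to avoid."""
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])

def most_stable_cell(grid):
    """(row, col) of the interior cell whose 3x3 neighbourhood has the best worst case.

    Border cells are skipped because their neighbourhood is incomplete.
    The grid must be at least 3x3.
    """
    best, best_score = None, float("-inf")
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[0]) - 1):
            block = [grid[i][j] for i in (r - 1, r, r + 1) for j in (c - 1, c, c + 1)]
            score = min(block)  # worst profit in the neighbourhood
            if score > best_score:
                best, best_score = (r, c), score
    return best

# Hypothetical net profit (in thousands) for 5 x 5 parameter combinations.
grid = [
    [-5, -3,  2,  1, -4],
    [-2, 30, -1,  6,  7],
    [ 1, -2,  5,  8,  9],
    [ 0,  3,  7, 10,  8],
    [-1,  2,  6,  9,  7],
]

peak = highest_cell(grid)        # the lucky single cell with profit 30
stable = most_stable_cell(grid)  # a cell surrounded by profitable neighbours
```

In this made-up grid the peak of 30 is surrounded by losses, while the selected cell has a lower profit but no losing neighbour.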

Walk forward parameter optimization

You can run the stability test shown above on different slices of your back-test data. Try to find the stable areas for your strategy for, say, every year in your back-test data. If the areas change significantly from year to year, your strategy is most probably based on curve fitting. If you see similar “good” parameter areas for each year, a stable performance can be expected in out-of-sample trading.
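The year-by-year comparison can be sketched as a set intersection: collect the "good" parameter values for each year and check what they have in common. The definition of "good" used here (profitable and at least half of that year's best result) is my own assumption, as are the function names and all numbers.

```python
def good_params(profit_by_param, fraction=0.5):
    """Parameter values with a profit of at least `fraction` of the year's best profit."""
    best = max(profit_by_param.values())
    if best <= 0:
        return set()  # no profitable parameter at all in this year
    return {p for p, profit in profit_by_param.items() if profit >= fraction * best}

def common_good_params(profit_by_year, fraction=0.5):
    """Parameter values that are 'good' in every single year."""
    sets = [good_params(profits, fraction) for profits in profit_by_year.values()]
    return set.intersection(*sets) if sets else set()

# Hypothetical results: year -> {parameter value: net profit}.
stable_strategy = {
    2019: {10: 50, 14: 100, 18: 90, 22: 20},
    2020: {10: 30, 14: 80, 18: 100, 22: 40},
    2021: {10: 10, 14: 70, 18: 60, 22: -5},
}
drifting_strategy = {
    2019: {10: 100, 14: 20, 18: -10, 22: -30},
    2020: {10: -20, 14: 10, 18: 100, 22: 30},
    2021: {10: -40, 14: -10, 18: 20, 22: 100},
}

stable_overlap = common_good_params(stable_strategy)      # some values survive every year
drifting_overlap = common_good_params(drifting_strategy)  # nothing survives
```

An empty intersection means the good area moves from year to year, which is the warning sign described above.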

Adding noise to test data

Checking parameter dependency is one way to uncover unintentional curve fitting. Another way is to add some noise to your data. By transforming the actual market data just a little bit, you will quickly find the weak points of your strategy. Have a look at the “noisy data” article to see how it is done. Besides adding noise to the data, you can also de-trend the data or add some extra volatility. If the strategy only shows the desired performance on the original data, but not on the noisy data, there is most probably something wrong with your strategy.

Can you spot the differences between the noisy data and the original data? I can hardly see any. So a “good” strategy should perform the same on both data sets. But have a look at the chart below.

[Figure: noisy data curve fitting test]

The performance of the strategy breaks down on the noisy data. This is another hint that the strategy’s performance is only based on curve fitting and that it will not perform out of sample. Try to add better logic to your strategy to avoid this effect, or you will lose your money in real-time trading.
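To give an idea of what such a transformation could look like, here is a minimal sketch that shifts every price of an OHLC bar by a small random percentage. This is not necessarily the method used in the “noisy data” article; the function name, the uniform noise and the 0.1% default are assumptions of mine.

```python
import random

def add_noise(bars, max_pct=0.1, seed=None):
    """Return a copy of `bars` with every price shifted by up to +/- max_pct percent.

    bars: list of (open, high, low, close) tuples.
    High and low are widened where necessary, so each noisy bar stays
    consistent (high >= open/close >= low).
    """
    rng = random.Random(seed)  # a fixed seed makes the noisy series reproducible
    noisy = []
    for o, h, l, c in bars:
        o2, h2, l2, c2 = (p * (1 + rng.uniform(-max_pct, max_pct) / 100)
                          for p in (o, h, l, c))
        noisy.append((o2, max(h2, o2, c2), min(l2, o2, c2), c2))
    return noisy

# Two hypothetical daily bars.
bars = [(100.0, 102.0, 99.5, 101.0), (101.0, 101.2, 100.9, 101.1)]
noisy_bars = add_noise(bars, max_pct=0.1, seed=42)
```

Run the unchanged strategy on several noisy copies (different seeds) and compare the results with the back-test on the original data.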

Similar markets stability test

So you managed to design a strategy which does not show a high dependency on its parameters and also passes the noisy data test. Unfortunately, this still does not guarantee that your strategy will perform out of sample. My favourite test to detect curve fitting is to apply the strategy to different, but similar, data.

As an example, if you design a strategy which is intended to work on SPY, why shouldn’t it also work on other equity indices or on single stocks?

[Figure: similar markets curve fitting test]

The strategy shown above worked nicely on the Dow Jones Index, but only works on 7 out of the 30 Dow Jones stocks. This would not be enough for me. A nice and stable strategy should work on at least 70 to 80% of the stocks. I would not expect a perfect equity line on each individual stock, but at least a break-even or slightly positive result on most of them.

Unless your strategy finds its edge in some very specific property of the market, it should always work on similar markets without the need to adjust the parameters. What is good for EURUSD should also be fine on EURJPY…
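The pass mark for this test is easy to check in code. A minimal sketch, assuming you have the net profit per symbol from running the unchanged strategy on each market; the symbol names and figures are placeholders, not the results behind the chart.

```python
def share_of_working_markets(net_profit_by_symbol):
    """Fraction of tested markets on which the strategy at least breaks even."""
    results = list(net_profit_by_symbol.values())
    if not results:
        raise ValueError("no markets tested")
    return sum(1 for profit in results if profit >= 0) / len(results)

# Placeholder results: 7 of 30 stocks profitable, as in the example above.
dow_results = {f"STOCK{i:02d}": (1000 if i <= 7 else -500) for i in range(1, 31)}

share = share_of_working_markets(dow_results)
passes = share >= 0.7  # lower end of the 70 to 80% mentioned above
```

With 7 winners out of 30 the share is about 23%, far below the threshold, so this strategy fails the test.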

Delay entry and exit orders

Another way to detect curve fitting is to delay the entry and exit signals by one or more bars. If this breaks your strategy, it is a strong hint that its performance might only be based on curve fitting: the dependency on exact timing is too high. As you have seen in the article on entry signal efficiency, a good entry signal should show its edge for some time. Therefore the exact timing of the entry should not be too important.
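The delay test can be sketched in a few lines if the strategy's signals are available as a target position per bar (+1 long, 0 flat, -1 short). That representation, the helper names and the prices are assumptions for illustration; your platform may offer a simpler way to delay orders.

```python
def delay(positions, bars=1):
    """Shift a per-bar target position `bars` bars into the future (flat until then)."""
    if bars <= 0:
        return list(positions)
    return [0] * min(bars, len(positions)) + list(positions[:-bars])

def net_profit(closes, positions):
    """Profit in points when positions[i] is held from the close of bar i to bar i+1."""
    return sum(pos * (closes[i + 1] - closes[i])
               for i, pos in enumerate(positions[:-1]))

# Hypothetical closes and a long position held over the first five bars.
closes = [100, 101, 103, 106, 108, 109, 107]
positions = [1, 1, 1, 1, 1, 0, 0]

original = net_profit(closes, positions)           # 9 points
delayed = net_profit(closes, delay(positions, 1))  # 6 points
```

Some loss of profit is to be expected. The warning sign is a result that collapses or turns negative after a delay of only a bar or two.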

Shift stop/limit orders

If your strategy uses stop or limit orders, you can add some noise to the levels you are using. So instead of an order like “buy next bar at high stop/limit”, you add a small percentage-based or volatility-based offset to the order level. For example, when trading the SPY, a strategy buying or selling at the previous high should yield nearly the same results when you buy or sell one point above or below the previous high. This test is similar to the noisy data test and works best with pattern recognition strategies.

Shifting the orders by a small amount also helps to account for slippage, as old high/low values are used by many traders to get into or out of a position.
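A small helper is enough to try this out. The sketch below assumes a random offset drawn uniformly from a range that is either a percentage of the level, a fraction of a volatility measure such as the ATR, or both; the function name and its parameters are hypothetical.

```python
import random

def noisy_stop(level, rng, pct=0.0, atr=0.0, atr_fraction=0.0):
    """Shift an order level by a random offset.

    The maximum offset is `pct` percent of the level plus `atr_fraction`
    times the supplied volatility measure `atr`.
    """
    max_offset = level * pct / 100 + atr * atr_fraction
    return level + rng.uniform(-max_offset, max_offset)

rng = random.Random(7)  # fixed seed so a test run can be repeated

previous_high = 450.0  # hypothetical previous high
pct_based = noisy_stop(previous_high, rng, pct=0.2)                    # within +/- 0.9 points
vola_based = noisy_stop(previous_high, rng, atr=5.0, atr_fraction=0.1)  # within +/- 0.5 points
```

Use the shifted level in place of the exact previous high and compare the back-test results over several runs.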

Conclusion

Every algo trading strategy will show some dependency on the chosen parameters. You can only minimise this effect by choosing sound rules. Use as few parameters as possible, and where there is no way around them, try to make them self-adjusting to absolute market levels, volatility and momentum. This should help your strategy perform over different markets and with changing market behaviour.

Last step: define cut off levels

If your strategy has passed all of the tests mentioned above, this is still no proof that it is a good strategy. The market behaviour might change significantly or something else might happen. So the last and most important thing in algorithmic trading is to define in advance the scenario in which you will switch off your strategy. This is best done using the maximum historic drawdown of your back-test. As soon as real-time trading shows a substantially higher drawdown (+25% or more), pull the plug and stop trading. There is no need to hope for a recovery; it is always better to limit losses than to base your trading on hope.
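The cut-off rule can be written down as a simple check. In this sketch I read “+25%” as a live drawdown that is more than 25% larger than the maximum drawdown of the back-test, measured in account currency; that reading, the function names and the equity figures are assumptions.

```python
def max_drawdown(equity):
    """Largest peak-to-valley decline of an equity curve, in currency units."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst

def should_switch_off(backtest_equity, live_equity, tolerance=0.25):
    """True once the live drawdown exceeds the back-test drawdown by more than `tolerance`."""
    limit = max_drawdown(backtest_equity) * (1 + tolerance)
    return max_drawdown(live_equity) > limit

# Hypothetical cumulative profit curves.
backtest = [0, 500, 300, 900, 100, 1200]  # maximum drawdown 800, so the limit is 1000
live_ok = [0, 200, -500, 100]             # drawdown 700, keep trading
live_bad = [0, 300, -800]                 # drawdown 1100, stop trading
```

Calculate the limit once, before going live, so the decision to stop is not made in the heat of the moment.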

Stay cautious and good luck!

All screenshots taken from Tradesignal software.