# Machine learning: kNN algorithm explained

I always thought that inspiration and experience are key factors in trading. But every time my chess computer beats me without any inspiration, just by brute force, I have my doubts. This article is about a brute-force approach to trading: the kNN algorithm.

Rule-based trading, or algorithmic trading, is just a name for a set of if..then rules that define the machine’s trading decisions. For example: if the market crosses below the 200-day line, then short 100 contracts. If the market rises by 2%, then exit the position. Easy stuff like this… (for the beginning)

This article will be a short introduction to machine learning. I will use a classic machine learning algorithm to let my computer find a prediction for tomorrow’s market move. In the meantime I’ll have a glass of wine with some friends and let the machine do the job. At least that’s the idea, but can it be that simple in real-life trading?

### Supervised machine learning – kNN algorithm

The kNN algorithm is one of the simplest machine learning algorithms. Learning, in this case, is only a nice-sounding label; in reality kNN is more of a classification algorithm that looks up labelled examples from the past.

This is how it works:

The scatter chart above is a visualisation of a two-dimensional kNN data set. For this article I used two classic indicators of technical analysis to do the prediction: a long-term and a short-term RSI indicator. The dots on the two-dimensional scatter chart represent the historic RSI values at a given point in time.

Now have a look at the fat circled point. This point represents today’s value. It means that today’s RSI1 has a value of 63 and RSI2 has a value of 70.

In addition to their position on the chart, the dots have colours. A green dot means that the market moved up on the following day; a red dot shows a falling market on the day after.

We already know what has happened in history, so it is easy to colour the historic dots. But we do not know the colour of today’s dot, as it is not known where tomorrow’s market will end.

Based on the chart above, will it be a red or green dot? Will tomorrow be up or down?  Should I go long or should I go short?

### kNN – k nearest neighbours

To predict tomorrow’s market move, the kNN algorithm uses the historic data shown on the scatter plot above and finds the k nearest neighbours of today’s RSI values. As you can see, our current fat point is surrounded by red dots. This means that every time the two RSI values have been in this area, the market fell on the day after. That’s why today’s data point is classified as red. I wish it were that easy all the time…

Call it classification or prediction, the two-dimensional kNN algorithm just looks at what happened in the past when the two indicators were at a similar level. It looks at the k nearest neighbours, sees their state and thus classifies today’s point.

### kNN as Tradesignal Equilla Code

In this article I would like to show you an implementation with the Tradesignal programming language Equilla.

To implement the algorithm in Tradesignal we first have to build the data behind the scatter plot shown above. The algorithm stores the values in an array.

Lines 8/9 calculate the values of the fast and slow RSI indicators.

Lines 12/13 look at what happens on the day after (for the training data set).

Lines 16/17/18 store everything in an array.
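The three steps above can be sketched in Python (not the Equilla code the article refers to). This is a minimal illustration under stated assumptions: the `rsi` helper uses a simple-average RSI, which may differ from Tradesignal's built-in RSI, and the names `rsi` and `build_training_set` as well as the convention of labelling an unchanged day as "down" are hypothetical.

```python
def rsi(closes, period):
    """Simple-average RSI (an assumed variant); None until enough bars exist."""
    values = [None] * len(closes)
    for i in range(period, len(closes)):
        gains = 0.0
        losses = 0.0
        for j in range(i - period + 1, i + 1):
            change = closes[j] - closes[j - 1]
            if change > 0:
                gains += change
            else:
                losses -= change
        total = gains + losses
        # A flat window has neither gains nor losses; treat it as neutral.
        values[i] = 50.0 if total == 0 else 100.0 * gains / total
    return values


def build_training_set(closes, fast=14, slow=28):
    """Return (fast RSI, slow RSI, label) rows; label is +1 (up) or -1 (down)."""
    rsi_fast = rsi(closes, fast)
    rsi_slow = rsi(closes, slow)
    rows = []
    # Stop one bar early: the last bar has no "day after" yet.
    for i in range(len(closes) - 1):
        if rsi_fast[i] is None or rsi_slow[i] is None:
            continue
        # Assumption: an unchanged next day counts as "down".
        label = 1 if closes[i + 1] > closes[i] else -1
        rows.append((rsi_fast[i], rsi_slow[i], label))
    return rows
```

Each row corresponds to one coloured dot on the scatter chart: two coordinates plus the colour.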

The next task to complete is to calculate the distances of today’s RSI point to all the historic points in the training data set.

Lines 23/27 calculate the Euclidean distance of today’s point to all historic points. Line 29 then creates a sorted list of all these distances to find the k nearest historic data points in the training data set.
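The distance step might look like this in Python. It is a sketch, not the article's Equilla listing; the function name `k_nearest`, the row layout `(fast RSI, slow RSI, label)` and the sample numbers are assumptions made for illustration.

```python
import math


def k_nearest(training, today, k=5):
    """Return the k (distance, label) pairs closest to today's RSI point."""
    today_fast, today_slow = today
    distances = []
    for rsi_fast, rsi_slow, label in training:
        # Euclidean distance in the two-dimensional RSI plane.
        distance = math.hypot(rsi_fast - today_fast, rsi_slow - today_slow)
        distances.append((distance, label))
    # Sort by distance so the nearest historic points come first.
    distances.sort(key=lambda pair: pair[0])
    return distances[:k]


# Hypothetical training rows: (fast RSI, slow RSI, next-day label).
training = [(62.0, 71.0, -1), (64.0, 69.0, -1), (30.0, 40.0, 1), (63.0, 73.0, -1)]
nearest = k_nearest(training, today=(63.0, 70.0), k=3)
```

Because both RSI values live on the same 0 to 100 scale, no extra rescaling is done here; with indicators on different scales the distance would be dominated by the larger one.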

Nearly done. The next step is just to find out what classification (colour) the nearest points have and to use this information to create a prediction for tomorrow. This is done in lines 33 to 35.

Have a look at the scatter chart at the beginning. If this were the data stored in our training data set, the prediction, using the 5 nearest neighbours, would be -5: all 5 nearest neighbours of our current data point are red, and each red neighbour counts as -1.

Now that we have a prediction for tomorrow, we need to make use of this prediction and trade it. The returns will then show if everything works as predicted.

Here I just do a simple long/short interpretation of the prediction, but of course you could also use the quality of the prediction (+5 or +1?) in some way. Position sizing…?
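The vote and its long/short interpretation can be sketched as follows. The function names are hypothetical, and coding green as +1 and red as -1 is an assumption that matches the -5 example in the text.

```python
def knn_prediction(neighbour_labels):
    """Sum the neighbours' labels: +1 for an up day, -1 for a down day."""
    return sum(neighbour_labels)


def position_from_prediction(prediction):
    """Simple long/short rule: +1 = long, -1 = short, 0 = stay flat."""
    if prediction > 0:
        return 1
    if prediction < 0:
        return -1
    return 0


# Five red neighbours, as on the scatter chart in the article.
five_red = [-1, -1, -1, -1, -1]
prediction = knn_prediction(five_red)
position = position_from_prediction(prediction)
```

With an odd k the sum can never be zero, so the flat case only matters for even k. The size of the sum (5 versus 1) is the "quality" that could feed into position sizing.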

### kNN algorithm performance

The next chart shows 2000 bars of daily Brent data. It uses a 14 and 28 day RSI to predict the next day’s move in the Brent oil market. The training was on the first half of the data set, and the 5 nearest neighbours did the classification.

Underneath the chart the returns of this test are shown (strategy equity). At the bottom of the chart you see the two RSI indicators used to generate the prediction / buy-sell command.

### kNN algorithm – conclusion

The kNN algorithm offers a framework to easily test all kinds of indicators to see if they have any predictive value. Judging by the graph shown, it seems to work. It seems to be possible to use these two RSI indicators to predict tomorrow’s Brent move.

But unfortunately this could also be just completely useless curve fitting. It is you who has to select the indicators and their periods, and you will have to decide if you like the outcome of a selected parameter set. Too many degrees of freedom to be sure. The kNN algorithm is useful, but its application in finance has to be treated carefully. Otherwise bad surprises are guaranteed.

Not everything can be done by brute force; inspiration and experience are key factors in finance…

# Using Autocorrelation for phase detection

Autocorrelation is the correlation of the market with a delayed copy of itself. Usually calculated for a one day time-shift, it is a valuable indicator of the trendiness of the market.

If today is up and tomorrow is also up, this would constitute a positive autocorrelation. If tomorrow’s market move is always in the opposite of today’s direction, the autocorrelation would be negative.

### Autocorrelation and trendiness of markets

If autocorrelation is high it just means that yesterday’s market direction is basically today’s market direction. And if the market has the same direction every day we can call it a trend. The opposite would be true in a sideways market. Without an existing trend, today’s direction will most probably not be tomorrow’s direction, thus we can speak of a sideways market.

### Autocorrelation in German Power

But best to have a look at a chart. It shows a backward adjusted daily time series of German Power.

The indicator shows the close-to-close autocorrelation coefficient, calculated over 250 days. You will notice that it is always fluctuating around the zero line, never reaching +1 or -1, but let’s see if we can design a profitable trading strategy even with this little bit of autocorrelation.
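A rolling one-day autocorrelation of this kind could be computed roughly as below. This is a sketch under assumptions: it correlates daily close-to-close changes (not price levels) with the same changes delayed by one day, using a plain Pearson coefficient; the function names are hypothetical and the charting software's exact calculation may differ.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; treated as "no correlation"
    return cov / math.sqrt(var_x * var_y)


def rolling_autocorrelation(closes, window=250, lag=1):
    """Autocorrelation of daily changes over the last `window` bars."""
    changes = [b - a for a, b in zip(closes, closes[1:])]
    result = [None] * len(closes)
    for t in range(window + lag, len(closes)):
        recent = changes[t - window:t]               # changes up to today
        delayed = changes[t - window - lag:t - lag]  # same window, shifted by `lag`
        result[t] = correlation(recent, delayed)
    return result
```

A market that flips direction every day gives a value of -1, while real data, as the chart shows, stays close to zero.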

### The direction of autocorrelation

Waiting for an autocorrelation of +1 would be useless. There will never be a perfect trend in real-world data. My working hypothesis is that a rising autocorrelation means that the market is getting trendy, thus a rising autocorrelation would be the perfect environment for a trend following strategy. But first we have to define the direction of the autocorrelation:

To define the direction of the autocorrelation I am using my digital stochastic indicator, calculated over half of the period I used for the autocorrelation. Digital stochastic has the big advantage that it is a quite smooth indicator without a lot of lag, which makes it easy to define its direction. The definition of a trending environment is then simply: the market is trending if digital stochastic is above its value of yesterday.

### Putting autocorrelation phase detection to a test

The simplest trend following strategy I can think of is a moving average crossover strategy. It never works in reality, simply because markets are not trending all the time. But combined with the autocorrelation phase detection, it might have an edge.

Wooha! That’s pretty cool for such a simple strategy. It trades (long/short) if the market is trending, but does nothing if the market is in a sideways phase. Exactly what I like when using a trend following strategy.
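The filter logic can be illustrated with a short Python sketch. Several things here are assumptions, not the article's implementation: the crossover is taken as close versus a single simple moving average, the function names are hypothetical, and `trend_indicator` stands in for the smoothed autocorrelation (the digital stochastic itself is not defined in this article, so any series can be passed in).

```python
def sma(values, length):
    """Simple moving average; None until `length` values are available."""
    out = [None] * len(values)
    for i in range(length - 1, len(values)):
        out[i] = sum(values[i - length + 1:i + 1]) / length
    return out


def filtered_positions(closes, ma_length, trend_indicator):
    """+1 long, -1 short, 0 flat; trades only while the indicator is rising."""
    average = sma(closes, ma_length)
    positions = []
    for i in range(len(closes)):
        if (i == 0 or average[i] is None
                or trend_indicator[i] is None or trend_indicator[i - 1] is None):
            positions.append(0)
            continue
        # Trending phase: the indicator is above its value of yesterday.
        trending = trend_indicator[i] > trend_indicator[i - 1]
        if not trending:
            positions.append(0)   # sideways phase: do nothing
        elif closes[i] > average[i]:
            positions.append(1)   # price above its average: long
        elif closes[i] < average[i]:
            positions.append(-1)  # price below its average: short
        else:
            positions.append(0)
    return positions
```

Passing a steadily rising `trend_indicator` switches the filter off and gives the plain crossover strategy, which makes the two variants easy to compare.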

Compare it with the original moving average crossover strategy, the one without the autocorrelation phase detection, and you will see the advantage of the autocorrelation phase filter immediately: the unfiltered equity line is way more volatile than the filtered one, and you get lots of drawdowns when the market is sideways.

### Stability of parameters

German power has been a quite trendy market over the last years. That’s why even the unfiltered version of this simple trend following strategy shows a positive result. But let’s run a test on the period of the moving average.

Therefore I calculated the return on account of both strategies, the unfiltered and the autocorrelation filtered, for moving average lengths from 3 to 75 days.

Return on account (ROA) = 100 if your max drawdown is as big as your return; in other words, ROA is the return expressed as a percentage of the maximum drawdown.
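Read this way, ROA is the net return divided by the maximum drawdown, times 100. A small sketch, assuming the equity curve is given as a list of account values and the drawdown is measured in money terms from the running peak (the function names are hypothetical):

```python
def max_drawdown(equity):
    """Largest drop from a running peak of the equity curve."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def return_on_account(equity):
    """Net return as a percentage of the maximum drawdown."""
    net_return = equity[-1] - equity[0]
    drawdown = max_drawdown(equity)
    if drawdown == 0:
        # No drawdown at all: the ratio is unbounded for a winning curve.
        return float("inf") if net_return > 0 else 0.0
    return 100.0 * net_return / drawdown
```

For an equity curve of `[0, 20, 10, 10]` the return (10) equals the maximum drawdown (10), so the function returns 100.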

The left chart shows the autocorrelation filtered ROA, the right side the plain moving average crossover strategy. You don’t have to be a genius to see the advantage of the autocorrelation filter. Whatever length of moving average you select, you will get a positive result. This stability of parameters cannot be seen with the unfiltered strategy.

### Autocorrelation conclusion:

Trend following strategies are easy to trade, but only make sense when the market is trending. As shown with the tests above, autocorrelation seems to be a nice way to find out if the market is in the right phase to apply a trend following strategy.

There are a lot of statistics which can be used to describe the returns of algorithmic trading strategies. Risk reward ratio, profit factor, Sharpe ratio, standard deviation of returns… These are great statistics, but they miss an important factor: are your returns statistically significant, or just a collection of lucky noise? The EDGE statistic might be the answer to this question.

If the returns of your trading strategy are positive with in-sample and out-of-sample data this is a first sign that you are on the right path. The next step would be to have a look at the risk-reward ratio of your trading to get an impression if the strategy might be useful in a real world environment.

Assuming that your average yearly returns are about twice as big as the worst-case historic drawdown, you can be even more confident that your strategy is useful. But there is still one thing to check before you can be sure that you are not just seeing a curve-fitted bullshit strategy: the standard deviation of the daily returns vs. your average daily return.

# Defining EDGE in algorithmic trading

Assume your strategy made 250\$ over the last year. With roughly 250 trading days, this averages to about 1\$ per day. Whether this 1\$ is a good or a bad return depends on the standard deviation of your equity line. If the standard deviation of your equity is 2\$, then the 1\$ average return strategy would be a bad strategy, as your average returns are way too small relative to the volatility of your equity. If the volatility of your return curve were just 50ct and you still made 1\$ per day on average, your strategy would be ingenious.

Edge is the ratio of your average returns vs. the volatility of your equity line. To be on the safe side, your average return should be about 5% above the 90% confidence interval of your equity line volatility.
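The basic ratio behind this idea can be sketched in a few lines of Python. This is only the plain "average daily result divided by its standard deviation" comparison from the example above; it is not the author's exact EDGE statistic (the confidence-interval threshold is not reproduced here), and the function name and the use of the population standard deviation are assumptions.

```python
import statistics


def edge_ratio(daily_pnl):
    """Average daily result divided by the standard deviation of daily results."""
    average = statistics.mean(daily_pnl)
    volatility = statistics.pstdev(daily_pnl)
    if volatility == 0:
        # Perfectly constant results: the ratio is unbounded for a winner.
        return float("inf") if average > 0 else 0.0
    return average / volatility


# Both hypothetical strategies average 1 per day, with volatility 2 and 0.5.
noisy = edge_ratio([3, -1, 3, -1])
smooth = edge_ratio([1.5, 0.5, 1.5, 0.5])
```

The noisy strategy scores 0.5 and the smooth one 2.0, which mirrors the "bad" and "ingenious" cases in the text.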

The left chart shows a strategy trading a one-month RBOB time spread, the right chart shows the same strategy trading German power. RBOB has an edge of 3%, German power has an edge of 5%.

If I had to select which market I want to trade with this sample strategy, I would surely select German power over the RBOB time spread. Both curves have their ups and downs, but RBOB is heavily relying on a lucky trade in September. This led to a high standard deviation of the equity line, giving you a low edge reading.

### Conclusion

Observing the ratio between your average daily returns and the volatility of your equity curve can give you some valuable insights into the quality of your strategy. If it just caught a few lucky trades in history, it will also show a high volatility in returns. And this you most probably want to avoid when turning to algorithmic trading. It’s not just the absolute profit at the end of the year, it is also the path you took to get to this number. The smoother, the better!

# Ranking: percent performance and volatility

When ranking a market, analysts usually pick the percent performance since a given date as their key figure. If a stock was at 100 last year and trades at 150 today, percent performance would show you a 50% gain (A). If another stock only gave a 30% gain (B), most people would now draw the conclusion that stock A would have been the better investment. But does this reflect reality?

### Percent Performance and Volatility

In reality, and as a trader, I would never just buy and hold my position; I would always adjust my position size in relation to the risk in it. I like instruments that rise smoothly, not the roller coaster ones which will only ruin my nerves. So ranking a market solely by percent performance is a useless statistic for me.

Let’s continue with our example from above: if stock A, the one which made 50%, has had a 10% volatility, and stock B, the 30% gainer, only had a 5% volatility, I surely would like to see stock B on top of my ranking list, and not the high vola but also high gain stock A.

Risking the same amount of money would have given me a bigger win with stock B.
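The arithmetic behind this claim is simply gain per unit of volatility: 50 / 10 = 5 for stock A and 30 / 5 = 6 for stock B. As a small Python illustration (the function name is hypothetical, and dividing gain by volatility is the simplest reading of "risking the same amount of money"):

```python
def gain_per_unit_of_volatility(gain_percent, volatility_percent):
    """How much gain was earned for each percent of volatility taken."""
    return gain_percent / volatility_percent


# (gain in %, volatility in %) for the two stocks from the example.
stocks = {"A": (50.0, 10.0), "B": (30.0, 5.0)}
scores = {name: gain_per_unit_of_volatility(*values) for name, values in stocks.items()}

# Rank by risk-adjusted gain, best first.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here `ranking` puts B ahead of A, even though A has the larger raw percent gain.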

### Combining Performance and Volatility

To get stock B up in my ranking list I will have to combine the absolute gain with the market volatility in between. This can be done quite simply. Just add up the daily changes of the stock, normalized by market volatility. Have a look at the formula of this new indicator:

index(today)=index(yesterday)+(price(today)-price(yesterday))/(1.95*stdev(price(yesterday)-price(2 days ago),21))

In plain English: today’s Vola Return Index equals yesterday’s Vola Return Index plus the daily gain normalized by volatility.

So if the index has been at 100, the volatility (as a 95% confidence interval over 21 days) is 1 and the stock made 2 points since yesterday, then today’s index would be 100 + 2/1 = 102.
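The formula translates into Python fairly directly. The sketch below makes a few assumptions that the formula leaves open: it uses the sample standard deviation from `statistics.stdev`, returns `None` until 21 lagged daily changes are available, and lets the caller choose the starting level; the function name is hypothetical.

```python
import statistics


def vola_return_index(prices, window=21, z=1.95, start=0.0):
    """Cumulative sum of daily changes, each divided by yesterday's volatility."""
    changes = [None] + [prices[i] - prices[i - 1] for i in range(1, len(prices))]
    index = [None] * len(prices)
    for t in range(len(prices)):
        # Need `window` daily changes ending yesterday: changes[t-window .. t-1].
        if t - window < 1:
            continue
        volatility = z * statistics.stdev(changes[t - window:t])
        previous = index[t - 1] if index[t - 1] is not None else start
        if volatility > 0:
            index[t] = previous + changes[t] / volatility
        else:
            index[t] = previous  # flat history: no volatility to normalize by
    return index
```

The volatility window deliberately ends yesterday, as in the formula, so today's move is not used to normalize itself.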

### Vola Return Index vs. Percent Return Index

Let’s have a look at a sample chart to compare the two ranking methods. For this I picked the J.P. Morgan stock.

The upper indicator shows you a percent gain index. It sums up the daily percent gains of the stock movement, basically giving you an impression of what you would have won if you had kept your invested money constant.

The indicator on the bottom is the Vola Return Index. It represents your wins if you had kept the risk invested in the stock constant (e.g. always invest 100\$ on the 21-day 95% confidence interval of the daily returns).

Have a closer look at the differences between these two indicators up to October 2016. JPM is slightly up, and that’s why the percent change index is also in positive territory. During the same time the Vola Return Index just fluctuates around the zero line, as the volatility of JPM picked up during this period. To keep your invested risk constant over this period you would have downsized your position when JPM’s volatility picked up, usually during a drawdown. No good.

The same can be observed on the upper chart, showing the index’s movements over the last months. Right now, after the recent correction, the percent change index is, like the JPM stock, up again. On the other hand, the Vola Return Index is still down, due to the rising volatility in JPM.

### Vola Return Index – Ranking

Let’s put this to a test and rank the 30 Dow Jones industrial stocks according to the percent return index, using my Vola Return Index as a comparison, both calculated since 01/01/2015.

The first three stocks are the same; they have the highest vola and the highest percent return. But JPM and Visa would get a different sorting. Just see how low the JPM Vola Index is; it would not be the 4th best stock.

Percent return says JPM and Visa are about the same; only the Vola Return Index shows that Visa would have been the better investment vehicle compared to JPM. But see for yourself on the chart…

### Conclusion

Make sure your indicators show what you actually can do on the market. There is no use in just showing the percent gains of a stock if you trade some kind of VAR adjusted trading style.

Keeping your risk under control is one of the most important things in trading, and using the Vola Return Index instead of just plotting the percent performance can give you some key insights and keep you away from bad investment vehicles. Also have a look at this stock picking portfolio based on similar ideas.