# Machine learning: kNN algorithm explained

I always thought that inspiration and experience are key factors in trading. But every time my chess computer beats me without any inspiration, just by brute force, I have my doubts. This article is about a brute-force approach to trading: the kNN algorithm.

Rule-based trading, or algorithmic trading, is just a name for a set of if..then rules that define the machine’s trading decisions. For example: if the market crosses below the 200-day line, then short 100 contracts; if the market rises by 2%, then exit the position. Easy stuff like this… (for the beginning)
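To make the if..then idea concrete, here is a minimal Python sketch of the two example rules. It is only an illustration: the function name `rule_based_order`, its arguments, and the reading of "rises by 2%" as 2% above the entry price are my assumptions, not part of any trading platform.

```python
def rule_based_order(prev_close, close, prev_ma200, ma200, position, entry_price):
    """Return today's order in contracts (negative = sell, 0 = do nothing).

    Hypothetical helper: all names and the 2%-from-entry reading are assumptions.
    """
    # Rule 1: we are flat and the close crosses below the 200-day line -> short 100 contracts.
    if position == 0 and prev_close >= prev_ma200 and close < ma200:
        return -100
    # Rule 2: we are short and the market has risen 2% from our entry -> exit the position.
    if position < 0 and close >= entry_price * 1.02:
        return -position
    return 0
```

Calling it once per bar with the current position is enough; the function itself keeps no state.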

This article is a short introduction to machine learning. I will use a classic machine learning algorithm to let my computer come up with a prediction for tomorrow’s market move. In the meantime I’ll have a glass of wine with some friends and let the machine do the job. At least that’s the idea, but can it be that simple in real-life trading?

### Supervised machine learning – kNN algorithm

The kNN algorithm is one of the simplest machine learning algorithms. Learning, in this case, is only a nice-sounding label; in reality kNN is more of a classification algorithm.

This is how it works:

The scatter chart above is a visualisation of a two-dimensional kNN data set. For this article I used two classical indicators of technical analysis to do the prediction: a long-term and a short-term RSI indicator. The dots on the two-dimensional scatter chart represent the historic RSI values at a given point in time.

Now have a look at the fat circled point. This point represents today’s values: today’s RSI1 has a value of 63, and RSI2 has a value of 70.

In addition to their position on the chart, the dots have colours. A green dot means that the market moved up on the following day; a red dot shows a falling market on the day after.

We already know what has happened in history, so it is easy to colour the historic dots. But we do not know the colour of today’s dot, as it is not known where tomorrow’s market will end.

Based on the chart above, will it be a red or green dot? Will tomorrow be up or down?  Should I go long or should I go short?

### kNN – k nearest neighbours

To predict tomorrow’s market move, the kNN algorithm uses the historic data shown on the scatter plot above and finds the k nearest neighbours of today’s RSI values. As you can see, our current fat point is surrounded by red dots. This means that every time the two RSI values have been in this area, the market fell on the day after. That’s why today’s data point is classified as red. I wish it were that easy all the time…

Call it classification or prediction, the two-dimensional kNN algorithm just looks at what happened in the past when the two indicators were at a similar level. It looks at the k nearest neighbours, checks their state, and classifies today’s point accordingly.

### kNN as Tradesignal Equilla Code

In this article I would like to show you an implementation with the Tradesignal programming language Equilla.

To implement the algorithm in Tradesignal we first have to build the data behind the scatter plot shown above. The algorithm stores the values in an array.

Lines 8 and 9 calculate the values of the fast and slow RSI indicators.

Lines 12 and 13 look at what happens on the day after (for the training data set).

Lines 16, 17 and 18 store everything in an array.
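For readers without Tradesignal, the same three steps can be sketched in Python. This is not the Equilla code described above, just an illustration under my own assumptions: the RSI is the simple-average variant (Tradesignal's built-in RSI may smooth differently), an unchanged close is labelled as "down", and the names `rsi` and `build_training_set` are made up for this sketch.

```python
def rsi(closes, period):
    """Simple-average RSI; entries are None until `period` price changes exist.

    Assumption: plain averages of gains and losses, not Wilder smoothing.
    """
    out = [None] * len(closes)
    for i in range(period, len(closes)):
        gains = losses = 0.0
        for j in range(i - period + 1, i + 1):
            change = closes[j] - closes[j - 1]
            if change > 0:
                gains += change
            else:
                losses -= change
        if gains + losses == 0:
            out[i] = 50.0  # completely flat window: treat as neutral
        else:
            out[i] = 100.0 * gains / (gains + losses)
    return out


def build_training_set(closes, fast=14, slow=28):
    """Return rows of (fast RSI, slow RSI, label) for every bar that can be labelled.

    label is +1 (green) if the next close is higher, otherwise -1 (red).
    """
    fast_rsi = rsi(closes, fast)
    slow_rsi = rsi(closes, slow)
    rows = []
    # The last bar has no "day after", so it cannot be part of the training data.
    for i in range(len(closes) - 1):
        if fast_rsi[i] is None or slow_rsi[i] is None:
            continue
        label = 1 if closes[i + 1] > closes[i] else -1
        rows.append((fast_rsi[i], slow_rsi[i], label))
    return rows
```

Each row is one dot on the scatter chart: two coordinates and a colour.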

The next task to complete is to calculate the distances of today’s RSI point to all the historic points in the training data set.

Lines 23/27 calculate the Euclidean distance of today’s point to all historic points. Line 29 then creates a sorted list of all these distances to find the k nearest historic data points in the training data set.
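The distance step is the heart of kNN, so here is a small Python sketch of it. Again, this is an illustration and not the Equilla listing; `nearest_neighbours` and the row layout (fast RSI, slow RSI, label) are assumptions made for the example.

```python
import math


def nearest_neighbours(training_rows, today, k):
    """Return the k training rows closest to today's (fast RSI, slow RSI) point."""
    def distance(row):
        # Euclidean distance in the two-dimensional RSI plane.
        return math.hypot(row[0] - today[0], row[1] - today[1])

    # Sorting the whole history is the brute-force part: fine for a few
    # thousand daily bars, slow for very large data sets.
    return sorted(training_rows, key=distance)[:k]


# Hypothetical historic points as (RSI1, RSI2, label); today is at (63, 70).
history = [(60, 70, -1), (10, 10, 1), (64, 69, -1)]
closest = nearest_neighbours(history, (63, 70), k=2)
```

Both RSI values live on the same 0 to 100 scale, so no rescaling is needed here. With indicators on different scales, one of them would dominate the distance.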

Nearly done. The next step is just to find out what classification (colour) the nearest points have and to use this information to create a prediction for tomorrow. This is done in lines 33 to 35.

Have a look at the scatter chart at the beginning. If this were the data stored in our training data set, the prediction, using the 5 nearest neighbours, would be -5: each red neighbour counts as -1 and each green one as +1, and all 5 nearest neighbours of our current data point are red.

Now that we have a prediction for tomorrow, we need to make use of it and trade it. The returns will then show whether everything works as predicted.
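The -5 can be reproduced with a few lines of Python. The sketch assumes the distances have already been computed and paired with each historic point's label (+1 for green, -1 for red); `knn_vote` is a made-up name, and the numbers are invented to mimic a point surrounded by red dots.

```python
def knn_vote(distance_label_pairs, k=5):
    """Sum the labels of the k closest points; the result ranges from -k to +k."""
    closest = sorted(distance_label_pairs, key=lambda pair: pair[0])[:k]
    return sum(label for _, label in closest)


# Invented example: five red dots close by, three green dots far away.
pairs = [(1.4, -1), (1.4, -1), (2.0, -1), (2.0, -1), (2.0, -1),
         (58.9, 1), (51.6, 1), (34.5, 1)]
prediction = knn_vote(pairs, k=5)
```

With an odd k and labels of +1 or -1 the sum can never be zero, so there is always a direction.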

Here I just do a simple long/short interpretation of the prediction, but of course you could also make use of the quality of the prediction (+5 or +1?) in some way. Position sizing…?
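One possible way to turn the prediction into a position, sketched in Python. The plain long/short branch mirrors what is described above; the scaled branch is only one idea for position sizing, and both `target_position` and the default of 100 contracts are assumptions for the example.

```python
def target_position(prediction, k=5, max_contracts=100, scale_by_confidence=False):
    """Translate a kNN prediction (-k..+k) into a signed number of contracts."""
    if prediction == 0:
        return 0  # no majority (only possible with an even k): stay flat
    if not scale_by_confidence:
        # Simple interpretation: the sign decides, the size is always the same.
        return max_contracts if prediction > 0 else -max_contracts
    # Sizing idea: +5 out of 5 trades full size, +1 out of 5 only a fifth of it.
    return round(max_contracts * prediction / k)
```

Whether scaling by the vote actually helps is an open question that only a test can answer.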

### kNN algorithm performance

The next chart shows 2000 bars of daily Brent data. It uses a 14-day and a 28-day RSI to predict the next day’s move in the Brent oil market. The training was on the first half of the data set, and the 5 nearest neighbours did the classification.

Underneath the chart the returns of this test are shown (strategy equity). At the bottom of the chart you see the two RSI indicators used to generate the prediction and the buy/sell commands.
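The train-on-the-first-half setup can be sketched in Python as well. The sketch measures a plain hit rate instead of the strategy equity shown in the chart, which is my simplification; the function name and the row layout (fast RSI, slow RSI, label) are assumptions too.

```python
import math


def out_of_sample_hit_rate(rows, k=5):
    """Train on the first half of the chronological rows, test on the second half.

    rows: list of (fast RSI, slow RSI, label) with label +1 (up) or -1 (down).
    Returns the fraction of test days whose direction was predicted correctly.
    """
    split = len(rows) // 2
    train, test = rows[:split], rows[split:]
    hits = 0
    for fast, slow, actual in test:
        # Only the training half is searched, so no test day sees the future.
        nearest = sorted(
            train, key=lambda r: math.hypot(r[0] - fast, r[1] - slow)
        )[:k]
        vote = sum(r[2] for r in nearest)
        predicted = 1 if vote > 0 else -1  # an odd k avoids a zero vote
        if predicted == actual:
            hits += 1
    return hits / len(test)
```

A hit rate near 0.5 on the second half would suggest the indicators carry little information, whatever the first half looked like.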

### kNN algorithm – conclusion

The kNN algorithm offers a framework to easily test all kinds of indicators and see whether they have any predictive value. Judging by the graph shown, it seems to work. It seems to be possible to use these two RSI indicators to predict tomorrow’s Brent move.

But unfortunately this could also be just completely useless curve fitting. It is you who has to select the indicators and their periods, and you have to decide whether you like the outcome of a selected parameter set. Too many degrees of freedom to be sure. The kNN algorithm is useful, but its application in finance has to be treated carefully. Otherwise bad surprises are guaranteed.

Not everything can be done by brute force; inspiration and experience are key factors in finance…

The analysis has been done using the Tradesignal software suite.

## 5 thoughts on “Machine learning: kNN algorithm explained”

1. Andrey Logunov

A very instructive and helpful article. I just want to ask a single question: how do you construct this scatter plot? Do you scatter the data points day by day, that is, does the x-axis hold time periods 1 through 24? Or? Thanks in advance.

2. Andrey Logunov

Isn’t there a logical mistake in line 34?
After sorting into sortedDist you broke ties with trd[i, 3]