Previously on “How to design a machine learning trading bot”
We started with “Collecting Data”:
We found out what OHLCV data is, and we learned why we need both historical and online data together.
So, let’s start the second step, assuming we have already collected the data.
This article is the second episode of a series titled “How to design a machine learning trading bot”.
Step 2: Data Analysis
Cleaning, filtering, and feature engineering the data together form one of the essential steps in the machine learning approach.
To show how important this step is: if you split the whole development time into ten frames, “Analysing Data” should take half of them, which is five frames. In other words, much of your time goes into working on the historical data to turn it into the form the machine expects.
Keep in mind that data is like food for the machine: if you feed it healthy, low-fat food, you can expect it to run better, but if you give it poor-quality food, even the most expensive, highest-quality machine in the world will not work properly for you.
If you have a mathematics or statistics background, you are well suited for this section, but if you don’t, don’t worry :) you can reach your targets with basic knowledge.
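To make the cleaning and filtering step a bit more concrete, here is a minimal sketch in Python with pandas. It assumes the collected OHLCV data sits in a DataFrame with the hypothetical columns `timestamp`, `open`, `high`, `low`, `close`, and `volume`. The rules below (keep one candle per timestamp, drop missing values, drop candles whose high and low are inconsistent) are common sanity checks, not a complete cleaning recipe.

```python
import numpy as np
import pandas as pd


def clean_ohlcv(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of raw OHLCV candles, indexed by timestamp.

    Assumes (hypothetical) columns: timestamp, open, high, low, close, volume.
    """
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    # Stable sort so that, among duplicate timestamps, the first-seen candle is kept.
    out = out.sort_values("timestamp", kind="mergesort")
    out = out.drop_duplicates(subset="timestamp", keep="first")

    # Drop candles with any missing price or volume.
    out = out.dropna(subset=["open", "high", "low", "close", "volume"])

    # Filter candles that are internally inconsistent:
    # high must be the top of the bar, low the bottom, prices positive,
    # and volume non-negative.
    valid = (
        (out["high"] >= out[["open", "close", "low"]].max(axis=1))
        & (out["low"] <= out[["open", "close"]].min(axis=1))
        & (out["low"] > 0)
        & (out["volume"] >= 0)
    )
    return out[valid].set_index("timestamp")


# Usage with a tiny hand-made sample that contains three bad rows.
raw = pd.DataFrame(
    {
        "timestamp": [
            "2021-01-01 00:00",
            "2021-01-01 00:01",
            "2021-01-01 00:01",  # duplicate timestamp
            "2021-01-01 00:02",  # high below open: inconsistent candle
            "2021-01-01 00:03",  # missing low
        ],
        "open": [100.0, 101.0, 101.0, 102.0, 103.0],
        "high": [101.5, 102.0, 102.0, 101.0, 104.0],
        "low": [99.5, 100.5, 100.5, 100.0, np.nan],
        "close": [101.0, 101.5, 101.5, 100.5, 103.5],
        "volume": [10.0, 12.0, 12.0, 8.0, 9.0],
    }
)
clean = clean_ohlcv(raw)
print(len(clean))  # → 2
```

Which rules you need depends on your exchange and data source; for example, you may prefer to fill short gaps instead of dropping rows.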
Visualizing Data:
Visualizing data lets you see and analyze it easily. At this stage, you will work a lot with plots and charts, and they help you see what you would miss in the raw numbers.
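As a small example of this kind of chart, the sketch below uses matplotlib (an assumption; any plotting library works) to draw the close price with a moving average, the volume, and a histogram of bar-to-bar returns. A returns histogram makes outliers and bad ticks visible at a glance. The function name `plot_ohlcv_overview` and the column names `close` and `volume` are hypothetical, and the demo data is a synthetic random walk, not market data.

```python
import matplotlib

matplotlib.use("Agg")  # render to image files, no display needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_ohlcv_overview(df: pd.DataFrame, path: str, window: int = 20):
    """Save price, volume, and return-distribution charts to `path`.

    Assumes `df` has a datetime index and `close` and `volume` columns.
    """
    close = df["close"]
    returns = (close / close.shift(1) - 1).dropna()  # simple bar-to-bar returns

    fig, (ax_price, ax_vol, ax_hist) = plt.subplots(3, 1, figsize=(10, 9))

    # Price with a moving average to show the trend.
    ax_price.plot(df.index, close, label="close")
    ax_price.plot(df.index, close.rolling(window).mean(), label=f"{window}-bar mean")
    ax_price.set_ylabel("price")
    ax_price.legend()

    # Volume over time: sudden gaps or long runs of zeros can point to data problems.
    ax_vol.plot(df.index, df["volume"], color="gray")
    ax_vol.set_ylabel("volume")

    # Distribution of returns: extreme bars show up as isolated tails.
    ax_hist.hist(returns, bins=50)
    ax_hist.set_xlabel("bar return")
    ax_hist.set_ylabel("count")

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)  # free memory when plotting many datasets
    return fig


# Usage with synthetic daily data (a random walk, for illustration only).
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=200, freq="D")
demo = pd.DataFrame(
    {
        "close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(idx)))),
        "volume": rng.integers(100, 1000, len(idx)),
    },
    index=idx,
)
fig = plot_ohlcv_overview(demo, "ohlcv_overview.png")
```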
Key Data Analysis Terminology:
There are two main keywords in data analysis that you should know before going further: “Labels” and “Features”.
Google’s Machine Learning Crash Course gives a good explanation of these terms.
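In short, features are the inputs the model sees and the label is what it learns to predict. As an illustration only (the series covers its own feature engineering later), the sketch below derives a few hypothetical features from OHLCV bars with pandas and labels each bar 1 if the close `horizon` bars later is higher, else 0. The feature choices and the function name `make_features_and_labels` are assumptions for this example.

```python
import pandas as pd


def make_features_and_labels(df: pd.DataFrame, horizon: int = 1):
    """Split OHLCV bars into a feature matrix X and a label vector y.

    Features use only data known at each bar's close; the label looks
    `horizon` bars ahead. Column names (close, high, low, volume) are assumed.
    """
    close = df["close"]

    X = pd.DataFrame(index=df.index)
    X["ret_1"] = close / close.shift(1) - 1          # last bar's return
    X["ret_5"] = close / close.shift(5) - 1          # return over the last 5 bars
    X["range"] = (df["high"] - df["low"]) / close    # bar range relative to price
    X["vol_ratio"] = df["volume"] / df["volume"].rolling(5).mean()  # volume vs recent average

    # Label: 1 if the future close is higher than the current close, else 0.
    future_close = close.shift(-horizon)
    y = (future_close > close).astype(int)

    # Keep only rows where every feature is defined and the future close exists.
    mask = X.notna().all(axis=1) & future_close.notna()
    return X[mask], y[mask]


# Usage with ten hand-made daily bars.
idx = pd.date_range("2021-01-01", periods=10, freq="D")
close = pd.Series([10, 11, 12, 11, 13, 14, 13, 15, 16, 15], index=idx, dtype=float)
bars = pd.DataFrame(
    {"close": close, "high": close + 1, "low": close - 1, "volume": 100.0},
    index=idx,
)
X, y = make_features_and_labels(bars)
print(X.shape)     # → (4, 4)
print(y.tolist())  # → [0, 1, 1, 0]
```

The key design choice is that only the label uses `shift(-horizon)`; if any feature peeked at future bars, the model would look great in training and fail in live trading.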
Last but not least
In many machine learning courses and tutorials, the data is prepared for you and the course starts with how to train the machine. The data in their examples is fully engineered and ready for training, but in reality nobody serves the data to you: it is up to you to make it ready for your purposes. Keep in mind that resources and references on this topic are still limited. Later in this series, in the development part, we will see how to do “Feature Engineering” on OHLCV data and prepare it for training.