How to develop a machine learning trading bot: Data Collection


Bahman in How to

Jun 02, 2021

How to create an automated trading system using machine learning. 

Part 1: Data Collection



We have now arrived at the second season of our article series on how to design, build, and make automated bots with machine learning techniques.

It is best to start designing a trading bot before you start coding it. 

Previously on "How to": 

We began with “Collecting Data”:

In that lesson, we learned what OHLCV data is and why we need historical data as well as online data together.

Next, we discussed “Data Analysis”:

We saw how important data cleaning and feature engineering are. To make a stable machine learning model, we need to prepare the data in the right way. Last but not least, we learned how to visualize this data.

Then we continued with finding a pattern, and we noted that you can fall into the trap of reading patterns into data the way a horoscopist does. You should always follow the scientific method and act like an astronomer. Then we identified a very simple pattern, SMA20, and talked about how to label the data with [0,1].

Then, we built a model. We explained that once you make an ML (Machine Learning) model, you should evaluate it with a backtest. You will need a strategy at this point to Buy and Sell. This means that we already have a signal to open a Long/Short position.

Then, we showed how to run the automation through these actions:

  • Place an order
  • Retrieve an order
  • Cancel an order
  • Get the asset’s balance
  • Get Online price

This was followed by understanding the concepts of exchanges, APIs, and spot trading.

In the end, we discussed risk management, as well as how to monitor trades online and continuously.

For better performance in developing and coding, please read "How to design a machine learning trading bot - Part 1: Data Collection" before continuing with this section.  

Let's move on to the development season.


Here is our plan for today:

  1. Get historical data for the BTC/USDT pair. 
  2. Get stream data from Binance and Kraken.
  3. Store the BTC/USDT and BTC/EUR rates in a database in the 1MIN time frame.

Let's begin with "Historical Data."

What are historical data and why do we need them? 

Historical data in financial markets is the record of past rates for a trading pair. To collect historical data on the crypto market specifically, you first need to know which exchange you're using. In this study, we will be focusing on two of the most popular cryptocurrency exchanges, Binance and Kraken.

Then, what is the purpose of historical data? 

Data is the main fuel for machine learning algorithms. A clean set of data will make your job with ML much easier. We will come back to the term "clean" in the next episode.

To develop and build a machine learning trading model we need OHLC (Open, High, Low, Close) or OHLCV (Open, High, Low, Close, Volume) data. OHLC data can come in many time frames. In most cases, 1MIN OHLC is the shortest time frame for trading. Then, we can build the other common time frames from 1MIN, such as 5MIN or 1H. (Custom time frames, like 100 minutes, have also recently become interesting.)

Most historical data for financial markets comes in OHLCV format, and if you have 1MIN OHLCV data you can build any longer time frame from it. (We'll explore this further in the next episode.)
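To make the idea of building longer time frames from 1MIN candles concrete, here is a minimal sketch in plain Python. The `Candle` type, its field names, and the `resample` helper are our own assumptions for illustration; they are not taken from the 1DES repository. Each longer candle takes the open of its first 1MIN candle, the highest high, the lowest low, the close of its last 1MIN candle, and the sum of the volumes.

```python
from typing import Dict, List, NamedTuple


class Candle(NamedTuple):
    """One OHLCV candle (hypothetical layout, for illustration only)."""
    open_time: int   # start of the candle, Unix seconds (UTC)
    open: float
    high: float
    low: float
    close: float
    volume: float


def resample(candles: List[Candle], minutes: int) -> List[Candle]:
    """Merge 1MIN candles into candles that are `minutes` long."""
    span = minutes * 60
    buckets: Dict[int, Candle] = {}
    # Process candles in time order so "open" and "close" are correct.
    for c in sorted(candles, key=lambda c: c.open_time):
        start = c.open_time - c.open_time % span  # start of the longer candle
        prev = buckets.get(start)
        if prev is None:
            # First 1MIN candle of this bucket: it sets the open.
            buckets[start] = c._replace(open_time=start)
        else:
            buckets[start] = Candle(
                open_time=start,
                open=prev.open,                  # open of the first candle
                high=max(prev.high, c.high),     # highest high so far
                low=min(prev.low, c.low),        # lowest low so far
                close=c.close,                   # close of the latest candle
                volume=prev.volume + c.volume,   # volumes add up
            )
    return [buckets[k] for k in sorted(buckets)]


# Ten synthetic 1MIN candles (made-up prices), merged into 5MIN candles.
one_min = [
    Candle(i * 60, 100.0 + i, 101.0 + i, 99.0 + i, 100.5 + i, 1.0)
    for i in range(10)
]
five_min = resample(one_min, 5)
for candle in five_min:
    print(candle)
```

The same function works for custom time frames, for example `resample(one_min, 100)`. If your data already lives in a pandas DataFrame, `DataFrame.resample` combined with first/max/min/last/sum aggregations does the same job.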

It is important to have "historical data" for the learning process. In "Build a Model", we showed that we need a bunch of labeled data to make a machine-learning model. 

Where can I find the historical data for cryptocurrencies?

1H historical data for BTC/USDT on the Binance exchange, going back to September 2017, has been prepared by 1DES. You can find it in this repository:

You can find more about the data format by reading the "ReadMe" file.
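If the prepared files do not cover the pair or time frame you need, candles can also be requested from the exchange itself. The sketch below uses Binance's public REST klines endpoint with only the Python standard library. The helper names (`build_klines_url`, `parse_kline`, `fetch_klines`) are our own and are not part of the 1DES repository, and the sample row contains made-up values. Binance returns each candle as a list whose first six fields are the open time in milliseconds, then open, high, low, close, and volume, with prices sent as strings.

```python
import json
from typing import Dict, List, Union
from urllib.parse import urlencode
from urllib.request import urlopen

BINANCE_KLINES_URL = "https://api.binance.com/api/v3/klines"


def build_klines_url(symbol: str, interval: str, limit: int = 500) -> str:
    """Build the request URL for the public klines (candlestick) endpoint."""
    query = urlencode({"symbol": symbol, "interval": interval, "limit": limit})
    return BINANCE_KLINES_URL + "?" + query


def parse_kline(row: list) -> Dict[str, Union[int, float]]:
    """Keep the OHLCV part of one kline row and convert strings to numbers."""
    return {
        "open_time_ms": int(row[0]),
        "open": float(row[1]),
        "high": float(row[2]),
        "low": float(row[3]),
        "close": float(row[4]),
        "volume": float(row[5]),
    }


def fetch_klines(symbol: str = "BTCUSDT", interval: str = "1h",
                 limit: int = 500) -> List[Dict[str, Union[int, float]]]:
    """Download recent candles. Needs network access, so it is not called here."""
    with urlopen(build_klines_url(symbol, interval, limit), timeout=10) as resp:
        rows = json.load(resp)
    return [parse_kline(row) for row in rows]


# Offline demonstration with a made-up row in the kline layout.
sample_row = [1622592000000, "36500.10", "36800.00", "36400.50", "36750.25",
              "1234.5", 1622595599999, "0", 100, "0", "0", "0"]
candle = parse_kline(sample_row)
print(build_klines_url("BTCUSDT", "1h"))
print(candle)
```

Calling `fetch_klines()` returns only the most recent candles, because the endpoint caps how many it sends per request. Collecting a long history means repeating the request with the endpoint's `startTime` parameter, which this sketch leaves out.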

Here are some other free historical data resources:

What is the best way to consistently store the 1MIN BTC/USDT rate from Binance using the Binance API? 


0- Okay then. Let's take it a step further and collect live data. At 1DES, we use the following repository to collect data and to build the 1MIN time frame:

1- The repository is written in Python. First, clone the repository to your local machine:

git clone [email protected]:the1des/crypto_data_collection.git 

2- Ensure you have installed all the requirements:

pip install -r requirements.txt

3- You need PostgreSQL installed on your local machine. The queries for creating the tables are in the "sql" folder. We recommend the database name "ohlc" for these tables. After running the queries, you will have four tables; "binance_btcusdt" is one of them.

4- Duplicate the file ".env.example" and rename the copy to ".env", and then update the following environment variables:


You do not have to change anything else if you didn't change the tables' names. 

5- The last thing to do is just run the code. If you run the following commands, the tables will be filled every minute with data from the exchanges (Binance and Kraken).

Collect BTC/USDT and BTC/EUR from Binance:

python bin/ 

Collect BTC/USD and BTC/EUR from Kraken:

python bin/ 

6- Done! Now, you are collecting BTC/USDT rates non-stop and converting them to 1MIN OHLCV data format. You will need this data later to feed your machine learning models.  
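For readers who want to see what "converting to 1MIN OHLCV" means in code, here is a minimal in-memory sketch. It is not the repository's implementation: the `Bar` and `MinuteAggregator` names are hypothetical, the trades are made up, and we assume trades arrive in time order as (timestamp in seconds, price, quantity). A real collector would receive these trades from the exchange stream and write each closed bar to the database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bar:
    """One 1MIN OHLCV bar (hypothetical layout, for illustration only)."""
    minute: int      # start of the minute, Unix seconds (UTC)
    open: float
    high: float
    low: float
    close: float
    volume: float


class MinuteAggregator:
    """Turn a stream of trades into 1MIN OHLCV bars."""

    def __init__(self) -> None:
        self.current: Optional[Bar] = None   # bar still being built
        self.closed: List[Bar] = []          # finished bars, oldest first

    def on_trade(self, ts: float, price: float, qty: float) -> None:
        minute = int(ts) - int(ts) % 60      # minute this trade belongs to
        bar = self.current
        if bar is not None and bar.minute == minute:
            # Same minute: update the bar in place.
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price
            bar.volume += qty
            return
        if bar is not None:
            # A new minute started: the previous bar is complete.
            self.closed.append(bar)
        # The first trade of a minute sets open, high, low, and close.
        self.current = Bar(minute, price, price, price, price, qty)


# Made-up trades: three in the first minute, one in the second.
agg = MinuteAggregator()
for ts, price, qty in [(0, 100.0, 0.5), (20, 102.0, 0.25),
                       (59, 99.0, 0.25), (61, 101.0, 1.0)]:
    agg.on_trade(ts, price, qty)

print(agg.closed)    # the finished bar for the first minute
print(agg.current)   # the bar that is still open
```

A bar is only appended to `closed` when the first trade of the next minute arrives, so a minute without any trades produces no bar at all. Whether to fill such gaps is a data-cleaning decision.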

This post was originally published on