Assumptions and design choices

A core assumption of this project is that patterns in how people trade are similar whether we look at Bitcoin trading in November 2016, with a price of about $700, or in November 2017, with a price in the $6,500-7,000 range. We make several further assumptions, described in the following points:

  • Assumption one: Since trading patterns are assumed to be similar at any price level, we can ignore the actual price and look at its change instead. As a measure of this change, we take the delta between the opening and closing prices of each minute, that is, close minus open. If the delta is positive, the price grew during that minute; if it is negative, the price went down; and if delta = 0, the price stayed the same.
    In the following figure, we can see that Delta was -1.25 for the first minute observed, -12.83 for the second one, and -0.23 for the third one. Sometimes the open price differs significantly from the close price of the previous minute: although Delta is negative during all three of the observed minutes, the open price shown for the third minute was actually higher than the close price of the second. Such gaps are not very common, however, and usually the open price is close to the close price of the previous minute.
  • Assumption two: The next thing to consider is that we predict the price change in a black box environment. We do not use other sources of knowledge, such as news or Twitter feeds, to predict how the market would react to them; that is a more advanced topic. The only data we use is price and volume. To keep the prototype simple, we can focus on price only and construct time series data.
    Time series prediction means predicting a parameter from the values of that same parameter in the past. One of the most common examples is temperature prediction. Although many supercomputers use satellite and sensor data to predict the weather, a simple time series analysis can also lead to some valuable results. In our case, we predict the price at T+60s, for instance, based on the price at T, T-60s, T-120s, and so on.
  • Assumption three: Not all data in the dataset is valuable. The first 600,000 records are not informative, because price changes are rare and trading volumes are small. Training on them can distort the model and make the end results worse. That is why the first 600,000 rows are eliminated from the dataset.
  • Assumption four: We need to label our data so that we can use a supervised ML algorithm. This is the simplest measure, and it does not take transaction fees into account.
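To make these assumptions more concrete, here is a minimal Python sketch of how they might fit together. It is an illustration under stated assumptions, not the project's actual implementation. The `(open, close)` record layout, the function names, the window size, and the toy prices are all hypothetical. In particular, the text does not spell out the labeling rule, so the sketch assumes a simple binary label: 1 if the next minute's delta is positive, 0 otherwise.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: one (open_price, close_price) pair per minute.
Bar = Tuple[float, float]
# One training example: a window of past deltas plus a binary label.
Example = Tuple[List[float], int]


def minute_deltas(bars: Sequence[Bar]) -> List[float]:
    """Assumption one: keep only the change (close minus open) per minute."""
    return [close - open_ for open_, close in bars]


def sliding_windows(
    series: Sequence[float], window: int
) -> List[Tuple[List[float], float]]:
    """Assumption two: pair each run of `window` past values with the next value."""
    return [
        (list(series[i:i + window]), series[i + window])
        for i in range(len(series) - window)
    ]


def build_examples(bars: Sequence[Bar], n_skip: int, window: int) -> List[Example]:
    """Drop the uninformative head of the data, then build labeled windows."""
    # Assumption three: skip the first n_skip rows (600,000 in the text).
    deltas = minute_deltas(bars[n_skip:])
    # Assumption four (labeling rule assumed here, not taken from the text):
    # label 1 if the next minute's delta is positive, otherwise 0.
    return [
        (features, 1 if target > 0 else 0)
        for features, target in sliding_windows(deltas, window)
    ]


# Toy data with made-up prices; the first bar plays the role of the skipped head.
toy_bars: List[Bar] = [
    (100.0, 100.0),
    (100.0, 98.75),
    (98.5, 99.0),
    (99.0, 99.0),
    (99.25, 100.0),
    (100.0, 99.5),
]

examples = build_examples(toy_bars, n_skip=1, window=3)
for features, label in examples:
    print(features, "->", label)
```

With real data, `n_skip` would be 600,000 and the window would cover many more minutes; the tiny values here only keep the example readable. Because the label looks only at the sign of the next delta, it ignores transaction fees entirely, which matches the simplification stated in assumption four.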