## Basic model

Given the small size of the dataset (not much more than 500 trades), I used cross-validation so as not to lose any data to a train/test split. I tested 7 models (a minimal comparison sketch follows the list):
- Random Forest Classifier
- Logistic Regression
- XGBoost Classifier
- CatBoost Classifier
- SGD Classifier
- Support Vector Classification
- K Neighbors Classifier
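Below is a minimal sketch of this comparison, assuming a tabular dataset with a binary label marking "good" trades; the file name, column names, and F1 scoring metric are illustrative assumptions, not taken from the original project.

```python
# Sketch: cross-validated comparison of the 7 classifiers.
# "trades.csv" and the "is_good_trade" label column are hypothetical.
import pandas as pd
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

df = pd.read_csv("trades.csv")
X, y = df.drop(columns=["is_good_trade"]), df["is_good_trade"]

models = {
    "RandomForest": RandomForestClassifier(random_state=42),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "CatBoost": CatBoostClassifier(verbose=0),
    "SGD": SGDClassifier(),
    "SVC": SVC(),
    "KNeighbors": KNeighborsClassifier(),
}

# With only ~500 trades, stratified k-fold CV lets every sample serve
# both for training and for validation across the folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print(f"{name:20s} F1 = {scores.mean():.3f} ± {scores.std():.3f}")
```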
Here are the summarised performance results on the key metrics:
The model effectively filters out "bad" trades, though at the cost of also removing some of the "good" ones. This trade-off is inherent, as perfect filtering is unattainable with the given market-state information. Nevertheless, the model stabilises returns and increases the average profit per trade by over 150%, significantly improving robustness.
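A minimal sketch of how such an improvement figure can be computed, continuing from the variables (`X`, `y`, `df`, `cv`) in the previous sketch: keep only the trades that out-of-fold predictions approve and compare the mean profit with the unfiltered baseline. The `profit` column and the choice of model are assumptions for illustration.

```python
# Sketch: average profit per trade before vs. after model filtering.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

best_model = RandomForestClassifier(random_state=42)   # placeholder choice
oof_pred = cross_val_predict(best_model, X, y, cv=cv)  # out-of-fold labels

profit = df["profit"]                        # per-trade P&L, assumed column
baseline_avg = profit.mean()                 # take every trade
filtered_avg = profit[oof_pred == 1].mean()  # take only model-approved trades

print(f"Average profit per trade: {baseline_avg:.4f} -> {filtered_avg:.4f} "
      f"({(filtered_avg / baseline_avg - 1) * 100:+.0f}%)")
```

Using out-of-fold predictions rather than in-sample ones keeps the estimate honest on such a small dataset.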
To make sure this is not a random increase, I ran an LLN simulation. The idea is to rely on the law of large numbers: randomly filter out trades according to the observed ratio of "good" to "bad" trades and compute the resulting average profit per trade. As can be seen, random filtering does not increase the average profit per trade, which is one more sign of the model's effectiveness.
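A minimal sketch of this check, reusing `df` and `oof_pred` from the previous sketch: repeatedly keep a random subset of trades at the same acceptance rate as the model and average the profit. Over many repetitions the simulated means converge to the overall mean, so random filtering should show no improvement.

```python
# Sketch: LLN simulation with random trade filtering.
import numpy as np

rng = np.random.default_rng(42)
profit = df["profit"].to_numpy()        # assumed per-trade P&L column
accept_rate = (oof_pred == 1).mean()    # same share of trades the model keeps

n_sims = 10_000
sim_means = np.empty(n_sims)
for i in range(n_sims):
    mask = rng.random(len(profit)) < accept_rate
    sim_means[i] = profit[mask].mean()

print(f"All trades:       {profit.mean():.4f}")
print(f"Random filtering: {sim_means.mean():.4f} ± {sim_means.std():.4f}")
```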
## Ensembling

Another thing to try was ensembling the top-performing models. The idea is simple: working together, they compensate for each other's errors and can increase the profit even further. Here are the results of this approach:
As can be seen, ensembling increased the average profit per trade even further. However, the standard deviation also increased, so we can conclude that, despite being more profitable, the model became less stable, which is a natural result given the risk-return trade-off.
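A minimal sketch of how such an ensemble could be assembled with soft voting, again continuing from the earlier variables; the particular trio of models is an assumption for illustration, not necessarily the set that performed best here.

```python
# Sketch: soft-voting ensemble of assumed top performers.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_predict
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("xgb", XGBClassifier(eval_metric="logloss")),
        ("cat", CatBoostClassifier(verbose=0)),
    ],
    voting="soft",  # average predicted probabilities so errors can offset
)
ens_pred = cross_val_predict(ensemble, X, y, cv=cv)
print(f"Ensemble avg profit per trade: {df['profit'][ens_pred == 1].mean():.4f}")
```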
## Tuning

I also tried tuning the top models; however, this neither increased the profit nor reduced the standard deviation, so I will not go into the results here.
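For completeness, a minimal sketch of the kind of hyperparameter search one could run here; the parameter grid is illustrative only, not the one actually used.

```python
# Sketch: randomized hyperparameter search over an illustrative grid.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    "n_estimators": [200, 400, 800],
    "max_depth": [None, 4, 8, 16],
    "min_samples_leaf": [1, 2, 5],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=20,
    cv=cv,
    scoring="f1",
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```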