A Machine Learning Framework for High-Frequency Midprice Prediction: Feature Engineering, Hyperparameter Space Exploration, and Profit-Loss Evaluation

2026-1
Tiryaki, Barış
This thesis studies short-horizon mid-price direction prediction in high-frequency trading using limit order book (LOB) data from Borsa Istanbul. A machine learning framework based on XGBoost is proposed to predict whether the mid-price will move up, stay the same, or move down over a short time horizon. A new feature, Effective Q-Price (EQP), is introduced to better represent market liquidity. EQP combines price and quantity information across multiple depth levels and can be seen as a liquidity-aware extension of the standard mid-price. In addition, a regression-based classification approach is proposed, in which a single continuous output is mapped to three trading signals. The models are trained on feature-rich representations extracted from the LOB, and the effects of key hyperparameters are analyzed in a systematic evaluation setup. Class imbalance in the labels is addressed with class weighting and decision-threshold tuning. Both the predictive performance and the complexity of the models are evaluated. A further contribution of the thesis is a profit and loss (P&L) backtest that converts predictions into trading positions while accounting for transaction costs. The results show that EQP is consistently among the most informative features and allows feature reduction without a clear loss in predictive performance or P&L. The regression-based approach achieves performance similar to direct classification while using smaller and simpler models.
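As a rough illustration of two ideas in the abstract, the sketch below maps a single continuous score to three trading signals with a pair of thresholds, then runs a simple P&L backtest that charges a cost whenever the position changes. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `to_signal` and `backtest_pnl`, the threshold values, the cost per unit, and the sample data are all hypothetical, and the thesis may define its mapping and cost model differently.

```python
def to_signal(score, lower=-0.5, upper=0.5):
    """Map a continuous score to -1 (down), 0 (no change), or +1 (up).

    The thresholds are hypothetical; in practice they would be tuned.
    """
    if score > upper:
        return 1
    if score < lower:
        return -1
    return 0


def backtest_pnl(signals, mid_prices, cost_per_unit=0.125):
    """Total P&L of holding signals[t] over the move from t to t+1.

    Assumes a flat starting position and a hypothetical fixed cost
    for every unit of position change. The final signal is unused
    because there is no later price to evaluate it against.
    """
    pnl = 0.0
    prev_position = 0
    for t in range(len(mid_prices) - 1):
        position = signals[t]
        # Pay transaction costs on the change in position.
        pnl -= cost_per_unit * abs(position - prev_position)
        # Earn (or lose) the mid-price move while holding the position.
        pnl += position * (mid_prices[t + 1] - mid_prices[t])
        prev_position = position
    return pnl


# Made-up scores and mid-prices, for illustration only.
scores = [0.9, 0.1, -0.8, -0.7, 0.6]
mids = [10.0, 10.5, 10.5, 10.0, 10.0]
signals = [to_signal(s) for s in scores]
total = backtest_pnl(signals, mids)
```

With these made-up inputs the three position changes each incur the assumed cost, so the net P&L is lower than the gross price moves captured. Widening the threshold band produces fewer trades and lower costs, which is one reason threshold tuning and cost-aware evaluation interact.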
Citation Formats
B. Tiryaki, “A Machine Learning Framework for High-Frequency Midprice Prediction: Feature Engineering, Hyperparameter Space Exploration, and Profit-Loss Evaluation,” M.S. - Master of Science, Middle East Technical University, 2026.