Alexandre Tytgat
[Queen Mary University of London]
Corentin Vande Kerckhove
[UCL]
Mihai Cucuringu
[University of Oxford]
Andrew Peek
[Delphia]
Alexander Shestopaloff
[Queen Mary University of London]
In the ever-evolving landscape of financial forecasting, the emergence of machine learning has revolutionized the way we approach predicting company financials and stock prices. The challenge arises from the quarterly frequency of financial data, necessitating models capable of accommodating all companies within a unified framework. This reflects a paradigm shift in forecasting from individual time series modeling to global modeling. This shift underscores the need for robust methodologies that can harness the power of machine learning across diverse datasets to achieve accurate predictions while navigating the intricacies of temporal and cross-sectional dynamics. Alternative data sources, ranging from social media sentiment to credit card transactions, offer a wealth of untapped insights for financial forecasting. Yet, their integration poses unique challenges, including the absence of historical data and limited coverage across companies. Additionally, the acquisition costs associated with these datasets can be prohibitively high, further complicating model development and backtesting processes. Despite their inherent value, high rates of missing data in alternative sources often render them unsuitable for traditional global models. This presents a dual challenge: how to effectively leverage these valuable features with sparse data and how to strategically prioritize data acquisition efforts to optimize model performance. In our study, we delineate different scenarios mirroring varying degrees of missing values, each with customizable rates of absence. We advance by contrasting two approaches designed to glean insights from features with limited coverage: (i) model guidance, inspired by multi-task learning, and (ii) crafting new dense features through feature engineering, achieved by amalgamating observations from sparse features.
Furthermore, we introduce a framework for actively selecting features that balances exploring new features against exploiting existing ones within the acquisition process. The methodology also extends to network time series, broadening its utility across diverse data domains and temporal structures, and we demonstrate an extension of the framework that accommodates non-uniform feature acquisition costs.
Bibliographic reference
Alexandre Tytgat ; Corentin Vande Kerckhove ; Mihai Cucuringu ; Andrew Peek ; Alexander Shestopaloff. Extracting the low-coverage juice: exploiting information of alternative data with high missing ratio for financial forecasting. Workshop on Complex Networks in Banking and Finance (Toronto).
Permanent URL
http://hdl.handle.net/2078.1/288072