Van Der Elst, Jérôme
[UCL]
Nijssen, Siegfried
[UCL]
Documents are a natural way for humans to share information. They represent data well in a variety of formats, such as text, tables, graphs, titles, and footnotes. They are, however, time-consuming to read through and poorly suited as input for computer-driven analysis. It is therefore natural to ask whether this unstructured data can be extracted into a structured format to drive algorithms that perform quantitative analysis. This is a challenging task: heuristic-based methods are often tricky to implement and do not generalize well, whereas natural language processing techniques lack the tools to capture the layout information needed to interpret these documents correctly. This work studies different techniques for data extraction from business documents, in the particular case of extracting CO2 emissions, and compares their weaknesses in order to propose improvements.


Bibliographic reference
Van Der Elst, Jérôme. Extracting ESG data from business documents. Ecole polytechnique de Louvain, Université catholique de Louvain, 2021. Prom. : Nijssen, Siegfried.
Permanent URL
http://hdl.handle.net/2078.1/thesis:30732