Deknop, Céline
[UCL]
This thesis concerns advanced code differencing techniques in the context of automated large-scale refactoring of legacy software systems. Differencing compares two artefacts (usually code, but not only), and is commonly based on the longest common subsequence (LCS) algorithm developed by James Hunt and Douglas McIlroy in 1976. This algorithm treats both artefacts as blobs of text and outputs a shortest sequence of modifications (adds and deletes) needed to go from one version to the next. Such flat textual comparison does not reflect developer goals, and the excess of low-level information it displays obscures the intent of the changes, which can make reviewing them frustrating and tedious. We explored ways to improve the experience of differencing users by creating new differencing techniques that build on modern approaches and on knowledge of the nature of the artefacts being analysed.
This thesis was done in collaboration with a company, Raincode Labs. They are compiler experts and offer various services in the realm of legacy software and automated refactoring. One of those services, which is the use case of this thesis, is the automated refactoring of generated COBOL code. While this process is well known to experts within the company, they have found that, because the process is fairly intricate and behaves as a black box, clients have a hard time trusting the newly refactored code. This causes communication issues, and experts have reported that they wished for a tool allowing them to better transmit their expertise to their clients. Combining the new differencing techniques with Raincode Labs' need to better explain their process, several tools and techniques were created during this thesis, including a log-based behavioural differencing algorithm, a COBOL semi-parser, and a trace equivalence comparator.
These tools are accompanied by visualisation modules, facilitating better communication with clients and reinforcing trust in automated refactoring processes. The contributions of this work have the potential to benefit both Raincode Labs and the wider software engineering community, providing solutions to challenges in legacy software maintenance and refactoring.
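To make the classic approach described above concrete, here is a minimal sketch of line-based LCS differencing. It assumes the textbook O(nm) dynamic-programming formulation of LCS rather than Hunt and McIlroy's more efficient candidate-based algorithm, and the function names `lcs_table` and `line_diff` are hypothetical, not taken from the thesis or from Raincode Labs' tools. It treats each artefact as plain lines of text and emits a minimal script of adds and deletes.

```python
from typing import List, Tuple


def lcs_table(a: List[str], b: List[str]) -> List[List[int]]:
    """Hypothetical helper: table[i][j] is the LCS length of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Fill from the end so each cell only depends on already-computed cells.
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(old: str, new: str) -> List[Tuple[str, str]]:
    """Hypothetical sketch: return (' ', kept), ('-', deleted), ('+', added) pairs."""
    a, b = old.splitlines(), new.splitlines()
    table = lcs_table(a, b)
    script: List[Tuple[str, str]] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Matching lines belong to the common subsequence and are kept.
            script.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] preserves the longest remaining common subsequence.
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    # Whatever is left on either side is purely deleted or purely added.
    script.extend(("-", line) for line in a[i:])
    script.extend(("+", line) for line in b[j:])
    return script


if __name__ == "__main__":
    old = "MOVE A TO B\nADD 1 TO C\nDISPLAY C"
    new = "MOVE A TO B\nADD 2 TO C\nDISPLAY C"
    for op, line in line_diff(old, new):
        print(f"{op} {line}")
```

Even in this tiny example, changing a single literal surfaces as a whole-line delete followed by a whole-line add, with nothing indicating why the line changed. This is the kind of low-level, intent-free output the abstract argues against.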
Bibliographic reference
Deknop, Céline. Understanding large codebase refactoring through differencing. Prom.: Mens, Kim; Zaytsev, Vadim
Permanent URL
http://hdl.handle.net/2078.1/284187