Biondi, Fabrizio
Enescu, Michael
Given-Wilson, Thomas
Legay, Axel [UCL]
Noureddine, Lamine
Verma, Vivek
Packing is a widespread technique for preventing static malware detection and analysis. Detecting and classifying the packer used by a given malware sample is fundamental to unpacking and studying the malware, whether manually or automatically. Existing literature on packing detection and classification has focused on effectiveness but does not consider the efficiency required for a practical malware-analysis workflow. This paper studies how to train machine-learning-based packing detection and classification algorithms to be both highly effective and efficient. First, we create ground truths by labeling more than 280,000 samples with three different techniques. Second, we perform feature selection that considers both the contribution and the computation cost of each feature. Third, we iterate over more than 1,500 combinations of features, scenarios, and algorithms to determine which algorithms are the most effective and efficient, finding that a reduction of 1-2% in effectiveness can increase efficiency by 17-44 times. Fourth, we test how the best algorithms perform against malware collected after the training data to assess them against new packing techniques and versions, finding that the choice of ground truth has a large impact on algorithm robustness. Finally, we perform an economic analysis based on the uptime/training time ratio and find simple algorithms with small feature sets to be more economical than complex algorithms with large feature sets.
Bibliographic reference
Biondi, Fabrizio ; Enescu, Michael ; Given-Wilson, Thomas ; Legay, Axel ; Noureddine, Lamine ; et al. Effective, Efficient, and Robust Packing Detection and Classification. In: Computers and Security,
Permanent URL
https://hdl.handle.net/2078.1/210678