Posts Statistical Similarity of Binaries
Post
Cancel

Statistical Similarity of Binaries

Paper accepted in PLDI 16”.

Slides, Website, Dataset

Collaborator(s): Yaniv David, Eran Yahav

Abstract: We address the problem of finding similar procedures in stripped binaries. We present a new statistical approach for measuring the similarity between two procedures. Our notion of similarity allows us to find similar code even when it has been compiled using different compilers, or has been modified. The main idea is to use similarity by composition: decompose the code into smaller comparable fragments, define semantic similarity between fragments, and use statistical reasoning to lift fragment similarity into similarity between procedures. We have implemented our approach in a tool called Esh, and applied it to find various prominent vulnerabilities across compilers and versions, including Heartbleed, Shellshock and Venom. We show that Esh produces high accuracy results, with few to no false positives – a crucial factor in the scenario of vulnerability search in stripped binaries.

This post is licensed under CC BY 4.0 by the author.