Taherkhani A. (2011)
Using Decision Tree Classifiers in Source Code Analysis to Recognize Algorithms: An Experiment with Sorting Algorithms
The Computer Journal, 54(11): 1845-1860, doi: 10.1093/comjnl/bxr025.
Abstract: We discuss algorithm recognition (AR) and present a method for recognizing algorithms automatically from Java source code. The method consists of two phases. In the first phase, the algorithms to be recognized are converted into vectors of characteristics, which are computed by static analysis of the program code, including various statistics of language constructs and an analysis of the roles of variables in the target program. In the second phase, the algorithms are classified based on these vectors using the C4.5 decision tree classifier. We demonstrate the performance of the method by applying it to sorting algorithms. Using the leave-one-out cross-validation technique, we conducted an experimental evaluation of the classification performance, showing that the average classification accuracy is 98.1% (the data set consisted of five different types of sorting algorithms). The results show the applicability and usefulness of roles of variables in AR and illustrate that the C4.5 algorithm is a suitable decision tree classifier for our purpose. The limitations of the method are also discussed.
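Illustration: the evaluation protocol named in the abstract, leave-one-out cross-validation over characteristic vectors, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the paper's implementation. The feature layout, the sample values, and the function names are all hypothetical, and a 1-nearest-neighbour rule stands in for the C4.5 decision tree so that the example stays short and needs only the Python standard library.

```python
# Hypothetical characteristic vectors:
# (loop count, recursive? 0/1, number of "stepper"-role variables).
# The values are invented for illustration; they are not data from the paper.
SAMPLES = [
    ((2, 0, 2), "bubble"),
    ((2, 0, 2), "bubble"),
    ((2, 0, 3), "insertion"),
    ((2, 0, 3), "insertion"),
    ((1, 1, 1), "quick"),
    ((1, 1, 1), "quick"),
]


def nearest_neighbor(train, x):
    """Stand-in classifier (1-NN, squared Euclidean distance).

    The paper classifies with C4.5; this substitute is an assumption
    made only to keep the sketch self-contained.
    """
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Return the label of the closest training sample.
    return min(train, key=lambda sample: dist(sample[0], x))[1]


def leave_one_out_accuracy(samples, classify):
    """Hold out each sample once, train on the rest, and average the hits."""
    correct = 0
    for i, (x, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the i-th
        if classify(train, x) == label:
            correct += 1
    return correct / len(samples)


accuracy = leave_one_out_accuracy(SAMPLES, nearest_neighbor)
print(f"leave-one-out accuracy: {accuracy:.3f}")
```

Because the classifier is passed in as a function, a decision tree learner could be substituted for `nearest_neighbor` without changing the evaluation loop.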
Last updated: December 20, 2011