A surrogate-assisted GA enabling high-throughput ML by optimal feature and discretization selection
2020 (English)
In: GECCO 2020 Companion - Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, Association for Computing Machinery, Inc, 2020, p. 1632-1640
Conference paper, Published paper (Refereed)
Abstract [en]
Novel lookup-based classification approaches allow machine learning (ML) to be performed at extremely high classification rates for suitable low-dimensional classification problems. Such approaches depend critically on the optimal selection of features and of discretized feature representations. In this work we propose and study a hybrid genetic algorithm (hGAm) approach to solve this optimization problem. For the considered problem the fitness evaluation function is expensive, as it entails training an ML classifier with the proposed set of features and representations and then evaluating the resulting classifier. We have therefore devised a surrogate problem by casting the feature selection and representation problem as a combinatorial optimization problem in the form of a multiple-choice quadratic knapsack problem (MCQKP). Because the surrogate problem can be evaluated orders of magnitude faster, a comprehensive hGAm performance evaluation becomes feasible. The results show that a suitable trade-off exists at around 5000 fitness evaluations, and they also characterize the parameter behaviors as input to future extensions.
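To make the surrogate idea concrete, the following is a minimal sketch of how a multiple-choice quadratic knapsack objective could be evaluated for one candidate. It is not the formulation from the paper, which the abstract does not spell out. The function names `mcqkp_value` and `best_by_enumeration`, the interpretation of each group as one feature whose choices are discretization levels (with choice 0 meaning the feature is dropped), and all numbers are illustrative assumptions.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Item = Tuple[int, int]  # (group index, chosen option within that group)


def mcqkp_value(
    choice: Sequence[int],
    linear: Sequence[Sequence[float]],
    pair: Dict[Tuple[Item, Item], float],
    weight: Sequence[Sequence[float]],
    capacity: float,
) -> float:
    """Objective value of one candidate, or -inf if it exceeds the capacity.

    Exactly one option is picked per group (the multiple-choice constraint).
    Pair profits are keyed by ((g1, c1), (g2, c2)) with g1 < g2.
    """
    total_weight = sum(weight[g][c] for g, c in enumerate(choice))
    if total_weight > capacity:
        return float("-inf")
    # Linear profit of every selected option.
    value = sum(linear[g][c] for g, c in enumerate(choice))
    # Quadratic profit of every pair of selected options.
    items = list(enumerate(choice))
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            value += pair.get((a, b), 0.0)
    return value


def best_by_enumeration(linear, pair, weight, capacity):
    """Exhaustive search; only viable for toy instances, unlike a GA."""
    options = [range(len(group)) for group in linear]
    return max(
        product(*options),
        key=lambda c: mcqkp_value(c, linear, pair, weight, capacity),
    )


# Hypothetical toy instance: two features, three options each
# (0 = feature dropped, 1 = coarse discretization, 2 = fine discretization).
linear = [[0.0, 3.0, 4.0], [0.0, 2.0, 5.0]]
weight = [[0.0, 2.0, 4.0], [0.0, 1.0, 3.0]]
pair = {((0, 1), (1, 2)): 1.5, ((0, 2), (1, 2)): -2.0}
capacity = 5.0

best = best_by_enumeration(linear, pair, weight, capacity)
```

Evaluating such an objective needs only a few table lookups and additions, which illustrates why a surrogate of this kind can be far cheaper than training and testing a classifier for every candidate.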
Place, publisher, year, edition, pages
Association for Computing Machinery, Inc, 2020. p. 1632-1640
Keywords [en]
Discretization, Feature selection, GA, Surrogate problem, Combinatorial optimization, Economic and social effects, Feature extraction, Genetic algorithms, Machine learning, Classification approach, Classification rates, Combinatorial optimization problems, Feature representation, Hybrid genetic algorithms, Optimization problems, Orders of magnitude, Quadratic knapsack problems, Classification (of information)
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kau:diva-82975
DOI: 10.1145/3377929.3398092
Scopus ID: 2-s2.0-85089739633
ISBN: 9781450371278 (print)
OAI: oai:DiVA.org:kau-82975
DiVA, id: diva2:1529705
Conference
2020 Genetic and Evolutionary Computation Conference, GECCO 2020, 8 July 2020 through 12 July 2020
2021-02-19, 2021-02-19, 2021-04-27. Bibliographically approved