Korhonen, Topi
Publications (6 of 6)
Korhonen, T. & Garcia, J. (2021). Exploring Ranked Local Selectors for Stable Explanations of ML Models. In: 2021 2nd International Conference on Intelligent Data Science Technologies and Applications, IDSTA 2021. Paper presented at 2nd International Conference on Intelligent Data Science Technologies and Applications, IDSTA 2021, 15 November 2021 through 16 November 2021 (pp. 122-129). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: 2021 2nd International Conference on Intelligent Data Science Technologies and Applications, IDSTA 2021, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 122-129. Conference paper, Published paper (Refereed)
Abstract [en]

While complex machine learning methods can achieve great performance, human-interpretable details of their internal reasoning are to a large extent unavailable. Interpretable machine learning can remedy the lack of access to model reasoning but remains an elusive feat to fully achieve. Here we propose ranked selectors as a method for post-hoc explainability of classification outcomes from arbitrary classification models, with an initial emphasis on tabular data of moderate dimensions. The method is based on constructing a set of selectors, or rules, delimiting a local class-consistent domain with maximal cover around the item of interest. The extended adjacent feature space is probed to achieve a ranking of the selectors. The method supports the use of an explicit foil class Q, allowing the formulation of contrastive queries in the form 'Why inference P instead of alternative inference Q?'. The answer is given as a short list of disjoint rules, a format previously shown to be amenable to human interpretation. We demonstrate the proposed method on open datasets, and elaborate on its stability aspects relative to other comparable methods.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
black box models, contrastivity, Explainability, Adjacent feature, Black box modelling, Classification models, Machine learning methods, Maximal covers, Model reasonings, Performance, Tabular data, Machine learning
National Category
Information Studies Computer Sciences
Identifiers
urn:nbn:se:kau:diva-89479 (URN); 10.1109/IDSTA53674.2021.9660809 (DOI); 000852877600018 (); 2-s2.0-85124559014 (Scopus ID); 9781665421805 (ISBN)
Conference
2nd International Conference on Intelligent Data Science Technologies and Applications, IDSTA 2021, 15 November 2021 through 16 November 2021
Available from: 2022-04-12 Created: 2022-04-12 Last updated: 2022-09-22. Bibliographically approved
Korhonen, T., Lagerlöf, J. H. & Muntean, A. (2020). Computational study of the effect of hypoxia on cancer response to radiation treatment. ROMAI Journal, 16(2), 75-86
2020 (English). In: ROMAI Journal, ISSN 1841-5512, E-ISSN 2065-7714, Vol. 16, no 2, p. 75-86. Article in journal (Refereed) Published
Abstract [en]

We perform a computational study of the propagation of the oxygen concentration within a two-dimensional slice of a heterogeneous tumour region where the position and shape of the blood vessels are known. Exploring the parameter space, we identify which effects are noticeable with regard to the formation of hypoxic zones. We use this information to anticipate a patient-specific radiation treatment with a potentially controlled response of the cancer growth.

Place, publisher, year, edition, pages
The Romanian Society of Applied and Industrial Mathematics, 2020
Keywords
Mathematical modeling of hypoxia, finite element method, cancer response to radiation
National Category
Clinical Medicine
Research subject
Mathematics
Identifiers
urn:nbn:se:kau:diva-85511 (URN)
Available from: 2021-07-22 Created: 2021-07-22 Last updated: 2021-11-18. Bibliographically approved
Garcia, J. & Korhonen, T. (2020). DIOPT: Extremely Fast Classification Using Lookups and Optimal Feature Discretization. In: 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). Paper presented at International Joint Conference on Neural Networks (IJCNN) held as part of the IEEE World Congress on Computational Intelligence (IEEE WCCI), JUL 19-24, 2020, ELECTR NETWORK. IEEE
2020 (English). In: 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), IEEE, 2020. Conference paper, Published paper (Refereed)
Abstract [en]

For low-dimensional classification problems we propose the novel DIOPT approach, which considers the construction of a discretized feature space. Predictions for all cells in this space are obtained by means of a reference classifier, and the class labels are stored in a lookup table generated by enumerating the complete space. This leads to extremely high classification throughput, as inference consists only of discretizing the relevant features and reading the class label from the lookup table index corresponding to the concatenation of the discretized feature bin indices. Since the size of the lookup table is limited due to memory constraints, the selection of optimal features and their respective discretization levels is paramount. We propose a particular supervised discretization approach striving to achieve maximal class separation of the discretized features, and further employ a purpose-built memetic algorithm to search towards the optimal selection of features and discretization levels. The inference run time and classification accuracy of DIOPT are compared to benchmark random forest and decision tree classifiers on several publicly available data sets. Orders-of-magnitude improvements are recorded in classification runtime, with insignificant or modest degradation in classification accuracy for many of the evaluated binary classification tasks.
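The lookup idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bin edges, the toy reference classifier, the use of bin midpoints as cell representatives, and the mixed-radix flat index (standing in for the "concatenation of bin indices") are all assumptions made for illustration, and the supervised discretization and memetic search are not shown.

```python
from bisect import bisect_right
from itertools import product

def bin_of(value, edges):
    # edges = [lower, cut_1, ..., cut_k, upper]; the k interior cuts give k + 1 bins.
    # Values outside [lower, upper] fall into the first or last bin.
    return bisect_right(edges[1:-1], value)

def flat_index(cell, sizes):
    # Combine per-feature bin indices into one table index (mixed radix,
    # an assumed stand-in for concatenating the bin indices).
    idx = 0
    for i, s in zip(cell, sizes):
        idx = idx * s + i
    return idx

def build_table(edges_per_feature, reference_classifier):
    # Enumerate every cell and store the reference classifier's label for
    # the cell midpoint (hypothetical choice of representative point).
    sizes = [len(e) - 1 for e in edges_per_feature]
    table = []
    for cell in product(*(range(s) for s in sizes)):
        midpoint = [(e[i] + e[i + 1]) / 2 for e, i in zip(edges_per_feature, cell)]
        table.append(reference_classifier(midpoint))
    return table, sizes

def classify(x, edges_per_feature, table, sizes):
    # Inference: discretize each feature, then read one table entry.
    cell = [bin_of(v, e) for v, e in zip(x, edges_per_feature)]
    return table[flat_index(cell, sizes)]

def toy_reference(x):
    # Hypothetical reference classifier: class 1 above the line x0 + x1 = 1.
    return 1 if x[0] + x[1] > 1.0 else 0

EDGES = [[0.0, 0.5, 1.0], [0.0, 0.25, 0.75, 1.0]]  # 2 bins x 3 bins = 6 cells
TABLE, SIZES = build_table(EDGES, toy_reference)

print(TABLE)                                      # labels for all 6 cells
print(classify([0.9, 0.9], EDGES, TABLE, SIZES))  # lookup for one sample
```

With such coarse bins, a point like `[0.4, 0.7]` receives the label of its cell rather than the label the reference classifier would assign to the point itself, which is the kind of accuracy degradation the abstract trades against runtime.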

Place, publisher, year, edition, pages
IEEE, 2020
Series
IEEE International Joint Conference on Neural Networks (IJCNN), ISSN 2161-4393
National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-83709 (URN); 10.1109/IJCNN48605.2020.9207037 (DOI); 000626021403072 (); 2-s2.0-85089746323 (Scopus ID); 978-1-7281-6926-2 (ISBN)
Conference
International Joint Conference on Neural Networks (IJCNN) held as part of the IEEE World Congress on Computational Intelligence (IEEE WCCI), JUL 19-24, 2020, ELECTR NETWORK
Available from: 2021-04-19 Created: 2021-04-19 Last updated: 2021-09-30. Bibliographically approved
Garcia, J. & Korhonen, T. (2018). Efficient Distribution-Derived Features for High-Speed Encrypted Flow Classification. In: NetAI'18 Proceedings of the 2018 Workshop on Network Meets AI & ML. Paper presented at 2018 Workshop on Network Meets AI & ML. August 24 - 24, 2018. Budapest, Hungary. (pp. 21-27). New York: Association for Computing Machinery (ACM)
2018 (English). In: NetAI'18 Proceedings of the 2018 Workshop on Network Meets AI & ML, New York: Association for Computing Machinery (ACM), 2018, p. 21-27. Conference paper, Published paper (Refereed)
Abstract [en]

Flow classification is an important tool to enable efficient network resource usage, support traffic engineering, and aid QoS mechanisms. As traffic is increasingly becoming encrypted by default, flow classification is turning towards the use of machine learning methods employing features that are also available for encrypted traffic. In this work we evaluate flow features that capture the distributional properties of in-flow per-packet metrics such as packet size and inter-arrival time. The characteristics of such distributions are often captured with general statistical measures such as standard deviation, variance, etc. We instead propose a Kolmogorov-Smirnov discretization (KSD) algorithm to perform histogram bin construction based on the distributional properties observed in the data. This allows for a richer, histogram-based representation which also requires fewer resources for feature computation than higher-order statistical moments. A comprehensive evaluation using synthetic data from Gaussian and Beta mixtures shows that the KSD approach provides Jensen-Shannon distance results surpassing those of uniform binning and probabilistic binning. An empirical evaluation using live traffic traces from a cellular network further shows that, when coupled with a random forest classifier, the KSD-constructed features improve classification performance compared to general statistical features based on higher-order moments, or alternative bin placement approaches.
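As a point of reference for the evaluation metric named in this abstract, the following minimal Python sketch computes the Jensen-Shannon distance between two histograms built with uniform binning. It does not implement the KSD algorithm, and how the paper applies the metric is not stated in the abstract; the helper names and the sample values are hypothetical.

```python
import math
from bisect import bisect_right

def uniform_edges(lower, upper, n_bins):
    # Equal-width bin edges: the uniform binning baseline named in the abstract.
    return [lower + (upper - lower) * i / n_bins for i in range(n_bins + 1)]

def histogram(samples, edges):
    # Normalized histogram; out-of-range samples are clamped to the outer bins.
    counts = [0] * (len(edges) - 1)
    for x in samples:
        i = min(max(bisect_right(edges, x) - 1, 0), len(counts) - 1)
        counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]

def js_distance(p, q):
    # Jensen-Shannon distance: square root of the JS divergence (base-2 logs),
    # so the result lies between 0 (identical) and 1 (disjoint support).
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return math.sqrt((kl(p, m) + kl(q, m)) / 2)

# Hypothetical per-packet metric samples from two flows.
flow_a = [0.1, 0.2, 0.3]
flow_b = [0.7, 0.8, 0.9]
edges = uniform_edges(0.0, 1.0, 2)
print(js_distance(histogram(flow_a, edges), histogram(flow_b, edges)))
```

A data-driven bin placement would replace `uniform_edges` while leaving the histogram and distance computation unchanged.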

Place, publisher, year, edition, pages
New York: Association for Computing Machinery (ACM), 2018
Keywords
Traffic classification, Discretization, Machine learning
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-68707 (URN); 10.1145/3229543.3229548 (DOI); 978-1-4503-5911-5 (ISBN)
Conference
2018 Workshop on Network Meets AI & ML. August 24 - 24, 2018. Budapest, Hungary.
Projects
HITS
Available from: 2018-08-14 Created: 2018-08-14 Last updated: 2019-11-08. Bibliographically approved
Garcia, J. & Korhonen, T. (2018). On Runtime and Classification Performance of the Discretize-Optimize (DISCO) Classification Approach. Performance Evaluation Review, 46(3), 167-170
2018 (English). In: Performance Evaluation Review, ISSN 0163-5999, E-ISSN 1557-9484, Vol. 46, no 3, p. 167-170. Article in journal (Refereed) Published
Abstract [en]

Using machine learning in high-speed networks for tasks such as flow classification typically requires either very resource-efficient classification approaches, large amounts of computational resources, or specialized hardware. Here we provide a sketch of the discretize-optimize (DISCO) approach, which can construct an extremely efficient classifier for low-dimensional problems by combining feature selection, efficient discretization, novel bin placement, and lookup. As feature selection and discretization parameters are crucial, appropriate combinatorial optimization is an important aspect of the approach. A performance evaluation is carried out for a YouTube classification task using a cellular traffic data set. The initial evaluation results show that the DISCO approach can move the Pareto boundary in the classification performance versus runtime trade-off by up to an order of magnitude compared to runtime-optimized random forest and decision tree classifiers.

Place, publisher, year, edition, pages
New York, USA: Association for Computing Machinery (ACM), 2018
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-71213 (URN); 10.1145/3308897.3308965 (DOI)
Projects
HITS, 4707
Funder
Knowledge Foundation
Available from: 2019-02-20 Created: 2019-02-20 Last updated: 2019-11-08. Bibliographically approved
Garcia, J., Korhonen, T., Andersson, R. & Västlund, F. (2018). Towards Video Flow Classification at a Million Encrypted Flows Per Second. In: Leonard Barolli, Makoto Takizawa, Tomoya Enokido, Marek R. Ogiela, Lidia Ogiela & Nadeem Javaid (Ed.), Proceedings of 32nd International Conference on Advanced Information Networking and Applications (AINA). Paper presented at 32nd International Conference on Advanced Information Networking and Applications (AINA). Krakow, Poland, 16-18 May 2018. Krakow: IEEE
2018 (English). In: Proceedings of 32nd International Conference on Advanced Information Networking and Applications (AINA) / [ed] Leonard Barolli, Makoto Takizawa, Tomoya Enokido, Marek R. Ogiela, Lidia Ogiela & Nadeem Javaid, Krakow: IEEE, 2018. Conference paper, Published paper (Refereed)
Abstract [en]

As end-to-end encryption on the Internet is becoming more prevalent, techniques such as deep packet inspection (DPI) can no longer be expected to be able to classify traffic. In many cellular networks a large fraction of all traffic is video traffic, and being able to divide flows in the network into video and non-video can provide considerable traffic engineering benefits. In this study we examine machine learning based flow classification using features that are available also for encrypted flows. Using a data set of several billion packets from a live cellular network, we examine the obtainable classification performance for two different ensemble-based classifiers. Further, we contrast the classification performance of a statistical-based feature set with a less computationally demanding alternate feature set. To also examine the runtime aspects of the problem, we export the trained models and use a tailor-made C implementation to evaluate the runtime performance. The results quantify the trade-off between classification and runtime performance, and show that up to 1 million classifications per second can be achieved for a single core. Considering that only the subset of flows reaching some minimum flow length will need to be classified, the results are promising with regard to deployment also in scenarios with very high flow arrival rates.

Place, publisher, year, edition, pages
Krakow: IEEE, 2018
Series
Advanced Information Networking and Applications, ISSN 1550-445X, E-ISSN 2332-5658
Keywords
Cryptography, Runtime, Cellular networks, Machine learning, Forestry, Data models, Support vector machines
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-68705 (URN); 10.1109/AINA.2018.00061 (DOI); 000454817500048 (); 978-1-5386-2196-7 (ISBN); 978-1-5386-2195-0 (ISBN)
Conference
32nd International Conference on Advanced Information Networking and Applications (AINA). Krakow, Poland, 16-18 May 2018.
Projects
HITS
Available from: 2018-08-14 Created: 2018-08-14 Last updated: 2019-02-14. Bibliographically approved