SoK: Chasing Accuracy and Privacy, and Catching Both in Differentially Private Histogram Publication
Chalmers University of Technology.
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). ORCID iD: 0000-0001-9535-6621
2020 (English). In: Transactions on Data Privacy, ISSN 1888-5063, E-ISSN 2013-1631, Vol. 13, no 3, p. 201-245. Article in journal (Refereed). Published.
Abstract [en]

Histograms and synthetic data are of key importance in data analysis. However, researchers have shown that even aggregated data such as histograms, containing no obvious sensitive attributes, can result in privacy leakage. To enable data analysis, a strong notion of privacy is required to avoid risking unintended privacy violations. Differential privacy is such a notion: a statistical definition of privacy that makes privacy leakage quantifiable. The caveat is that while differential privacy offers strong privacy guarantees, they come at a cost in accuracy. Although this trade-off is a central issue in the adoption of differential privacy, the literature lacks a systematic account of the trade-off and of how to address it appropriately. Through a systematic literature review (SLR), we investigate the state of the art in accuracy-improving differentially private algorithms for histogram and synthetic data publishing. Our contribution is twofold: 1) we identify trends and connections in the contributions to the field of differential privacy for histograms and synthetic data, and 2) we provide an understanding of the privacy/accuracy trade-off by crystallizing distinct dimensions of accuracy improvement. Accordingly, we position and visualize the ideas in relation to each other and to external work, and deconstruct each algorithm into its building blocks to pinpoint which dimension of accuracy improvement each technique targets. Hence, this systematization of knowledge (SoK) clarifies in which dimensions, and how, accuracy improvement can be pursued without sacrificing privacy.
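The privacy/accuracy trade-off the abstract describes can be illustrated with the textbook baseline that the surveyed accuracy-improving algorithms refine: releasing a histogram with Laplace noise scaled to the query's sensitivity. The sketch below is illustrative only; the function name and parameters are our own, not from the paper.

```python
import numpy as np

def dp_histogram(data, bins, epsilon, rng=None):
    """Release a histogram under epsilon-differential privacy.

    Adding or removing one record changes exactly one bin count by 1,
    so the L1 sensitivity is 1 and Laplace noise with scale 1/epsilon
    suffices. This is the classic baseline; the algorithms surveyed in
    the paper improve on its accuracy in various dimensions.
    """
    if rng is None:
        rng = np.random.default_rng()
    counts, edges = np.histogram(data, bins=bins)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    return noisy, edges
```

A smaller epsilon means stronger privacy but larger expected per-bin error: each bin's noise has standard deviation sqrt(2)/epsilon, which is exactly the trade-off the survey systematizes.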

Place, publisher, year, edition, pages
Inst. Estudios Documentales Ciencia & Tecnologia (IEDCYT), 2020. Vol. 13, no 3, p. 201-245
Keywords [en]
accuracy improvement, boosting accuracy, data privacy, differential privacy, dimensionality reduction, error reduction, histogram, histograms, noise reduction, sensitivity reduction, synthetic data, SLR, SoK, systematic literature review, systematization of knowledge, taxonomy, utility improvement
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kau:diva-82876
ISI: 000604621000002
Scopus ID: 2-s2.0-85100178720
OAI: oai:DiVA.org:kau-82876
DiVA id: diva2:1529405
Funder
Swedish Research Council; Swedish Foundation for Strategic Research
Available from: 2021-02-18. Created: 2021-02-18. Last updated: 2025-10-17. Bibliographically approved.
In thesis
1. Go the Extra Mile for Accountability: Privacy Protection Measures for Emerging Information Management Systems
2020 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

The thesis takes a systematic approach to designing and developing techniques for preventing personal data exposure in next-generation information management systems, with the aim of ensuring accountability of data controllers (entities that process personal data).

With the rapid growth of communication technologies, heterogeneous computing environments that offer cost-effective data processing alternatives are emerging. The information flow of personal data therefore spans beyond the processing practices of data controllers, involving other parties that process personal data. Moreover, to enable interoperability, data in such environments is given well-defined structure and meaning by means of graph-based data models. Graphs inherently emphasize connections between things, and when graphs are used to model personal data records, the connections and the network structure may reveal intimate details about our interconnected society.

In the European context, the General Data Protection Regulation (GDPR) provides a legal framework for personal data processing. The GDPR stipulates specific consequences for non-compliance with the data protection principles, with a view to ensuring the accountability of data controllers in their personal data processing practices. Widely recognized approaches to implementing the Privacy by Design (PbD) principle in the software development process are broader in scope; hence, processes for implementing personal data protection techniques for specific systems are not their central concern.

To influence the implementation of techniques for preventing the misuse of personal data shared as graphs, a conceptual mechanism for building privacy techniques is developed. The mechanism consists of three elements: a risk analysis for Semantic Web information management systems using the Privacy Impact Assessment (PIA) approach, two privacy protection techniques for graphs enriched with semantics, and a model for evaluating adherence to the goals resulting from the risk analysis. The privacy protection techniques comprise an access control model that embodies the purpose limitation principle, an essential aspect of the GDPR, and adaptations of the differential privacy model for graphs with edge labels. The access control model takes the semantics of the graph elements into account when authorizing access to the graph data. In our differential privacy adaptations, we define, and study through experiments, four different approaches to adapting the differential privacy model to edge-labeled graph datasets.
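The abstract does not detail the thesis's four adaptations, but the general idea of differential privacy on edge-labeled graphs can be sketched under one common assumption: edge-level differential privacy, where neighbouring graphs differ in a single labelled edge, so a per-label edge count has sensitivity 1. The function name, edge representation, and choice of query below are illustrative assumptions, not the thesis's actual constructions.

```python
import math
import random

def dp_label_count(edges, label, epsilon, rng=None):
    """Noisy count of edges carrying `label`, under edge-level DP.

    Neighbouring graphs differ in one labelled edge, so the count
    changes by at most 1: sensitivity 1, Laplace noise of scale
    1/epsilon. Illustrative sketch only; the thesis studies four
    adaptations whose details are not reproduced here.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for (_, _, lbl) in edges if lbl == label)
    # Sample Laplace(0, 1/epsilon) by inverse-CDF transform.
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return true_count + noise
```

Edges are taken as `(source, target, label)` triples; a label-aware query like this is where edge labels change the picture relative to plain edge-DP, since the label itself may be sensitive.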


Place, publisher, year, edition, pages
Karlstad: Karlstads universitet, 2020. p. 30
Series
Karlstad University Studies, ISSN 1403-8099 ; 2020:32
Keywords
accountability, Privacy by Design (PbD), privacy risks, Privacy Impact Assessment (PIA), audits, privacy compliance, access control, differential privacy, graphs, edge-labeled graphs, Semantic Web
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kau:diva-80531
ISBN: 978-91-7867-153-3
ISBN: 978-91-7867-157-1
Public defence
2020-10-30, Frödingsalen, 1B364, Karlstads Universitet, Karlstad, 09:00 (English)
Note

Article 5 part of thesis as manuscript, now published.

Available from: 2020-10-13. Created: 2020-09-28. Last updated: 2025-10-17. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Scopus

Authority records

Reuben, Jenni
