Minimizing Information Loss in Shared Data: Hiding Frequent Patterns With Multiple Sensitive Support Thresholds

Bostanoğlu, Belgin Ergenç; Öztürk, Ahmet Cumhur

Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/8831

Title:	Minimizing Information Loss in Shared Data: Hiding Frequent Patterns With Multiple Sensitive Support Thresholds
Authors:	Bostanoğlu, Belgin Ergenç; Öztürk, Ahmet Cumhur
Keywords:	Information loss; Itemset mining; Privacy preserving itemset mining
Publisher:	Wiley
Abstract:	Privacy preserving data mining (PPDM) is the process of protecting sensitive knowledge from being discovered by data mining techniques when data is shared. Privacy preserving frequent itemset mining (PPFIM) is a subtask of PPDM and an NP-hard problem. Its objective is to modify a given database in such a way that none of the database owner's sensitive itemsets can be obtained from the modified database by any frequent itemset mining technique. The main challenge of PPFIM is to minimize the distortion of the data and of nonsensitive knowledge while sanitizing all given sensitive itemsets. Distortion-based sensitive itemset hiding algorithms reduce the support of each sensitive itemset below a predefined sensitive threshold through sanitization. Most distortion-based itemset hiding algorithms allow the database owner to define only a single sensitive threshold that applies to every sensitive itemset. This is a limitation for the database owner, since the importance of each sensitive itemset varies. In this paper we propose a distortion-based itemset hiding algorithm that allows the database owner to assign multiple sensitive thresholds, namely the itemset oriented pseudo graph based sanitization (IPGBS) algorithm. The purpose of the IPGBS algorithm is to cause minimum distortion to the nonsensitive knowledge and data while hiding all sensitive itemsets. To this end, the IPGBS algorithm modifies the smallest possible number of transactions and the least transaction content. The performance of the IPGBS algorithm is evaluated against two counterpart algorithms on four databases. The results show that the IPGBS algorithm is more efficient in terms of nonsensitive frequent itemset loss on both dense and sparse databases. It also performs well in terms of the number of transactions modified, the number of items deleted, execution time, and total memory allocation.
URI:	https://doi.org/10.1002/sam.11458; https://hdl.handle.net/11147/8831
ISSN:	1932-1864; 1932-1872
Appears in Collections:	Computer Engineering / Bilgisayar Mühendisliği; Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection; WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
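
The idea of distortion-based hiding with a separate threshold per sensitive itemset, as described in the abstract, can be illustrated with a minimal sketch. This is not the IPGBS algorithm, whose details are not given on this page; it is a naive greedy baseline written only for illustration. The function names (`support`, `naive_hide`), the use of absolute support counts as thresholds, the preference for short transactions, and the choice of victim item are all assumptions.

```python
from typing import Dict, FrozenSet, List, Set


def support(db: List[Set[str]], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def naive_hide(
    db: List[Set[str]], thresholds: Dict[FrozenSet[str], int]
) -> List[Set[str]]:
    """Return a sanitized copy of `db` in which every sensitive itemset
    has support strictly below its own threshold.

    Hypothetical baseline, NOT IPGBS: each itemset is handled on its own
    by deleting one item from just enough supporting transactions.
    """
    sanitized = [set(t) for t in db]  # leave the original database intact
    for itemset, threshold in thresholds.items():
        # Number of supporting transactions that must stop supporting it.
        excess = support(sanitized, itemset) - threshold + 1
        if excess <= 0:
            continue  # already hidden
        supporters = [t for t in sanitized if itemset <= t]
        # Assumed heuristic: short transactions tend to support fewer
        # other itemsets, so modifying them may cost less.
        supporters.sort(key=len)
        victim = min(itemset)  # arbitrary but deterministic victim item
        for t in supporters[:excess]:
            t.remove(victim)
    return sanitized


# Toy database and per-itemset thresholds (illustrative values only).
example_db = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"b", "c"},
    {"a", "c"},
]
example_thresholds = {
    frozenset({"a", "b"}): 2,
    frozenset({"b", "c"}): 2,
}
example_result = naive_hide(example_db, example_thresholds)
```

Because items are only ever deleted, hiding one itemset can never raise the support of another, so a single pass over the sensitive itemsets is enough in this sketch. What it does not attempt is the goal stated in the abstract: keeping the loss of nonsensitive frequent itemsets low.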

Files in This Item:

File	Size	Format
Statistical Analysis.pdf	4.51 MB	Adobe PDF

Scopus citations: 1 (checked on May 16, 2025)
Web of Science citations: 1 (checked on May 23, 2025)
Page views: 328 (checked on Jun 10, 2025)
Downloads: 210 (checked on Jun 10, 2025)
