مقاله


کد مقاله : 139504261512112857(DOI : 10.7508/jist.2016.02.001)

عنوان مقاله : Privacy Preserving Big Data Mining: Association Rule Hiding

نشریه شماره : 14 Spring 2016

مشاهده شده : 683

فایل های مقاله :


نویسندگان

  نام و نام خانوادگی پست الکترونیک مرتبه علمی مدرک تحصیلی مسئول
1 Golnar Assadat Afzali g.afzali@gmail.com - M.Sc
2 Shahriar Mohammadi mohammadi@kntu.ac.ir Assistant Professor PhD

چکیده مقاله

Data repositories contain sensitive information which must be protected from unauthorized access. Existing data mining techniques can be considered as a privacy threat to sensitive data. Association rule mining is one of the utmost data mining techniques which tries to cover relationships between seemingly unrelated data in a data base.. Association rule hiding is a research area in privacy preserving data mining (PPDM) which addresses a solution for hiding sensitive rules within the data problem. Many researches have be done in this area, but most of them focus on reducing undesired side effect of deleting sensitive association rules in static databases. However, in the age of big data, we confront with dynamic data bases with new data entrance at any time. So, most of existing techniques would not be practical and must be updated in order to be appropriate for these huge volume data bases. In this paper, data anonymization technique is used for association rule hiding, while parallelization and scalability features are also embedded in the proposed model, in order to speed up big data mining process. In this way, instead of removing some instances of an existing important association rule, generalization is used to anonymize items in appropriate level. So, if necessary, we can update important association rules based on the new data entrances. We have conducted some experiments using three datasets in order to evaluate performance of the proposed model in comparison with Max-Min2 and HSCRIL. Experimental results show that the information loss of the proposed model is less than existing researches in this area and this model can be executed in a parallel manner for less execution time