Monday, June 3, 2019

Privacy-handling Techniques and Algorithms for Data Mining

VIVEK UNIYAL

ABSTRACT

Data mining can extract previously unknown patterns from vast accumulations of information. Networking, hardware, and software technology are growing rapidly, and with them the amount of information being collected. Organizations hold huge amounts of data drawn from several heterogeneous databases, including private and sensitive information about individuals. Data mining extracts novel patterns from such data, which can be used for decision making in various domains. But the output of data mining can also reveal sensitive, private, or personal information about a particular person. Such information can be misused, and that misuse can harm the data owner. Privacy is therefore becoming an important issue in many data mining applications, particularly in distributed environments. Privacy-preserving data mining (PPDM) techniques provide a new direction for solving these issues. With PPDM, valid data mining results can be obtained without learning the underlying data values.

In this dissertation we introduce two algorithms that address privacy handling. One is k-anonymization, in which the information corresponding to any individual in the released data cannot be distinguished from that of at least k-1 other individuals whose information also appears in the released data. To achieve k-anonymization, some values in the database must be suppressed or generalized. K-anonymity is concerned with the record linkage attack mode, and l-diversity with the attribute linkage attack mode.

KEYWORDS: Data Mining, Advantages and Disadvantages of Data Mining, Privacy Handling, K-anonymization Algorithm, L-diversity.

ACKNOWLEDGEMENTS

I wish to take this opportunity to express my deep gratitude to all the people who have extended their cooperation in various ways during my dissertation.
It is my pleasure to acknowledge the help of all those individuals.

First of all, I would like to express my deepest gratitude to my dissertation supervisor, Mr. Govind Kamboj, without whom none of this would have been possible. He always provided me with the essential direction and advice during the work. I am grateful to him for giving shape to the results of my dissertation. Without his supervision and support, this work would not have been completed successfully in time.

I am grateful to the President, Vice President, Chancellor, Vice Chancellor and Head of the Department of the Graphic Era University for providing an excellent environment for work with ample facilities and academic freedom. I would also like to thank the teaching and non-teaching staff for their valuable support during M.Tech.

Last but not least, I am grateful to all my teachers and friends for their cooperation and encouragement throughout this task.

(Vivek Uniyal)
M.Tech (Computer Science Engineering)

TABLE OF CONTENTS

CANDIDATE'S DECLARATION
ABSTRACT
ACKNOWLEDGEMENT
LIST OF ABBREVIATIONS
LIST OF FIGURES
1. INTRODUCTION
  1.1 Problem Statement
  1.2 Overview
  1.3 Advantages of data mining
  1.4 Disadvantages of data mining
  1.5 Why privacy-handling is required in data mining
  1.6 Motivation
  1.7 Organization
2. SCOPE AND LITERATURE SURVEY
3. METHODS AND METHODOLOGIES
  3.1 Randomization method
  3.2 Group based anonymization methods
    3.2.1 K-Anonymity framework
    3.2.2 Personalized privacy-preservation
    3.2.3 Utility based privacy-preservation
    3.2.4 Sequential releases
    3.2.5 The l-diversity method
  3.3 Distributed privacy-preserving data mining
  3.4 Detailed description about K-anonymity and l-diversity
    3.4.1 Data collection and Data publishing
    3.4.2 Privacy Data publishing
    3.4.3 Algorithm of k-anonymity
    3.4.4 l-diversity
      3.4.1.1 Lack of diversity
      3.4.1.2 Strong background knowledge
4. EXPERIMENTAL RESULT
  4.1 Introduction
  4.2 Experimental result
    4.2.1 Result of proposed k-anonymity and l-diversity
5. CONCLUSION AND SCOPE FOR FUTURE WORK
  5.1 Conclusion
  5.2 Scope for Future Work
PUBLICATION OUT OF THIS WORK
REFERENCES

LIST OF ABBREVIATIONS

PPDP  Privacy-preserving data publishing
PPDM  Privacy-preserving data mining
QID   Quasi-Identifier

LIST OF FIGURES

Figure 1.1 Data mining as a step in the process of knowledge discovery
Figure 1.2 Typical data mining system architecture
Figure 1.3 Record Owner, Data Collection and Data Publishing
Figure 1.4 Hospital Database
Figure 1.5 Taxonomy tree for JOB, SEX, AGE (QID attributes)
Figure 1.6 Hospital table Original record in data base
Figure 1.7 Table of Sensitive record (Publishing data)
Figure 1.8 Table of External Data ppt table
Figure 1.9 Resulting data after linking the sensitive and ppl table
Figure 1.10 Research table (generalized with k-anonymous published data)
Figure 1.11 Extended table (For linking like generalized voter list)
Figure 1.12 For checking the k-anonymity
Figure 1.13 Result of linking the table research to extended
Figure 1.14 Hospital original data record Project
Figure 1.15 Comparing the Un-Generalized published and extended data tables
Figure 1.16 Comparing Generalized Extended and Sensitive table records
Figure 1.17 Table for k-anonymity and l-diversity
Figure 1.18 Plotting exact l-value and distinct l-diversity value in Weka
Figure 1.19 Plotting exact l-value and entropy l-diversity value in Weka
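
The k-anonymity condition stated in the abstract (every released record shares its quasi-identifier values with at least k-1 other records) can be checked with a few lines of Python. The snippet below is only a minimal sketch under stated assumptions: the records, the attribute names (job, sex, age, disease) and the function names are made up for illustration and are not taken from the dissertation, and the l-diversity check uses the simple "distinct values" variant, which is one of several definitions.

```python
from collections import defaultdict


def group_by_qid(records, qid):
    """Group records by their quasi-identifier (QID) values."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[attr] for attr in qid)].append(record)
    return groups


def k_anonymity(records, qid):
    """Largest k for which the table is k-anonymous on the given QID:
    the size of the smallest group of records sharing the same QID values.
    Assumes a non-empty table."""
    return min(len(group) for group in group_by_qid(records, qid).values())


def distinct_l_diversity(records, qid, sensitive):
    """Largest l for distinct l-diversity: the smallest number of distinct
    sensitive values found in any QID group. Assumes a non-empty table."""
    return min(
        len({record[sensitive] for record in group})
        for group in group_by_qid(records, qid).values()
    )


# Hypothetical, already-generalized records (illustrative data only).
table = [
    {"job": "Professional", "sex": "Male", "age": "[35-40)", "disease": "Hepatitis"},
    {"job": "Professional", "sex": "Male", "age": "[35-40)", "disease": "Hepatitis"},
    {"job": "Professional", "sex": "Male", "age": "[35-40)", "disease": "HIV"},
    {"job": "Artist", "sex": "Female", "age": "[30-35)", "disease": "Flu"},
    {"job": "Artist", "sex": "Female", "age": "[30-35)", "disease": "Flu"},
]
QID = ("job", "sex", "age")

print(k_anonymity(table, QID))                      # 2
print(distinct_l_diversity(table, QID, "disease"))  # 1
```

In this toy table every QID group has at least two records, so it is 2-anonymous, yet both records in the second group carry the same sensitive value. That is the attribute linkage risk that l-diversity is meant to address: group membership alone reveals the disease even though no single record can be singled out.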
