Machine Learning over Encrypted Data With Fully Homomorphic Encyption

Download
2022-8-26
Kahya, Ayşegül
When machine learning algorithms train on a large data set, the result will be more realistic. Big data, distribution of big data, and the study of learning algorithms on distributed data are popular research topics of today. Encryption is a basic need, especially when storing data with a high degree of confidentiality, such as medical data. Classical encryption methods cannot meet this need because when texts encrypted with classical encryption methods are distributed, and the distributed data set is decrypted using the same key, the result is corrupted. Homomorphic encryption methods are suitable for this job because they allow polynomial operations on the ciphertext. The encryption key distribution to all these devices is a problem here, and the reliability of these devices' access to highly confidential plain text needs to be evaluated. Since homomorphic encryption allows polynomial operations and some machine learning algorithms are polynomial-based, training machine learning algorithms directly on the ciphertext could be a solution. Logistic regression is one of the polynomial-based machine learning algorithms. In this thesis, a logistic regression algorithm is trained on a data set containing various patient information. It was seen that the algorithm predicted with a 77.2 percent success rate whether people would be diagnosed with diabetes within five years or not. Afterward, the data set was encrypted using the CKKS fully homomorphic encryption method. While working logistic regression over the encrypted dataset, it is needed to use some approximations on the algorithm. We wanted to see the results of the applied approximation without any encryption first. And the learning algorithm was run again on the encrypted data set. And the successful prediction rate was 76.8. After the encryption, the algorithm predicted whether people would be diagnosed with diabetes in five years, and the correct prediction rate was 76.8 percent again. This showed us that machine learning with approximated logistic regression method could be run directly on a data set encrypted using the homomorphic encryption method.

Suggestions

On numerical optimization theory of infinite kernel learning
Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2010-10-01)
In Machine Learning algorithms, one of the crucial issues is the representation of the data. As the given data source become heterogeneous and the data are large-scale, multiple kernel methods help to classify "nonlinear data". Nevertheless, the finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, a novel method of "infinite" kernel combinations is proposed with the help of infinite and semi-infinite programming regarding all elements in kernel space. Look...
MODELLING OF KERNEL MACHINES BY INFINITE AND SEMI-INFINITE PROGRAMMING
Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2009-06-03)
In Machine Learning (ML) algorithms, one of the crucial issues is the representation of the data. As the data become heterogeneous and large-scale, single kernel methods become insufficient to classify nonlinear data. The finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, we propose a novel method of "infinite" kernel combinations for learning problems with the help of infinite and semi-infinite programming regarding all elements in kernel space. Looking...
On Equivalence Relationships Between Classification and Ranking Algorithms
Ertekin Bolelli, Şeyda (2011-10-01)
We demonstrate that there are machine learning algorithms that can achieve success for two separate tasks simultaneously, namely the tasks of classification and bipartite ranking. This means that advantages gained from solving one task can be carried over to the other task, such as the ability to obtain conditional density estimates, and an order-of-magnitude reduction in computational time for training the algorithm. It also means that some algorithms are robust to the choice of evaluation metric used; the...
Domain adaptation on graphs by learning graph topologies: theoretical analysis and an algorithm
Vural, Elif (The Scientific and Technological Research Council of Turkey, 2019-01-01)
Traditional machine learning algorithms assume that the training and test data have the same distribution, while this assumption does not necessarily hold in real applications. Domain adaptation methods take into account the deviations in data distribution. In this work, we study the problem of domain adaptation on graphs. We consider a source graph and a target graph constructed with samples drawn from data manifolds. We study the problem of estimating the unknown class labels on the target graph using the...
Parallel computing in linear mixed models
Gökalp Yavuz, Fulya (Springer Science and Business Media LLC, 2020-09-01)
In this study, we propose a parallel programming method for linear mixed models (LMM) generated from big data. A commonly used algorithm, expectation maximization (EM), is preferred for its use of maximum likelihood estimations, as the estimations are stable and simple. However, EM has a high computation cost. In our proposed method, we use a divide and recombine to split the data into smaller subsets, running the algorithm steps in parallel on multiple local cores and combining the results. The proposed me...
Citation Formats
A. Kahya, “Machine Learning over Encrypted Data With Fully Homomorphic Encyption,” M.S. - Master of Science, Middle East Technical University, 2022.