Improving Low-Budget Semi-Supervised Approaches for Model Extraction Attacks

Genç, Didem

dc.contributor.advisor	Baştanlar, Yalın
dc.contributor.advisor	Tomur, Emrah
dc.contributor.author	Genç, Didem
dc.contributor.other	03.04. Department of Computer Engineering
dc.contributor.other	03. Faculty of Engineering
dc.contributor.other	01. Izmir Institute of Technology
dc.date.accessioned	2025-06-25T20:50:42Z
dc.date.available	2025-06-25T20:50:42Z
dc.date.issued	2024
dc.description	Thesis (Doctoral)--İzmir Institute of Technology, Computer Engineering, Izmir, 2024	en_US
dc.description	Includes bibliographical references (leaves 51-54)	en_US
dc.description	Text in English; Abstract: Turkish and English	en_US
dc.description.abstract	Machine learning (ML) models are widely used in many fields because of their effectiveness; however, training high-accuracy models is costly. In this context, MLaaS (Machine Learning as a Service) platforms offer cloud-based black-box models accessible through APIs, which raises security problems such as model stealing attacks. Model stealing attacks aim to copy a machine learning model deployed in the cloud using only black-box queries. In this thesis, a cost-effective and highly accurate model stealing attack is developed for scenarios in which unlabeled data is easy to access but labeled data is costly. The literature proposes strategies such as generating synthetic datasets, selecting data from natural datasets with active learning, and semi-supervised learning. This work instead proposes making use of self-supervised models to attack a black-box model behind an API. In this method, the adversary is assumed to have access to a large pool of unlabeled data, and this data is used to train a self-supervised SimCLR model. A subset is selected from the unlabeled dataset and labeled by sending queries to the target model. This process yields the transfer dataset. The initial substitute model is obtained by fine-tuning, on the transfer dataset, a multi-layer perceptron (MLP) added to the SimCLR encoder. To increase the accuracy of the substitute model, automatic labeling is applied to the remaining unlabeled data; high-confidence outputs are used directly as labels, while low-confidence outputs are labeled according to their similarity to the samples labeled by the target model. This process enables the model to learn complex patterns and increases data diversity, bringing the substitute model's accuracy closer to that of the target model. The efficiency of the proposed method is shown through experiments on the CIFAR-10 and SVHN datasets.
dc.description.abstract	Machine learning (ML) models are widely adopted across numerous fields due to their effectiveness; however, training high-accuracy models often involves substantial costs. To address this, Machine Learning as a Service (MLaaS) platforms provide cloud-based, black-box models accessible through application programming interfaces (APIs), which raises security concerns such as model extraction attacks (MEAs). An MEA seeks to replicate a cloud-deployed ML model using only black-box queries. This thesis proposes a cost-effective and accurate model extraction attack for settings where unlabeled data is readily available but labeled data is costly. Existing literature suggests strategies such as creating synthetic datasets, selecting data via active learning, and using semi-supervised learning. This thesis instead adopts a self-supervised learning approach for attacking a black-box model via an API. The method assumes the adversary has access to a large pool of unlabeled data, which is used to train a self-supervised SimCLR model. A subset of the unlabeled data is sent as queries to the target model to create a transfer dataset, which is then used to fine-tune a multi-layer perceptron (MLP) added to the SimCLR encoder, forming the baseline substitute model. To improve the substitute model's accuracy, automatic labeling is applied to the remaining unlabeled data: high-confidence predictions are assigned directly as labels, while low-confidence samples are labeled based on their similarity to target-labeled data. Incorporating these high-entropy samples during training enables the model to capture complex patterns and increases data diversity, ultimately improving the substitute model's accuracy. The method's effectiveness is demonstrated through experiments on the CIFAR-10 and SVHN datasets.	en_US
dc.format.extent	x, 54 leaves	en_US
dc.identifier.uri	https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=htlyhJG97gjBTPjAeWRhPiNfwUuxE704uy3w5OVaiZ3q7aEjEIUpSuY78Pj3F13s
dc.identifier.uri	https://hdl.handle.net/11147/15647
dc.language.iso	en
dc.publisher	01. Izmir Institute of Technology	en_US
dc.subject	Machine learning	en_US
dc.subject	Model extraction attacks	en_US
dc.title	Improving Low-Budget Semi-Supervised Approaches for Model Extraction Attacks	en_US
dc.title.alternative	Model Çıkarma Saldırıları için Düşük Bütçeli Yarı-Denetimli Yaklaşımların İyileştirilmesi
dc.type	Doctoral Thesis	en_US
dspace.entity.type	Publication
gdc.author.institutional	Baştanlar, Yalın
gdc.author.institutional	Tomur, Emrah
gdc.description.department	Thesis (Doctoral)--İzmir Institute of Technology, Computer Engineering	en_US
gdc.description.endpage	65
gdc.description.publicationcategory	Tez
gdc.identifier.yoktezid	930964
relation.isAuthorOfPublication	7f75e80a-0468-490d-ba2e-498de80b7217
relation.isAuthorOfPublication	54ed318d-1200-41b4-8648-e742a541fc54
relation.isAuthorOfPublication.latestForDiscovery	7f75e80a-0468-490d-ba2e-498de80b7217
relation.isOrgUnitOfPublication	9af2b05f-28ac-4014-8abe-a4dfe192da5e
relation.isOrgUnitOfPublication	9af2b05f-28ac-4004-8abe-a4dfe192da5e
relation.isOrgUnitOfPublication	9af2b05f-28ac-4003-8abe-a4dfe192da5e
relation.isOrgUnitOfPublication.latestForDiscovery	9af2b05f-28ac-4014-8abe-a4dfe192da5e
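
The automatic labeling step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `auto_label`, the confidence threshold of 0.9, and the use of cosine similarity to the single nearest target-labeled sample are all assumptions, since the record does not state how confidence or similarity is measured.

```python
import numpy as np

def auto_label(probs, feats, labeled_feats, labeled_y, threshold=0.9):
    """Hypothetical helper: pseudo-label an unlabeled pool.

    probs:         (n, c) substitute-model class probabilities for the pool
    feats:         (n, d) encoder embeddings of the pool
    labeled_feats: (m, d) embeddings of samples labeled by the target model
    labeled_y:     (m,)   labels returned by the target model
    threshold:     assumed confidence cut-off (not taken from the thesis)
    """
    probs = np.asarray(probs, dtype=float)
    # High-confidence samples keep the substitute model's own prediction.
    labels = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold

    def unit(x):
        # Row-wise L2 normalization so dot products are cosine similarities.
        x = np.asarray(x, dtype=float)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    low = ~confident
    if low.any():
        # Assumption: a low-confidence sample takes the label of its most
        # similar target-labeled sample.
        sims = unit(feats)[low] @ unit(labeled_feats).T  # shape (n_low, m)
        labels[low] = np.asarray(labeled_y)[sims.argmax(axis=1)]
    return labels, confident

# Toy data: sample 0 is confident, sample 1 is not.
probs = np.array([[0.95, 0.05], [0.55, 0.45]])
feats = np.array([[1.0, 0.0], [0.1, 1.0]])
labeled_feats = np.array([[1.0, 0.1], [0.0, 1.0]])
labeled_y = np.array([0, 1])

labels, confident = auto_label(probs, feats, labeled_feats, labeled_y)
print(labels.tolist(), confident.tolist())  # [0, 1] [True, False]
```

In the toy data, the second sample's own prediction would be class 0, but its embedding is closest to a sample the target model labeled as class 1, so the similarity rule assigns class 1. A real pipeline would take `feats` from the SimCLR encoder and `probs` from the fine-tuned MLP head.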

Files

Original bundle

Name: 15647.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format
Description: Doctoral Thesis

Collections

Phd Degree / Doktora