Enhancement and validation of current human genome annotation via novel proteogenomics algorithms

Has, Canan

Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/6620

Title:	Enhancement and validation of current human genome annotation via novel proteogenomics algorithms
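Other Titles:	Var olan insan genom anotasyonunun yeni proteogenomik yöntemler ile doğrulanması ve geliştirilmesi (Turkish; in English: Validation and enhancement of the existing human genome annotation with novel proteogenomic methods)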
Authors:	Has, Canan
Advisors:	Allmer, Jens
Keywords:	Proteogenomics; Human genome; Genomic DNA
Publisher:	Izmir Institute of Technology
Source:	Has, C. (2016). Enhancement and validation of current human genome annotation via novel proteogenomics algorithms. Unpublished doctoral dissertation, İzmir Institute of Technology, İzmir, Turkey
Abstract:	Proteogenomics involves the transfer of knowledge from proteomics to genomics and vice versa. For the transferred information to be trusted, it must be based on experimental results. Genomics is currently driven by high-throughput techniques, notably next-generation sequencing, while proteomics relies on mass spectrometry (MS), which is also a high-throughput approach. Both fields generate a wealth of data that must be correlated and annotated before it yields knowledge. Publicly available mass spectrometric data for human blood plasma samples exist in repositories such as PeptideAtlas and PRIDE. We acquired high-quality collections from these data and stored them in a custom database that we developed. First, we aimed to annotate these data by searching them with PGMiner, a proteogenomic pipeline developed in this study, against a custom sequence database that includes all predicted alternative open reading frames as well as the six-frame translations of the human genome and exosome. Then, we correlated the existing annotations with the available mass spectrometric measurements. Together with the genome annotations currently available from HAVANA and ENSEMBL, the human genome enabled us to validate and enhance current gene annotations.
Turkish abstract (translated): Proteogenomics involves the transfer of information from proteomics to genomics or from genomics to proteomics. Both fields produce large amounts of data that must be identified and correlated to generate knowledge. The goal is to annotate the data produced by genomic studies, and achieving high confidence in this annotation requires validation at the translational level with experimental techniques. Genomic data are obtained with high-throughput methods including next-generation sequencing, whereas proteomic data come from mass spectrometry, another high-throughput method. Open-access mass spectrometry data from human blood plasma are available in various data banks such as PeptideAtlas and PRIDE. High-quality collections selected from these data were stored in a database we developed. As the first objective of this project, the spectral data were searched, using the PGMiner pipeline developed in this study, against databases covering the six-frame translation of the human genome, the exosome, and all predicted alternative open reading frames, and the peptides to which the spectra belong were identified. Next, the existing gene and protein annotations were correlated with the mass spectrometry measurements whose peptides had been identified in the first step. Using the existing genome annotations obtained from HAVANA and ENSEMBL, current gene annotations were validated and enhanced.
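The abstract refers to searching spectra against a six-frame translation of the human genome and then relating identified peptides to existing annotations. As a rough illustration only (this is not the PGMiner implementation, which the record does not describe), the Python sketch below translates a DNA string in all six reading frames with the standard genetic code, finds an identified peptide in those frames, and maps each hit back to forward-strand coordinates so it can be compared with annotated intervals. The helper names (`six_frame_translation`, `locate_peptide`, `overlapping_features`) and the toy exon intervals are hypothetical.

```python
"""Hypothetical six-frame translation and peptide-mapping sketch (not PGMiner's code)."""
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Return frames +1..+3 (forward strand) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def locate_peptide(dna: str, peptide: str) -> list[tuple[str, int, int]]:
    """Return (frame, start, end) forward-strand coordinates (0-based, half-open) of exact hits."""
    hits = []
    n = len(dna)
    for frame, protein in six_frame_translation(dna).items():
        offset = int(frame[1]) - 1
        aa_pos = protein.find(peptide)
        while aa_pos != -1:
            # Nucleotide span on the strand that was actually translated.
            nt_start = offset + 3 * aa_pos
            nt_end = nt_start + 3 * len(peptide)
            if frame[0] == "+":
                hits.append((frame, nt_start, nt_end))
            else:
                # Convert reverse-complement coordinates back to the forward strand.
                hits.append((frame, n - nt_end, n - nt_start))
            aa_pos = protein.find(peptide, aa_pos + 1)
    return hits


def overlapping_features(hit: tuple[str, int, int],
                         features: list[tuple[str, int, int]]) -> list[str]:
    """Names of annotated (name, start, end) intervals that overlap a peptide hit."""
    _, start, end = hit
    return [name for name, f_start, f_end in features if start < f_end and f_start < end]


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    print(six_frame_translation(dna)["+1"])
    for hit in locate_peptide(dna, "MAIV"):
        print(hit, overlapping_features(hit, [("exon1", 0, 6), ("exon2", 100, 200)]))
```

A peptide that matches a frame but overlaps no annotated interval would be a candidate for a novel coding region, while overlapping hits support existing annotations. A real search database would typically split each frame at stop codons into candidate fragments and process genome-scale sequence chromosome by chromosome; the sketch keeps whole frames for readability.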
Description:	Thesis (Doctoral)--Izmir Institute of Technology, Molecular Biology and Genetics, Izmir, 2017. Full text release delayed at the author's request until 2020.05.09. Includes bibliographical references (leaves: 93-110). Text in English; abstract in Turkish and English.
URI:	http://hdl.handle.net/11147/6620
Appears in Collections:	Phd Degree / Doktora