Clustering algorithms for disease classification using mass spectrometry data

Chandramohan, Vikram (2008) Clustering algorithms for disease classification using mass spectrometry data. Masters (Research) thesis, James Cook University.



Beyond the genomic data now available, life-science researchers study proteomics to gain insight into cell function by learning how proteins are expressed, processed, and recycled, and where they are localized in cells. Proteomics is the study of the proteome, the entire set of proteins expressed in a cell. In particular, functional proteomics uses mass spectrometry (MS) to study the regulation, timing, and location of protein expression. It has recently been recognized that MS coupled with pattern recognition methods offers considerable potential for the early detection of complex human diseases and for biomarker discovery. However, despite the promising integration of machine learning methods and MS data in high-throughput proteomics, the field still faces several challenges before it can become a mature platform for clinical diagnostics and protein-based biomarker profiling. The major challenges include noise filtering of MS data, feature extraction, feature reduction of MS datasets, and the selection of computational methods for MS-based classification.

The main objective of this research is to classify diseases using MS data. First, we investigate feature extraction from MS data based on fundamentals of signal processing, such as the theory of linear predictive coding. We then present an unsupervised kernel-based fuzzy c-means (KFCM) approach, which is shown to be more robust to noise than fuzzy c-means (FCM) on mass spectrometry datasets. KFCM is obtained by replacing the original Euclidean distance in FCM with a kernel-induced distance. We evaluate the performance of our classification methods against popular techniques such as support vector machines (SVM), principal component analysis (PCA), linear or quadratic discriminant analysis (LDA/QDA), and random forests.
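The substitution of a kernel-induced distance for the Euclidean distance in FCM can be illustrated with a minimal sketch. The abstract does not state which kernel the thesis uses, so the Gaussian kernel below is an assumption, as are the function names (`gaussian_kernel`, `memberships`, `kfcm`), the parameter defaults, and the requirement that the caller supplies initial centers. With a Gaussian kernel K(x, v), the kernel-induced squared distance is proportional to 1 - K(x, v), which is the quantity the membership update uses here.

```python
import math


def gaussian_kernel(x, v, sigma):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2) (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / sigma ** 2)


def memberships(data, centers, m, sigma, eps=1e-12):
    """Return U[i][k], the fuzzy membership of point k in cluster i.

    The kernel-induced distance 1 - K(x, v) replaces the squared
    Euclidean distance used by standard FCM; eps avoids division by zero
    when a point coincides with a center.
    """
    inv = [[(1.0 - gaussian_kernel(x, v, sigma) + eps) ** (-1.0 / (m - 1.0))
            for x in data] for v in centers]
    totals = [sum(col) for col in zip(*inv)]  # one total per data point
    return [[w / t for w, t in zip(row, totals)] for row in inv]


def kfcm(data, init_centers, m=2.0, sigma=1.0, n_iter=100, tol=1e-6):
    """Alternate membership and center updates until the centers settle."""
    centers = [list(v) for v in init_centers]
    for _ in range(n_iter):
        U = memberships(data, centers, m, sigma)
        new_centers = []
        for row, v in zip(U, centers):
            # Each point is weighted by u^m * K(x, v), so points far from
            # the center contribute almost nothing to its new position.
            w = [u ** m * gaussian_kernel(x, v, sigma) for u, x in zip(row, data)]
            total = sum(w)
            if total == 0.0:
                new_centers.append(v)  # keep the old center if all weights vanish
                continue
            new_centers.append([sum(wk * x[d] for wk, x in zip(w, data)) / total
                                for d in range(len(v))])
        shift = max(abs(a - b)
                    for nv, v in zip(new_centers, centers)
                    for a, b in zip(nv, v))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(data, centers, m, sigma)
```

Calling `kfcm(points, initial_centers, sigma=3.0)` on a list of numeric tuples returns the final centers and the membership matrix. In this sketch an outlier far from every center receives a kernel weight close to zero in the center update, which is one way to read the noise-robustness claim; the thesis itself should be consulted for the actual formulation and parameter choices.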

Item ID: 2122
Item Type: Thesis (Masters (Research))
Keywords: mass spectrometry data, medical diagnostic technology, clustering algorithms, proteomics, mass spectrometric protein identification, prostate specific antigen datasets
Date Deposited: 23 Mar 2009 22:55
FoR Codes: 10 TECHNOLOGY > 1004 Medical Biotechnology @ 0%
11 MEDICAL AND HEALTH SCIENCES > 1101 Medical Biochemistry and Metabolomics > 110106 Medical Biochemistry: Proteins and Peptides (incl Medical Proteomics) @ 0%