Gene Function Prediction Based on Sequence or Expression Data (2011)

by Kevin Thomas Horan

Abstract: One of the primary goals of bioinformatics is to identify the function of genes. The most reliable way to do this is through experimentation, but experiments are slow and expensive. Although experimentation is necessary at first, and will remain necessary for special cases, it becomes impractical given the number of distinct genes encoded in the genomes of all living organisms. A faster approach is to infer the function of a gene by comparing it to the smaller set of genes whose function is already known. This comparison may be based on many different kinds of data, including sequence similarity and gene expression data (Hawkins and Kihara, 2007).

The goal of this dissertation is to provide tools that aid in identifying the function of unknown genes. To that end, we first present a study in which many unknown genes were annotated by clustering gene expression data. We then present a tool that clusters gene expression data while also identifying short regions of high sequence similarity (motifs) among the members of each cluster. Finally, we present a tool for identifying the functionally relevant sub-sections of protein sequences. These sub-sections can then be used to find other proteins containing similar sub-sections, even when the rest of the protein is quite different. This tool can thus find more distantly related proteins that share functionally relevant features.


Download Information

Kevin Thomas Horan (2011). Gene Function Prediction Based on Sequence or Expression Data. Doctoral dissertation, University of California at Riverside. pdf

Bibtex citation

@phdthesis{Hor11,
   author = "Kevin Thomas Horan",
   title = "Gene Function Prediction Based on Sequence or Expression Data",
   school = "University of California at Riverside",
   schoolabbr = "UC Riverside",
   year = 2011,
   month = Dec,
}
