ISSN: 0256-1115 (print version) ISSN: 1975-7220 (electronic version)
Copyright © 2025 KICHE. All rights reserved

Articles & Issues

Language
English
Conflict of Interest
In relation to this article, we declare that there is no conflict of interest.
Publication history
Received January 17, 2005
Accepted March 16, 2005
articles This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/bync/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright © KIChE. All rights reserved.

All issues

A Gene Clustering Method with Masking Cross-matching Fragments Using Modified Suffix Tree Clustering Method

Department of Chemical Engineering, Pusan National University, Busan 609-735, Korea 1Department of Chemical Engineering, Dong-A University, Busan 604-714, Korea
kshwang@pusan.ac.kr
Korean Journal of Chemical Engineering, May 2005, 22(3), 345-352(8), 10.1007/BF02719409
downloadDownload PDF

Abstract

Multiple sequence alignment is a method for comparing two or more DNA or protein sequences. Most multiple sequence alignment methods rely on pairwise alignment and Smith-Waterman algorithm [Needleman and Wunsch, 1970; Smith and Waterman, 1981] to generate an alignment hierarchy. Therefore, as the number of sequences increases, the runtime increases exponentially. To resolve this problem, this paper presents a multiple sequence alignment method using a parallel processing suffix tree algorithm to search for common subsequences at one time without pairwise alignment. The cross-matched subsequences among the searched common subsequences may be generated and those cause inexact-matching. So the procedure of masking cross-matching pairs was suggested in this study. The proposed method, improved STC (Suffix Tree Clustering), is summarized as follows: (1) construction of suffix tree; (2) search and overlap of common subsequences; (3) grouping of subsequence pairs; (4) masking of cross-matching pairs; and (5) clustering of gene sequences. The new method was successfully evaluated with 23 genes in Mus musculus and 22 genes in three species, clustering nine and eight clusters, respectively.

References

Chen JY, Carlis JV, Inf. Syst., 28, 287 (2003) 
Choi SH, Manousiouthakis V, Korean J. Chem. Eng., 19(2), 227 (2002)
Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL, Nucleic Acids Res., 27(11), 2369 (1999) 
Delcher AL, Phillippy A, Carlton J, Salzberg SL, Nucleic Acids Res., 30(11), 2478 (2002) 
Gusfield D, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, Cambridge, London (1997)
Higgins DG, Thompson JD, Gibson TJ, Methods Enzymol., 266, 383 (1996)
Higgins DG, Sharp PM, Gene, 73, 237 (1988) 
Hon WK, Sadakane K, "Space-Economical Algorithms for Finding Maximal Unique Matches", Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching, 144 (2002)
Kalyanaraman A, Aluru S, Kothari S, Parallel EST Clustering, HICOMB 2002, 185 (2002)
Kim DK, Lee KS, Yang DR, Korean J. Chem. Eng., 21(5), 942 (2004)
Lee JM, Lee JH, Korean J. Chem. Eng., 21(2), 338 (2004)
McCreight E, J. ACM, 23, 262 (1976) 
Miller RT, Christoffels AG, Gopalakrishnan C, Burke J, Ptitsyn AA, Broveak TR, Hide WA, Genome Res., 9, 1143 (1999) 
Morgenstem B, Frech K, Dress A, Wemer T, Bioinformatics, 14, 290 (1998) 
Mount DW, Bioinformatics : Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press (2001)
Needleman SB, Wunsch CD, J. Mol. Biol., 48, 443 (1970) 
Notredame C, Higgins DG, Nucleic Acids Res., 24, 1515 (1996) 
Ostell JM, Wheelan SJ, Kans JA, Methods Biochem. Anal., 43, 19 (2001)
Pearson WR, Miller W, Methods Enzymol., 210, 575 (1992)
Phillips A, Janies D, Wheeler W, Molecular Phylogenetics and Evolution, 16, 317 (2000) 
Randal LS, Christiansen T, Learning Perl, Second Edition, O'Reilly (1997)
Salzberg SL, Searls DB, Kasif S, Trends Guide to Bioinformatics, Elsevier Science (1998)
Shin PK, Koo JH, Lee WJ, Seo JH, Korean J. Chem. Eng., 13(1), 82 (1996)
Smith TF, Waterman MS, J. Mol. Biol., 197, 723 (1981)
Thompson JD, Higgins DG, Gibson TJ, Nucleic Acids Res., 22, 4673 (1994)
Tisdall JD, Beginning Perl for Bioinformatics, O'REILLY (2001)
Ukkonen E, Algorithmica, 14, 249 (1995) 
Volfovsky N, Haas BJ, Salzberg SL, Genome Biology, 2, 1 (2001)
Weiner P, "Linear Pattern Matching Algorithms", In Proc. of the 14th IEEE Annual Symposium on Switching and Automata Theory, 1 (1973)
Zamir O, Etzioni O, Madani O, Karp RM, "Fast and Intuitive Clustering of Web Documents", In Proc. of the 3nd International Conference on Knowledge Discovery and Data Mining, 287 (1997)
Zamir O , Etzioni O, "Web Document Clustering: A Feasibility Demonstration", In Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 46 (1998)

The Korean Institute of Chemical Engineers. F5, 119, Anam-ro, Seongbuk-gu, 233 Spring Street Seoul 02856, South Korea.
TEL. No. +82-2-458-3078FAX No. +82-507-804-0669E-mail : kiche@kiche.or.kr

Copyright (C) KICHE.all rights reserved.

- Korean Journal of Chemical Engineering 상단으로