CAST: an iterative algorithm for the complexity analysis of sequence tracts.

Publication Type:

Journal Article

Source:

Bioinformatics (Oxford, England), Volume 16, Number 10, p.915-922 (2000)

ISSN:

1367-4803

Keywords:

Algorithms; Animals; Databases, Factual; DNA, Protozoan/chemistry/genetics; Genes, Protozoan; Open Reading Frames; Plasmodium falciparum/genetics; Sequence Analysis, DNA/methods

Abstract:

MOTIVATION: Sensitive detection and masking of low-complexity regions in protein sequences. Filtered sequences can be used in sequence comparison without the risk of matching compositionally biased regions. The main advantage of the method over similar approaches is the selective masking of single residue types without affecting other, possibly important, regions.

RESULTS: A novel algorithm for low-complexity region detection and selective masking. The algorithm is based on multiple-pass Smith-Waterman comparison of the query sequence against twenty homopolymers with infinite gap penalties. The output of the algorithm is both the masked query sequence for further analysis, e.g. database searches, as well as the regions of low complexity. The detection of low-complexity regions is highly specific for single residue types. It is shown that this approach is sufficient for masking database query sequences without generating false positives. The algorithm is benchmarked against widely available algorithms using the 210 genes of Plasmodium falciparum chromosome 2, a dataset known to contain a large number of low-complexity regions.

AVAILABILITY: CAST (version 1.0) executable binaries are available to academic users free of charge under license. Web site entry point, server and additional material: http://www.ebi.ac.uk/research/cgg/services/cast/
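
The multiple-pass comparison described under RESULTS can be illustrated with a minimal sketch. With infinite gap penalties, a local alignment against a homopolymer contains no gaps, so the best alignment for a given residue type is simply the highest-scoring contiguous segment of the query. The Python below is not the CAST implementation: the match and mismatch scores, the threshold, and the function names `best_segment` and `mask_low_complexity` are assumptions chosen for illustration, and a real run would presumably draw scores from an amino-acid substitution matrix.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def best_segment(seq, residue, match=2, mismatch=-3):
    """Best ungapped local alignment of seq against a homopolymer of residue.

    Returns (score, start, end) with end exclusive. The match/mismatch
    values are toy scores (an assumption), not a real substitution matrix.
    """
    best_score, best_start, best_end = 0, 0, 0
    run_score, run_start = 0, 0
    for i, ch in enumerate(seq):
        if run_score <= 0:
            # A non-positive prefix can never help: restart the segment here.
            run_score, run_start = 0, i
        run_score += match if ch == residue else mismatch
        if run_score > best_score:
            best_score, best_start, best_end = run_score, run_start, i + 1
    return best_score, best_start, best_end


def mask_low_complexity(seq, threshold=10, mask_char="X"):
    """Repeatedly mask the top-scoring single-residue region.

    The threshold is a hypothetical value for the toy scores above.
    Returns the masked sequence and a list of (residue, start, end, score).
    """
    seq = list(seq)
    regions = []
    while True:
        # One pass: compare the query against all twenty homopolymers.
        candidates = [(best_segment(seq, r), r) for r in AMINO_ACIDS]
        (score, start, end), residue = max(candidates, key=lambda c: c[0][0])
        if score < threshold:
            break
        regions.append((residue, start, end, score))
        # Selective masking: only the biased residue type is replaced.
        for i in range(start, end):
            if seq[i] == residue:
                seq[i] = mask_char
    return "".join(seq), regions


masked, regions = mask_low_complexity("MKTNNNNDNNNNLVE")
print(masked)   # MKTXXXXDXXXXLVE
print(regions)  # [('N', 3, 12, 13)]
```

In this toy run the asparagine-rich stretch is reported as a single region, while the aspartate inside it is left unmasked, which mirrors the selective masking of single residue types that the abstract emphasises. Each pass masks at least one residue, so the loop always terminates.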

Notes:

LR: 20061115; JID: 9808944; 0 (DNA, Protozoan); ppublish