Clustering overlapping BLAST alignments
8.7 years ago
chefarov ▴ 170

Hello all,

I have assembled a C. elegans genome from raw DNA-seq reads and ended up with a repeat-masked FASTA file of scaffolds. I aligned a random EST sequence onto the scaffolds using BLAST, so I now have the alignments as a plain-text or XML file.

I want to continue following "A beginner's guide to eukaryotic genome annotation" by Mark Yandell and Daniel Ence, which says the following about processing BLAST results:

... the remaining data are sometimes clustered to identify overlapping alignments and predictions. Clustering has two purposes. First, it groups diverse computational results into a single cluster of data, all supporting the same gene. Second, it identifies and purges redundant evidence; highly expressed genes, for example, may be supported by hundreds if not thousands of identical ESTs.

I can only imagine the two aforementioned cases as the same case. As I see it, multiple ESTs aligned onto a specific gene are overlapping results that could be clustered together. What else could the first case ("diverse results all supporting the same gene") refer to? Isn't it the same thing?
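To make the question concrete, here is a minimal sketch of what the two steps in the quote might look like in code. It is only an illustration under assumptions, not the method used by the guide or by any annotation pipeline: the hits are assumed to be already parsed into `(scaffold, start, end, source)` tuples with `start <= end` (minus-strand BLAST hits would need their coordinates swapped first), and the names `cluster_hits` and `purge_redundant` are hypothetical.

```python
from collections import defaultdict


def cluster_hits(hits):
    """Group hits that overlap on the same scaffold into clusters.

    hits: iterable of (scaffold, start, end, source) tuples with
    start <= end (assumed layout, not a BLAST output format).
    """
    by_scaffold = defaultdict(list)
    for hit in hits:
        by_scaffold[hit[0]].append(hit)

    clusters = []
    for group in by_scaffold.values():
        # Sort by start so one left-to-right pass finds every overlap.
        group.sort(key=lambda h: (h[1], h[2]))
        current, current_end = [], None
        for hit in group:
            if current and hit[1] <= current_end:
                # Overlaps the running cluster: extend it.
                current.append(hit)
                current_end = max(current_end, hit[2])
            else:
                # No overlap: close the running cluster, start a new one.
                if current:
                    clusters.append(current)
                current, current_end = [hit], hit[2]
        if current:
            clusters.append(current)
    return clusters


def purge_redundant(cluster):
    """Keep one copy of hits with identical coordinates and source."""
    seen = set()
    kept = []
    for hit in cluster:
        if hit not in seen:
            seen.add(hit)
            kept.append(hit)
    return kept


# Made-up example data: two identical ESTs, one protein hit that
# overlaps them, and two unrelated EST hits elsewhere.
hits = [
    ("scaffold1", 100, 500, "EST"),
    ("scaffold1", 100, 500, "EST"),
    ("scaffold1", 450, 900, "protein"),
    ("scaffold1", 2000, 2400, "EST"),
    ("scaffold2", 100, 500, "EST"),
]
clusters = cluster_hits(hits)
deduplicated = [purge_redundant(c) for c in clusters]
```

In this sketch the first step puts the EST hits and the protein hit into one cluster because they overlap, whatever their source, while the second step only drops exact duplicates inside a cluster. Whether this is the distinction the guide intends is exactly what I am asking.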

gene-prediction blast sequence dna-seq alignment • 1.9k views