Question

Clustering overlapping BLAST alignments

9.9 years ago

chefarov ▴ 170

Hello all,

I have assembled a C. elegans genome from raw DNA-seq reads and ended up with a repeat-masked FASTA file of scaffolds. I aligned a random EST sequence onto the scaffolds using BLAST, so I have the alignments as a plain-text or XML file.

I want to continue following "A beginner's guide to eukaryotic genome annotation" by Mark Yandell and Daniel Ence, which says the following about processing BLAST results:

... the remaining data are sometimes clustered to identify overlapping alignments and predictions. Clustering has two purposes. First, it groups diverse computational results into a single cluster of data, all supporting the same gene. Second, it identifies and purges redundant evidence; highly expressed genes, for example, may be supported by hundreds if not thousands of identical ESTs.

I can only imagine the two aforementioned cases as the same case. I mean, multiple ESTs aligning to a specific gene are overlapping results that could be clustered together. What else could the first case ("diverse results all supporting the same gene") refer to? Isn't it the same thing?
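
To make the question concrete, here is a minimal sketch of how I picture the clustering step from the quoted passage. It assumes the hits have already been parsed into (scaffold, start, end, evidence_id) tuples with 1-based inclusive coordinates; the function names `cluster_hits` and `purge_redundant`, the tuple layout, and the example IDs are my own illustration, not something taken from the guide or from BLAST.

```python
from collections import defaultdict


def cluster_hits(hits):
    """Group hits that overlap on the same scaffold (simple interval merging).

    hits: iterable of (scaffold, start, end, evidence_id) tuples.
    This layout is an assumption for illustration only.
    """
    by_scaffold = defaultdict(list)
    for scaffold, start, end, evidence_id in hits:
        # Minus-strand hits may be reported with start > end, so normalise.
        lo, hi = min(start, end), max(start, end)
        by_scaffold[scaffold].append((lo, hi, evidence_id))

    clusters = []
    for scaffold, items in by_scaffold.items():
        items.sort()  # sort by start coordinate, then end
        current = None
        for lo, hi, evidence_id in items:
            if current is not None and lo <= current["end"]:
                # Overlaps the open cluster: extend it and record the member.
                current["end"] = max(current["end"], hi)
                current["members"].append((lo, hi, evidence_id))
            else:
                # No overlap: start a new cluster on this scaffold.
                current = {
                    "scaffold": scaffold,
                    "start": lo,
                    "end": hi,
                    "members": [(lo, hi, evidence_id)],
                }
                clusters.append(current)
    return clusters


def purge_redundant(cluster):
    """Keep the first member seen for each identical (start, end) span."""
    seen = {}
    for lo, hi, evidence_id in cluster["members"]:
        seen.setdefault((lo, hi), evidence_id)
    return [(lo, hi, evidence_id) for (lo, hi), evidence_id in seen.items()]


# Hypothetical hits: two identical ESTs, one overlapping gene prediction,
# one isolated EST, and one minus-strand EST on another scaffold.
example_hits = [
    ("scaffold_1", 100, 500, "EST_a"),
    ("scaffold_1", 100, 500, "EST_b"),
    ("scaffold_1", 450, 900, "gene_pred_1"),
    ("scaffold_1", 2000, 2300, "EST_c"),
    ("scaffold_2", 300, 50, "EST_d"),
]
example_clusters = cluster_hits(example_hits)
```

In this sketch, grouping and purging are two separate passes: `cluster_hits` puts the ESTs and the gene prediction into one cluster, and `purge_redundant` then drops the duplicate EST inside that cluster. Real pipelines would probably also take strand and alignment quality into account, which I have left out here.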

gene-prediction blast sequence dna-seq alignment • 2.1k views

updated 2.8 years ago by Ram 45k • written 9.9 years ago by chefarov ▴ 170