Assemble Before Blast Or Blast Before Assemble?
2
1
8.2 years ago
biolab ★ 1.4k

Dear all, I posted a similar topic recently. I have data for a few draft genomes, each consisting of roughly 12 million 88 bp reads. I also have ~200 genes from a reference organism and need to find their homologous sequences in the draft genomes. What are your thoughts on the order of assembly and BLAST: assemble the draft genomes and then run BLAST, or BLAST before assembling? Personally, I prefer the latter approach for two reasons. First, the reads are 88 bp long, which is suitable for BLAST, so extracting the homologous reads is easy. Second, I only need to find homologs of 200 genes, so assembling whole genomes seems time-consuming. What do you suggest? Are both methods OK? Thanks a lot!
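To make the "BLAST first, then extract the homologous reads" idea concrete, here is a minimal Python sketch of the extraction step. It assumes the reads were used as queries against a database of the ~200 reference genes and that BLAST was run with its default 12-column tabular output (`-outfmt 6`). The function name `reads_by_gene`, the thresholds, and the example hits are all made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Column order of BLAST's default tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def reads_by_gene(handle, max_evalue=1e-5, min_length=50):
    """Group read IDs by the reference gene they hit.

    Assumes reads are the queries (qseqid) and reference genes are the
    subjects (sseqid). Both thresholds are illustrative, not recommendations.
    """
    hits = defaultdict(set)
    reader = csv.DictReader(handle, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        # Keep only hits that are significant and cover enough of the read.
        if float(row["evalue"]) <= max_evalue and int(row["length"]) >= min_length:
            hits[row["sseqid"]].add(row["qseqid"])
    return dict(hits)


# Hypothetical hits, standing in for a real BLAST output file.
example = (
    "read1\tgeneA\t92.05\t88\t7\t0\t1\t88\t101\t188\t1e-30\t120\n"
    "read2\tgeneA\t95.00\t40\t2\t0\t1\t40\t300\t339\t1e-08\t60\n"
    "read3\tgeneB\t90.91\t88\t8\t0\t1\t88\t10\t97\t2e-28\t113\n"
)
result = reads_by_gene(io.StringIO(example))
print(result)
```

With a real file, the `io.StringIO(example)` part would be replaced by an open file handle. The reads collected for each gene could then be assembled per gene instead of assembling the whole genome.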

blast • 1.8k views
2
8.2 years ago
cts ★ 1.7k

I would recommend performing the assembly first. Assembly greatly reduces the complexity of the dataset, since many reads originate from the same region and can be collapsed into a single sequence, and this makes the BLAST step much faster. Second, although 88 bp might be long enough for BLAST, longer sequences are better, so you will get more accurate BLAST results from the longer assembled sequences.

0
8.2 years ago

Besides the strategies you mention, I would probably try reference-guided assembly.

0

Thanks a lot! My only worry is that reference-guided assembly may miss some true homologous reads. All of my reads are 88 bp, and a read with 8 mismatches may still be homologous, but SOAP only allows 3 mismatches in reference-based mapping. Can Bowtie allow more? I am new to read mapping and would appreciate your suggestions. Thanks a lot!
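To put the mismatch numbers above in terms of sequence identity, here is a small worked calculation in Python. It assumes an ungapped alignment over the full 88 bp read; the helper name `percent_identity` is just for illustration.

```python
def percent_identity(read_length, mismatches):
    """Percent identity of an ungapped, full-length alignment."""
    if not 0 <= mismatches <= read_length:
        raise ValueError("mismatches must be between 0 and read_length")
    return 100.0 * (read_length - mismatches) / read_length


READ_LENGTH = 88  # read length stated in the question

identity_at_3 = percent_identity(READ_LENGTH, 3)
identity_at_8 = percent_identity(READ_LENGTH, 8)

print(f"3 mismatches: {identity_at_3:.1f}% identity")  # 96.6%
print(f"8 mismatches: {identity_at_8:.1f}% identity")  # 90.9%
```

So a 3-mismatch limit only keeps reads that are at least about 96.6% identical to the reference, while tolerating 8 mismatches would lower that to about 90.9%.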