How to identify artificially fused contigs in an assembly using BLAST?
7.6 years ago
seta ★ 1.7k

Hi all,

One of the worst problems in transcriptome assembly is when two transcripts are fused end to end. In that case we find two separate ORFs within the same contig, each with a different BLAST hit (one hit on the positive strand and another on the negative strand). My question is: how can I identify these contigs in the assembly? And once they are identified, how should I deal with them: keep them or discard them? Your informative responses would be highly appreciated.

Assembly alignment RNA-Seq blast • 1.9k views
7.6 years ago
arnstrm ★ 1.8k

This paper discusses identifying such chimeric sequences in a transcriptome assembly. The authors also provide a script that removes or corrects such sequences. The scripts can be found here.

From the paper:

Cis chimeras cannot be reliably detected when compared to sequences in a related species. Tandem duplication and rearrangement of gene segments can cause a false identification of cis-self chimera, and heterogeneity in base pair substitution rate within a gene can produce blastx hits similar to cis-multi-gene chimera. Trans chimeras, on the other hand, are much easier to detect from blastx results. In the majority of eukaryotic nuclear genomes, a transcript is unlikely to have two different ORFs of the opposite direction, especially if each of these ORFs is highly similar to known coding sequences, of sufficient length, and there is no substantial overlap between these ORFs.
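The trans-chimera criteria in the quote (two ORFs in opposite directions, each of sufficient length, little overlap) can be sketched as a small filter over blastx tabular output. This is only a minimal illustration, not the paper's script: it assumes the default 12-column `-outfmt 6` layout, relies on blastx reporting `qstart > qend` for minus-frame hits, and the function names and both thresholds (`min_len_nt`, `max_overlap_nt`) are made up here for illustration.

```python
from collections import defaultdict


def parse_hits(lines):
    """Group blastx tabular hits (default -outfmt 6 columns) by query contig."""
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        # In blastx tabular output, minus-frame hits have qstart > qend.
        strand = "+" if qstart <= qend else "-"
        lo, hi = min(qstart, qend), max(qstart, qend)
        hits[qseqid].append((sseqid, lo, hi, strand))
    return hits


def overlap_nt(a, b):
    """Number of query positions shared by two hits."""
    return max(0, min(a[2], b[2]) - max(a[1], b[1]) + 1)


def candidate_trans_chimeras(lines, min_len_nt=300, max_overlap_nt=30):
    """Flag contigs with long hits to different proteins on opposite strands.

    min_len_nt and max_overlap_nt are illustrative assumptions, not values
    taken from the paper.
    """
    flagged = {}
    for contig, contig_hits in parse_hits(lines).items():
        long_hits = [h for h in contig_hits if h[2] - h[1] + 1 >= min_len_nt]
        plus = [h for h in long_hits if h[3] == "+"]
        minus = [h for h in long_hits if h[3] == "-"]
        for p in plus:
            # First minus-strand hit to a different protein with little overlap.
            partner = next(
                (m for m in minus
                 if m[0] != p[0] and overlap_nt(p, m) <= max_overlap_nt),
                None,
            )
            if partner is not None:
                flagged[contig] = (p, partner)
                break
    return flagged
```

Contigs flagged this way are only candidates; as the quote notes, cis chimeras are not reliably caught by this kind of check.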

Thanks, friend, I will try it. Just one issue: in the paper, the authors set -max_target_seqs 100 in their blastx run before chimera detection. With this setting, blastx takes longer to finish. In your view, is this number (100) important for chimera detection? I usually run blastx with -max_target_seqs 1 or 5. Could you share your opinion?

Thanks

I think it is pretty important to have a large number of target matches, because the way it detects chimeras depends on target alignment support. 100 might be more than necessary, and it can probably be done with fewer target matches, but I would definitely not use just 1 target match.
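For example, a blastx run with an intermediate number of targets could be assembled like the sketch below. The file names, database name, and the default of 20 targets are placeholders chosen for illustration, not values from the paper; the function only builds the command and does not run it.

```python
import shlex


def blastx_command(query_fasta, protein_db, out_tsv,
                   max_target_seqs=20, evalue=1e-5, threads=4):
    """Build a blastx command line producing tabular (-outfmt 6) output.

    The defaults are illustrative assumptions: 20 targets is an arbitrary
    middle ground between 1 and the 100 used in the paper.
    """
    return [
        "blastx",
        "-query", query_fasta,
        "-db", protein_db,
        "-outfmt", "6",
        "-max_target_seqs", str(max_target_seqs),
        "-evalue", str(evalue),
        "-num_threads", str(threads),
        "-out", out_tsv,
    ]


# Hypothetical file and database names; print the command for inspection.
cmd = blastx_command("contigs.fasta", "proteins_db", "contigs_vs_proteins.tsv")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.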