Question: How to identify artificially fused contigs in an assembly using BLAST?
seta (Sweden) wrote, 4.4 years ago:

Hi all,

One of the worst problems in transcriptome assembly is when two transcripts are fused together at their ends. In that case we find two separate ORFs within the same contig, each with a different BLAST hit (one hit on the positive strand and another on the negative strand). My question is how to identify these contigs within the assembly, and then how to deal with them: should they be kept or discarded? Any informative response would be highly appreciated.

Tags: blast, rna-seq, alignment, assembly
arnstrm (Ames, IA) wrote, 4.4 years ago:

This paper describes how to identify such chimeric sequences in a transcriptome assembly. The authors also provide scripts that remove or correct such sequences. The scripts can be found here.

From the paper:

"Cis chimeras cannot be reliably detected when compared to sequences in a related species. Tandem duplication and rearrangement of gene segments can cause a false identification of cis-self chimera, and heterogeneity in base pair substitution rate within a gene can produce blastx hits similar to cis-multi-gene chimera. Trans chimeras, on the other hand, are much easier to detect from blastx results. In the majority of eukaryotic nuclear genomes, a transcript is unlikely to have two different ORFs of the opposite direction, especially if each of these ORFs is highly similar to known coding sequences, of sufficient length, and there is no substantial overlap between these ORFs."

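A minimal sketch of the trans-chimera check described in the quote, not the paper's own script. It assumes blastx was run with the standard 12-column tabular output (`-outfmt 6`), where a hit on the minus strand of the query shows up as `qstart > qend`. The function names and the thresholds `min_len` and `max_overlap` are hypothetical and would need tuning for real data.

```python
from collections import defaultdict

def parse_hits(lines):
    """Yield (query, subject, qlo, qhi, strand, bitscore) from blastx -outfmt 6 lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        qstart, qend = int(f[6]), int(f[7])
        # For blastx, reversed query coordinates mean a minus-strand frame.
        strand = "+" if qstart <= qend else "-"
        yield f[0], f[1], min(qstart, qend), max(qstart, qend), strand, float(f[11])

def candidate_trans_chimeras(lines, min_len=300, max_overlap=60):
    """Flag contigs whose best '+' and best '-' hits are long, distinct, and barely overlap.

    min_len and max_overlap (in nucleotides) are illustrative assumptions.
    """
    best = defaultdict(dict)  # contig -> strand -> (bitscore, qlo, qhi, subject)
    for query, subject, lo, hi, strand, bits in parse_hits(lines):
        if hi - lo + 1 < min_len:
            continue  # ignore short alignments
        current = best[query].get(strand)
        if current is None or bits > current[0]:
            best[query][strand] = (bits, lo, hi, subject)

    flagged = {}
    for query, by_strand in best.items():
        if "+" in by_strand and "-" in by_strand:
            _, plo, phi, psub = by_strand["+"]
            _, mlo, mhi, msub = by_strand["-"]
            overlap = max(0, min(phi, mhi) - max(plo, mlo) + 1)
            if overlap <= max_overlap and psub != msub:
                flagged[query] = {"+": (psub, plo, phi), "-": (msub, mlo, mhi)}
    return flagged

# Made-up example: contig1 has hits on both strands, contig2 only on one.
example = [
    "contig1\tprotA\t95.0\t200\t10\t0\t1\t600\t1\t200\t1e-100\t400",
    "contig1\tprotB\t90.0\t150\t15\t0\t1200\t751\t1\t150\t1e-80\t300",
    "contig2\tprotC\t99.0\t300\t3\t0\t1\t900\t1\t300\t0.0\t600",
]
flagged = candidate_trans_chimeras(example)
print(flagged)
```

Contigs flagged this way are only candidates. Whether to split them at the boundary between the two hits or to discard them is a separate decision that this sketch does not make.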
 


Thanks, I will try it. Just one issue: in the paper, the authors set -max_target_seqs 100 for the blastx run that precedes chimera detection. With this setting, blastx takes longer to finish. In your view, is this number (100) important for chimera detection? I usually run blastx with -max_target_seqs 1 or 5. Could you share your opinion?

Thanks

Reply written 4.4 years ago by seta

I think it is pretty important to have a large number of target matches, because the way the method detects chimeras depends on how well the target alignments support each region of the contig. 100 might be more than necessary, and it can probably be done with fewer target matches, but I would definitely not use just 1.
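A small sketch of why a single target can hide a chimera. It treats the target limit as a simple "keep the top N distinct subjects per contig" filter, which is only an approximation of how BLAST applies `-max_target_seqs`. The function name `strands_seen` and the example hits are made up for illustration.

```python
from collections import defaultdict

def strands_seen(hits, max_targets):
    """Report which query strands remain covered after keeping the top subjects.

    hits: iterable of (contig, subject, qstart, qend, bitscore) tuples.
    max_targets: number of best-scoring distinct subjects kept per contig.
    """
    by_contig = defaultdict(list)
    for contig, subject, qstart, qend, bits in hits:
        strand = "+" if qstart <= qend else "-"
        by_contig[contig].append((bits, subject, strand))

    result = {}
    for contig, rows in by_contig.items():
        rows.sort(key=lambda r: r[0], reverse=True)  # best bitscore first
        top = []
        for _, subject, _ in rows:
            if subject not in top and len(top) < max_targets:
                top.append(subject)
        result[contig] = {strand for _, subject, strand in rows if subject in top}
    return result

# Hypothetical fused contig: two strong '+' hits to gene A homologs,
# one weaker '-' hit to an unrelated gene B.
hits = [
    ("contig1", "protA1", 1, 600, 400.0),
    ("contig1", "protA2", 1, 600, 380.0),
    ("contig1", "protB1", 1200, 751, 300.0),
]
for n in (1, 2, 3):
    print(n, sorted(strands_seen(hits, n)["contig1"]))
```

With one or two targets, only the plus-strand gene is visible in this toy example, so the contig would not look chimeric. The minus-strand hit appears only once enough targets are kept.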

Reply written 4.4 years ago by arnstrm