Question: Tool to detect transcripts with putative frameshifts in a de novo assembled transcriptome
seta (Sweden) wrote, 4.0 years ago:

Hi everybody,

Do you agree that having as few transcripts with putative frameshifts as possible can be considered one of the quality criteria for a de novo transcriptome assembly? Could you please share your experience with this issue and describe how (with which tool) you detect such transcripts in an assembled transcriptome? Any feedback is warmly welcomed.

(written 4.0 years ago by seta; modified 3.9 years ago by Michael Dondrup)

Did you check my suggestion on your previous question? What about an example? If you get comments or suggestions, you should follow them up before asking roughly the same question again.

 

(written 4.0 years ago by Michael Dondrup; modified 4.0 years ago)

Yes, I ran blastx for one of the transcripts with putative frameshifts against the nr database and checked the first 50 hits. All of them were in frame -2, so there seems to be no frameshift, am I right? For this reason, I would like to check this issue with another tool. Any suggestions?
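To repeat this check for every transcript instead of one at a time, one could tally the query frame of the best HSP of each hit. The sketch below is a hypothetical helper (`frame_tally` is not from this thread); it assumes blastx was run with the BLAST+ tabular format `-outfmt "6 qseqid sseqid evalue bitscore qframe"` and that the first row seen for each subject is its best HSP. Keep in mind that it looks only at the best HSP per subject: a frameshift usually shows up as an additional HSP of the same subject in a different frame, so a consistent top-hit frame alone does not rule one out.

```python
from collections import Counter, defaultdict


def frame_tally(lines, top_n=50):
    """Count the query frame of the best HSP for the first top_n subjects per query.

    Assumes BLAST+ tabular rows from
    -outfmt "6 qseqid sseqid evalue bitscore qframe"
    in BLAST's ranked order, so the first row seen for a subject is its best HSP.
    """
    seen = defaultdict(set)         # query -> subjects already counted
    tallies = defaultdict(Counter)  # query -> Counter of query frames
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        qseqid, sseqid, _evalue, _bitscore, qframe = line.rstrip("\n").split("\t")
        if sseqid in seen[qseqid] or len(seen[qseqid]) >= top_n:
            continue  # later HSP of an already counted subject, or beyond top_n
        seen[qseqid].add(sseqid)
        tallies[qseqid][int(qframe)] += 1
    return tallies
```

Passing an open results file (`frame_tally(open("hits.tsv"))`, file name hypothetical) returns one frame counter per transcript.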

(written 4.0 years ago by seta)

Michael Dondrup (Bergen, Norway) wrote, 4.0 years ago:

It looks like the tool you used has a lot of false positives; I wouldn't trust it too much. Instead, we could try to automate the BLAST method. Run blastx on all transcripts, then:

  • Retain only the best hits, or all hits with a score above some threshold
  • Retain all hits with more than one HSP
  • Retain all hits where at least one HSP is in a different frame from the others
  • All queries that pass these filters are frameshift candidates
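The filter steps above can be sketched in plain Python on BLAST+ tabular output. This is an illustration only, not Michael's BioPerl script: the function name `frameshift_candidates`, the column list `-outfmt "6 qseqid sseqid evalue bitscore qframe"`, and the per-HSP e-value cutoff of 1e-10 (standing in for the score threshold, and matching the `*` marker in the output below) are all assumptions.

```python
from collections import defaultdict


def frameshift_candidates(lines, max_evalue=1e-10):
    """Return {query: [subjects]} for query/subject pairs that have more than
    one HSP with e-value <= max_evalue, not all in the same query frame.

    Assumes BLAST+ tabular rows from
    blastx -outfmt "6 qseqid sseqid evalue bitscore qframe"
    """
    frames = defaultdict(list)  # (query, subject) -> query frames of kept HSPs
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, sseqid, evalue, _bitscore, qframe = line.rstrip("\n").split("\t")
        if float(evalue) <= max_evalue:  # stands in for "score > some threshold"
            frames[(qseqid, sseqid)].append(int(qframe))

    candidates = defaultdict(set)
    for (qseqid, sseqid), qframes in frames.items():
        # more than one HSP, and at least one in a different frame from the others
        if len(qframes) > 1 and len(set(qframes)) > 1:
            candidates[qseqid].add(sseqid)
    return {query: sorted(subjects) for query, subjects in candidates.items()}
```

HSPs on opposite strands count as a frame difference here; requiring all frames of a hit to share the same sign would make the filter stricter.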

This can be implemented easily using BioPerl or BioPython.

Here is an example in BioPerl that you can adjust for your needs.

Output:

 ./filterBlastFrameShift.pl ~/Downloads/JR132X8F11N-Alignment.txt 
lcl|Query_197237 has 6 hsps with the following frames: (*: e < 1e-10)
1* -3 -3 -2 -2 3
Query gi|23274247|gb|BC035912.1| has 0 out of 1 hits with frameshifts
lcl|Query_197237 has 5 hsps with the following frames: (*: e < 1e-10)
3* -1 -3 -3 2
Query query_1_no_frameshift has 0 out of 1 hits with frameshifts
lcl|Query_197237 has 5 hsps with the following frames: (*: e < 1e-10)
3* -1 -3 -3 2
Query query_2_insert_no_frameshift has 0 out of 1 hits with frameshifts
lcl|Query_197237 has 6 hsps with the following frames: (*: e < 1e-10)
1* 3* -1 -3 2 -1
frame mismatch 0 vs. 2
Query query_2_insert_with_frameshift has 1 out of 1 hits with frameshifts
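The BioPerl script itself is not reproduced in this post. As a rough stand-in, here is a hypothetical Python function (`frame_report`) that builds a similar per-hit summary from BLAST+ XML output (`-outfmt 5`) with only the standard library: it lists the query frame of every HSP, appends `*` when e < 1e-10, and flags hits whose significant HSPs lie in more than one frame. The element names are those of the NCBI BLAST XML format; the output layout only approximates the one above.

```python
import xml.etree.ElementTree as ET


def frame_report(xml_text, max_evalue=1e-10):
    """Yield 'query<TAB>hit<TAB>frames[<TAB>FRAMESHIFT?]' for every hit.

    Parses BLAST+ XML (-outfmt 5). Each HSP's query frame is listed,
    with '*' appended when its e-value is below max_evalue; a hit is
    flagged when its starred HSPs lie in more than one frame.
    """
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            labels, significant = [], set()
            for hsp in hit.iter("Hsp"):
                frame = int(hsp.findtext("Hsp_query-frame"))
                if float(hsp.findtext("Hsp_evalue")) < max_evalue:
                    significant.add(frame)
                    labels.append(f"{frame}*")
                else:
                    labels.append(str(frame))
            fields = [query, hit.findtext("Hit_id"), " ".join(labels)]
            if len(significant) > 1:
                fields.append("FRAMESHIFT?")
            yield "\t".join(fields)
```

For large result files, `ET.iterparse` would avoid loading the whole document into memory at once.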

 

And here is the BLAST example output with a fabricated frameshift:

(written 4.0 years ago by Michael Dondrup; modified 3.9 years ago)

Yes, I agree with you about the tool. Since blastx is really time-consuming, I'm looking for another tool that can provide this information without running blastx. Anyway, many thanks for your suggestion. Could you please share your BioPerl or BioPython script for evaluating them?

(written 4.0 years ago by seta)

I don't have such a script yet. It would be easy to write given the API documentation, but not on the iPad, sorry; you'll have to wait.

(written 4.0 years ago by Michael Dondrup)

I have added an example for you to check.

(written 3.9 years ago by Michael Dondrup)

BTW, as you are located in Sweden, you might have access to SweGrid or SweHPC to run the computations (see http://www.snic.vr.se/); there is a similar infrastructure here in Norway. That said, I usually run blastx on a single server with 90 threads/40 cores; it took ~1 week for 40k transcripts, if I remember correctly.

(written 3.9 years ago by Michael Dondrup)

Thanks so much for sharing your experience. I have access to a server with 140 GB of RAM and 32 cores, which makes me a bit concerned about blastx. Is there any way to estimate how long a blastx job will take to finish?
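One common approach, sketched here as an assumption rather than a built-in BLAST feature, is to time blastx on a small random subset of transcripts and extrapolate. The hypothetical helpers below (`subsample_fasta`, `estimate_hours`) pick a pilot subset from a FASTA file and scale the measured time. They assume run time grows linearly with the number of queries and shrinks linearly with the number of threads, which real runs rarely achieve, so treat the result as a rough order of magnitude.

```python
import random


def subsample_fasta(lines, n, seed=0):
    """Return n randomly chosen FASTA records (as text) for a timed pilot run."""
    records, current = [], []
    for line in lines:
        if line.startswith(">") and current:
            records.append("".join(current))  # close the previous record
            current = []
        current.append(line if line.endswith("\n") else line + "\n")
    if current:
        records.append("".join(current))
    rng = random.Random(seed)  # fixed seed so the pilot set is reproducible
    return "".join(rng.sample(records, min(n, len(records))))


def estimate_hours(pilot_seconds, pilot_queries, total_queries,
                   pilot_threads=1, target_threads=1):
    """Extrapolate wall-clock hours for the full run (linear scaling assumed)."""
    core_seconds_per_query = pilot_seconds * pilot_threads / pilot_queries
    return core_seconds_per_query * total_queries / target_threads / 3600
```

A pilot file written from `subsample_fasta` can be timed with `time blastx -query pilot.fa -db nr -num_threads 32 ...` (file name hypothetical). As a cross-check using Michael's recollection of about one week (604,800 s) for 40k transcripts on 40 cores, `estimate_hours(604800, 40000, 40000, pilot_threads=40, target_threads=32)` gives about 210 hours, roughly 9 days, for the same number of transcripts on 32 cores.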

(written 3.9 years ago by seta; modified 3.9 years ago)

Many thanks for your script. It is really kind of you to come back to the post and reply.

(written 3.9 years ago by seta; modified 3.9 years ago)