Does having an ORF show the coding potential of a transcript that has no hit against NCBI nr?
7.5 years ago
Farbod ★ 3.4k

Dear Biostars experts, hi (I am not a native English speaker).

I have about 150 transcripts that are important to me. They are some of my DEGs that are present in the case but absent in the control. I collected them from a Trinity de novo assembly of a non-model vertebrate and performed the DEG analysis with DESeq2, and I hope to detect them by PCR. None of them showed a hit in blastx against NCBI nr.

I am trying to understand whether they are coding (in which case, maybe novel genes?) or not. So I used several different tools, and the number of transcripts called coding among these 150 is:

1- Coding Potential Assessment Tool (CPAT): 2 of them (against zebrafish)

2- PLEK: 6 of them (completely different from the CPAT results; are they long non-coding RNAs?)

3- TransDecoder: 12 of them (running TransDecoder.LongOrfs and then TransDecoder.Predict)

My questions:

1- How can we figure out whether a transcript that did not show any hit in blastx against nr is a new gene or just an assembly/sequencing error?

2- Why do the results differ (2, 6 and 12), and which one is acceptable?

3- Which of them are more suitable for primer design and detection by standard PCR (between case and control)?

~ Best wishes

blast gene ORF CDS alignment • 2.4k views
ADD COMMENT

Results from predictive analyses are "hypotheses". They have equal probability of being real until proven otherwise by experimental means. So all may turn out to be real/important or ...

If your budget/experimental stamina does not allow you to tackle all of them upfront, then start with the ones that are common to the three sets (are there any? If not, pick one from the three sets) and work your way through the rest. I am glad to see that you have reduced them from thousands to a realistic number.
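A minimal sketch of how "start with the ones that are common" could be done in practice: count how many tools call each transcript coding and rank by that support. The transcript IDs and the `rank_by_support` helper are hypothetical, for illustration only; the real ID lists would come from the CPAT, PLEK and TransDecoder outputs.

```python
def rank_by_support(calls):
    """Rank transcripts by how many tools called them coding.

    calls: dict mapping tool name -> set of transcript IDs called coding.
    Returns a list of (transcript_id, [tools]) sorted by number of
    supporting tools (descending), then by transcript ID.
    """
    support = {}
    for tool, ids in calls.items():
        for tid in ids:
            support.setdefault(tid, []).append(tool)
    return sorted(
        ((tid, sorted(tools)) for tid, tools in support.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Hypothetical transcript IDs, not real results.
calls = {
    "CPAT": {"TR1", "TR2"},
    "PLEK": {"TR2", "TR3"},
    "TransDecoder": {"TR2", "TR3", "TR4"},
}

ranked = rank_by_support(calls)
common = set.intersection(*calls.values())  # called coding by every tool

for tid, tools in ranked:
    print(tid, len(tools), ",".join(tools))
```

If `common` is empty, the ranked list still shows which transcripts are supported by two tools, which may be a reasonable place to start.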

ADD REPLY

Dear Genomax2 hi, and thank you.

Can you suggest a pipeline for analyzing a transcript without any BLAST hit, to see what it is?

e.g. checking its CDS or ORF (any suggested software?) -> ExPASy -> aligning to a reference genome (using bowtie or STAR; any suggested script?) ... and so on?

Take care
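For the "checking its CDS or ORF" step, a six-frame translation is the usual first look. The sketch below is a plain-Python illustration using the standard genetic code, not the code of any tool mentioned in this thread; the function names are made up for this example, and codons containing anything other than A/C/G/T are shown as `X`.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq):
    """Return translations for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for name, protein in six_frame("ATGGCCTAA").items():
    print(name, protein)
```

Long stretches without `*` in one frame are candidate ORFs; the resulting peptide can then be used for protein-level searches.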

ADD REPLY

No matter what pipeline/tool you use, you will always end up with a "prediction". Until you (or someone else) prove the existence of that transcript/protein by an experiment in your fish, you won't be able to design downstream experiments to unlock the mysteries of that gene/protein.

There are two ways of doing this. One is by similarity searches (e.g. blast), which you have been doing already. The other is by building a protein model (assuming the protein sequence is correct); you could then do a structure (3D) comparison. The second option is harder to get right and will require a different kind of expertise/programs.

At the end of the day you would still require experimental verification of the results.

ADD REPLY

Dear genomax, thank you for your complete answer, I really appreciate it!

1- By "require experimental verification", do you mean the PCR that I mentioned?

2- I have heard that mapping/aligning the transcript sequence to the NCBI nr database (using bowtie or other similar tools) will give a more accurate answer than BLAST. Do you agree?

3- Does finding an ORF in a sequence (e.g. using TransDecoder) necessarily show that it is the mRNA of a gene?

4- If some of the transcripts are long non-coding RNAs (as PLEK suggests), can we detect them using standard PCR?

~ Best

ADD REPLY
  1. That would be one way.

  2. Mapping would not provide a more accurate answer (see slide #11 in Heng Li's presentation for definitions) than blast. Creating an NGS aligner index for nr would not be a trivial task.

  3. Not necessarily. I can give you a synthetic sequence that TransDecoder can find an ORF in. That does not mean it is real.

  4. Someone else should answer that :)

@Michael had suggested using Exonerate to someone else in this thread: "A: blast to find gene from an organism based on other organism", in case you have not tried that before.
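Point 3 above can be illustrated with a small sketch: scan a purely random (synthetic) sequence for ATG...stop ORFs on the forward strand. This is a simplified scan written for this example and is not how TransDecoder scores candidates; the `longest_orf` name, the sequence length and the seed are arbitrary choices.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Length in nt (stop codon included) of the longest ATG...stop ORF
    found in the three forward frames; 0 if there is none."""
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOPS:
                best = max(best, i + 3 - start)
                start = None  # ORF closed; look for the next start
    return best


# A synthetic sequence with no biological meaning at all.
rng = random.Random(0)
random_seq = "".join(rng.choice("ACGT") for _ in range(2000))
print("longest ORF in random sequence (nt):", longest_orf(random_seq))
```

Any ORF reported for `random_seq` is there by chance alone, which is why an ORF on its own is weak evidence and why minimum-length thresholds and experimental confirmation matter.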

ADD REPLY

Dear Genomax2, hi.

As I told you before, I used 3 different programs (mentioned above) to examine whether my hit-less transcripts are coding (even theoretically) or junk. But when I compare the results of these 3 in a Venn diagram, there is no overlap among them!

And I can't design and run PCR for all of them.

So I think I must ask you another question:

What is a standard tool/software for predicting the coding ability of a transcript (in the absence of a reference genome)?

~Thanks

ADD REPLY

All the programs you have tried use some prior knowledge to determine if a transcript has the potential to code for a protein. They try to do the best they can (within the limits of what they have been programmed to do), but clearly the results are not optimal/overlapping (I am surprised to see that there is NO overlap).

You need to bite the bullet and start with 2 or 3 predicted transcripts (use your gut feeling, roll a die, generate a random number using a computer), design primers and see if you can amplify a product (use the same sample/conditions that were used when making the libraries). That should give you some indication of whether it would be worth investigating this data further.
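If the choice is left to the computer, a seeded random draw keeps the selection reproducible. A minimal sketch, with hypothetical transcript IDs and an arbitrary seed:

```python
import random


def pick_candidates(transcript_ids, k=3, seed=42):
    """Pick up to k transcript IDs at random, reproducibly for a given seed."""
    ids = sorted(transcript_ids)  # fixed order so the draw does not depend on input order
    rng = random.Random(seed)
    return rng.sample(ids, min(k, len(ids)))


# Hypothetical IDs standing in for the predicted coding transcripts.
ids = [f"TR{i}" for i in range(1, 21)]
picks = pick_candidates(ids, k=3, seed=1)
print(picks)
```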

ADD REPLY

Dear Genomax2, I have heard from the creators of both TransDecoder and PLEK (via email) that the number of sequences being investigated has a dramatic effect on the results (maybe because of the Markov model).

Do you have any thoughts on this?

ADD REPLY
