Question: Trinity predicting more genes than expected?
kanika.151 (Italy) wrote, 4.0 years ago:


################################
## Counts of transcripts, etc.
################################
Total trinity 'genes':    35868
Total trinity transcripts:    54969
Percent GC: 51.52

########################################
Stats based on ALL transcript contigs:
########################################

    Contig N10: 9567
    Contig N20: 7769
    Contig N30: 6524
    Contig N40: 5393
    Contig N50: 4511

    Median contig length: 1780
    Average contig: 2555.95
    Total assembled bases: 140497949


#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

    Contig N10: 7964
    Contig N20: 6149
    Contig N30: 5018
    Contig N40: 4126
    Contig N50: 3411

    Median contig length: 1077
    Average contig: 1843.53
    Total assembled bases: 66123622

 

This is the trinityStats.pl output for my assembly.

I was expecting a total of 12,510 genes, but Trinity reports 35,868, and after removing the isoforms I still get 25,747 genes. Where do the extra ~13k genes come from?

Has anyone else run into this with Trinity?
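For reference, the gene and transcript counts and the N50 values above can be approximated from the assembly FASTA with a short script. The sketch below is a hypothetical Python re-implementation, not trinityStats.pl itself. It assumes Trinity-style headers such as TRINITY_DN1000_c0_g1_i1, treats everything before the final _i<N> suffix as the 'gene' ID, and uses the standard N50 definition, so its numbers may differ slightly from the Perl script's. Note that a Trinity 'gene' is only a group of isoforms assembled from the same graph component, so it need not correspond to a real gene.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word of each header."""
    tid, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if tid is not None:
                yield tid, "".join(chunks)
            tid, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if tid is not None:
        yield tid, "".join(chunks)


def gene_id(transcript_id: str) -> str:
    """Assumed Trinity naming: drop the final '_i<N>' isoform suffix."""
    return transcript_id.rsplit("_i", 1)[0]


def n50(lengths: List[int]) -> int:
    """Largest L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_stats(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count 'genes' and transcripts, and compute N50 for all vs longest isoforms."""
    lengths_by_gene: Dict[str, List[int]] = defaultdict(list)
    for tid, seq in records:
        lengths_by_gene[gene_id(tid)].append(len(seq))
    all_lengths = [n for ls in lengths_by_gene.values() for n in ls]
    longest = [max(ls) for ls in lengths_by_gene.values()]
    return {
        "genes": len(lengths_by_gene),
        "transcripts": len(all_lengths),
        "n50_all": n50(all_lengths),
        "n50_longest": n50(longest),
    }
```

Assuming the assembly is in Trinity.fasta (Trinity's default output name), assembly_stats(read_fasta(open("Trinity.fasta"))) would return a dictionary with these four values.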

Tags: trinity, rna-seq, genes, latest
Question written 4.0 years ago by kanika.151 (modified 3.9 years ago)

I got the same result with Trinity. I extracted the longest isoform of each gene from the output and used those for downstream analysis.

Reply written 4.0 years ago by Mehmet

How did you isolate the longest sequences?

Reply written 3.9 years ago by kanika.151

Trinity author Brian Haas has provided a Perl script to extract the longest isoforms from Trinity assemblies, along with this comment:

"The longest transcript isn't always the 'best' transcript....  but this has been asked for so many times, I'll just write the script and post it shortly."
Reply written 3.9 years ago by h.mon
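A custom script for this might look roughly like the following Python sketch. It is hypothetical and is not Brian Haas's Perl script: it assumes Trinity-style IDs (e.g. TRINITY_DN1000_c0_g1_i1), groups isoforms by stripping the final _i<N> suffix, keeps the first isoform seen when lengths tie, and writes the survivors back out as FASTA. As the quote warns, the longest isoform is not necessarily the best one.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word of each header."""
    tid, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if tid is not None:
                yield tid, "".join(chunks)
            tid, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if tid is not None:
        yield tid, "".join(chunks)


def longest_isoform_per_gene(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, Tuple[str, str]]:
    """Map each Trinity 'gene' ID to its longest (transcript_id, sequence)."""
    best: Dict[str, Tuple[str, str]] = {}
    for tid, seq in records:
        gene = tid.rsplit("_i", 1)[0]  # assumed naming: gene ID + '_i<N>'
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (tid, seq)
    return best


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (id, sequence) pairs as FASTA text, wrapping lines at `width`."""
    out = []
    for tid, seq in records:
        out.append(">" + tid)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

For example, assuming the assembly is in Trinity.fasta, to_fasta(longest_isoform_per_gene(read_fasta(open("Trinity.fasta"))).values()) would return one sequence per Trinity 'gene', ready to be written to a new file.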

Initially I thought that not using the --trimmomatic or --normalize_reads parameters might explain such a high estimate, but when I ran it again I got even more Trinity transcripts. I think I will run the downstream analysis on both the longest transcripts and the full set. Thank you.

Reply written 3.9 years ago by kanika.151

We used a custom Perl script. As mentioned in the Trinity FAQ, you can also use all transcripts for your downstream analysis; that is reasonable too.

Reply written 3.9 years ago by Mehmet

Hello kanika.151!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers! 

> Re-opened because it wasn't exactly identical.

Reply written 4.0 years ago by Dr. Mabuse

I think that is quite normal: most, if not all, transcriptome assemblies substantially overestimate the number of transcripts because gaps in coverage split a single gene into several separate contigs. A factor of 2-3 is quite good, I think. Why don't you map the reads to the genome instead and look for novel transcripts that way?

Reply written 4.0 years ago by Dr. Mabuse
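To check the numbers from the question against that 2-3x rule of thumb, here is a quick calculation (a small illustrative sketch; the three counts are taken from the question, and the function name is hypothetical):

```python
from typing import Tuple

EXPECTED_GENES = 12_510  # gene count the poster expected (from the question)


def overprediction(observed: int, expected: int = EXPECTED_GENES) -> Tuple[float, int]:
    """Return (fold overestimate, number of extra 'genes')."""
    return observed / expected, observed - expected


for label, observed in [("all Trinity 'genes'", 35_868),
                        ("after removing isoforms", 25_747)]:
    fold, extra = overprediction(observed)
    print(f"{label}: {fold:.2f}x, {extra} extra")

# prints:
# all Trinity 'genes': 2.87x, 23358 extra
# after removing isoforms: 2.06x, 13237 extra
```

Both figures fall within the 2-3x range mentioned above.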

If I run genome-guided Trinity, how would it give me novel transcripts?

Reply written 3.9 years ago by kanika.151

Why does it overpredict, and how can I explain it?

For the assembly I pooled 3 biological replicates, and I got roughly 3 times the number of known genes, which made me wonder whether it was really assembling the reads.

Reply written 3.9 years ago by kanika.151

kanika.151 (Italy) wrote, 3.9 years ago:

Okay, I now understand why it is overestimating and how I can remove redundant sequences.

When assembling, I pooled the control and inoculated samples, which should have been assembled separately. Also, a tool called CD-HIT can cluster highly similar sequences so that redundant ones can be removed, giving a less redundant assembly.

Answer written 3.9 years ago by kanika.151
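CD-HIT (cd-hit-est for nucleotide sequences) works greedily: it sorts sequences from longest to shortest and adds each one to an existing cluster if it is similar enough to that cluster's representative, otherwise it starts a new cluster. The Python sketch below illustrates only the simplest version of that idea: it drops a transcript when it, or its reverse complement, is an exact substring of a longer sequence that was already kept (roughly a 100% identity threshold). It is not CD-HIT's algorithm, which uses short-word filtering and an adjustable identity threshold (the -c option), and it is slow on large inputs, so on a real assembly you would run cd-hit-est itself. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Complement table; characters not listed (e.g. other IUPAC codes) are left unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def remove_contained(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Greedy, longest-first redundancy removal (simplified, 100% identity only).

    Sequences are visited from longest to shortest (ties keep input order);
    one is dropped if it, or its reverse complement, occurs inside a
    sequence that was already kept.
    """
    kept: List[Tuple[str, str]] = []
    for tid, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        s = seq.upper()
        rc = revcomp(s)
        if any(s in k or rc in k for _, k in kept):
            continue  # redundant: contained in a longer kept sequence
        kept.append((tid, s))
    return kept
```

Checking both strands matters because a de novo assembler such as Trinity can report the same transcript fragment in either orientation when the library is unstranded.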