Question

Help with salmon quantification of samples with de novo transcriptome

3.0 years ago

codyas • 0

Hello everyone,

I am trying to get my data into a workable data frame so that I can ultimately run DESeq2 at the gene level. I started with my sequencing samples and ran Trinity for de novo assembly, which gives a FASTA file. I then ran salmon against that FASTA file to get transcript-level abundances, which outputs one quant.sf file per sample. Next, I am trying to link transcript names to gene names using the Xenopus tropicalis GTF file (which ultimately gets put into the tx2gene file). Then I run tximport to get gene-level estimates, and this is where the problem starts.

At this point, my tx2gene file has two columns, GENEID and TXNAME, and the contents of both look as you would expect for X. tropicalis (ENSXETG... and ENSXETT...). At the end of the import, I get the following error:

None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.
------
Example IDs (file): [TRINITY_DN88461_c0_g4_i1, TRINITY_DN88461_c0_g5_i1, TRINITY_DN88461_c0_g6_i1, ...]
------
Example IDs (tx2gene): [ENSXETG00000000002, ENSXETG00000000003, ENSXETG00000000004, ...]
------

This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'

I now realize that the transcripts in my de novo assembly carry the names Trinity assigns (TRINITY_DN88461...).

So my question is: do I need to somehow annotate the de novo assembled transcriptome and use that instead of the Xenopus tropicalis GTF? If so, how do I go about obtaining a GTF file from a de novo transcriptome?

salmon denovo tximport • 1.7k views

updated 3.0 years ago by ponganta ▴ 590 • written 3.0 years ago by codyas • 0

You do not have to run Trinity if you are not interested in identifying new transcripts beyond those in the known transcriptome. In that case, running salmon against the transcriptome sequence from Ensembl (whose entries carry Ensembl IDs: ENSXETT... for transcripts, ENSXETG... for genes) will keep things simple. You can then use tximport on that output.

Note: Ensembl's version of X. tropicalis transcriptome can be downloaded here.

3.0 years ago by GenoMax 141k

Ah, apologies, I forgot to mention: these samples are actually from a species of poison dart frog, Dendrobates auratus. I only used the Xenopus GTF because it would be closer than human. Does that change your answer at all, or should I just use the Xenopus transcriptome?

3.0 years ago by codyas • 0

If you only have the de novo transcriptome that you assembled using Trinity, then you could simply use it for the salmon analysis with the default IDs. I don't know how many transcripts you have, but you may want to make them non-redundant using CD-HIT before you run salmon.
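If you keep the default Trinity IDs, the transcript-to-gene table for tximport has to be built from those same IDs rather than from a GTF of another species. Here is a minimal Python sketch, assuming Trinity's usual naming scheme in which the trailing `_i<N>` marks the isoform and everything before it is Trinity's own "gene" grouping. The function names are made up for illustration. Note that tximport expects the transcript ID in the first column and the gene ID in the second. Trinity also normally writes a gene-to-transcript map next to the assembly, which may be a simpler starting point if you have it.

```python
import re

# Trinity names look like TRINITY_DN88461_c0_g4_i1: the trailing _i<N> is the
# isoform, and everything before it is Trinity's "gene" identifier.
# (Assumption: the assembly uses this default naming scheme.)
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def trinity_gene_id(transcript_id):
    """Return the Trinity gene ID for a Trinity transcript ID."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def build_tx2gene(transcript_ids):
    """Return (transcript, gene) pairs, transcript first as tximport expects."""
    return [(tx, trinity_gene_id(tx)) for tx in transcript_ids]


if __name__ == "__main__":
    # IDs taken from the error message in the question.
    ids = ["TRINITY_DN88461_c0_g4_i1", "TRINITY_DN88461_c0_g5_i1"]
    for tx, gene in build_tx2gene(ids):
        print(f"{tx}\t{gene}")
```

Writing the pairs out as a two-column tab-separated file gives a table whose first column matches the names in quant.sf.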

You could try the analysis against the Xenopus transcriptome (hopefully the species are evolutionarily close enough) to get a rough idea about your data, but ultimately you will need to go back, repeat the analysis against your own transcriptome, and annotate the transcripts to find out what they are. Here is one post to get you started: Annotating sequences after de-novo Trinity assembly and RSEM analysis...there must be an easier way!
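One common way to annotate is to search the assembled transcripts against a reference protein set and keep the best hit per transcript. The sketch below is only an illustration of that bookkeeping step, assuming you already have BLAST tabular output (`-outfmt 6`, where column 1 is the query, column 2 the subject, and column 12 the bit score). The function name and the protein IDs in the example are invented.

```python
import csv
import io


def best_hits(blast_tsv):
    """Map each query transcript to its highest-bit-score subject.

    blast_tsv is any iterable of lines in BLAST tabular format (outfmt 6).
    """
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


if __name__ == "__main__":
    # Made-up hits; PROT_A and PROT_B are placeholder protein IDs.
    example = io.StringIO(
        "TRINITY_DN1_c0_g1_i1\tPROT_A\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t180\n"
        "TRINITY_DN1_c0_g1_i1\tPROT_B\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-60\t200\n"
    )
    print(best_hits(example))
```

In practice you would also want an e-value or coverage cutoff before trusting a hit; this sketch leaves that out.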

3.0 years ago by GenoMax 141k

score 0 · Answer 1 · 2021-04-09

Did you annotate your de novo transcriptome? If you use a third-party annotation, the contig and gene names will of course not match! The process:

1. Assembly
2. Annotation (use the X. tropicalis proteome here)
3. Quantification (using reads and assembly)
4. Gene-level aggregation (counts and annotation)
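
To make the last step concrete, here is a rough Python sketch of what gene-level aggregation means: summing the `NumReads` column of a salmon quant.sf file over all transcripts assigned to the same gene. This is a simplification for illustration only. tximport does more than this (for example, it can account for transcript length), so use it for the real analysis. The function name, the mapping, and the numbers are made up.

```python
import csv
import io
from collections import defaultdict


def gene_counts(quant_sf, tx2gene):
    """Sum salmon NumReads per gene.

    quant_sf: iterable of lines from a quant.sf file (tab-separated, with a
              header containing at least Name and NumReads).
    tx2gene:  dict mapping transcript ID -> gene ID.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(quant_sf, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcript has no gene assignment; skipped in this sketch
        totals[gene] += float(row["NumReads"])
    return dict(totals)


if __name__ == "__main__":
    # Made-up quantification values for two isoforms of one Trinity gene.
    quant = io.StringIO(
        "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
        "TRINITY_DN1_c0_g1_i1\t1000\t850.0\t12.0\t10.0\n"
        "TRINITY_DN1_c0_g1_i2\t1200\t1050.0\t6.0\t5.5\n"
    )
    mapping = {
        "TRINITY_DN1_c0_g1_i1": "TRINITY_DN1_c0_g1",
        "TRINITY_DN1_c0_g1_i2": "TRINITY_DN1_c0_g1",
    }
    print(gene_counts(quant, mapping))
```

The key point is that the mapping keys must be the same IDs that appear in the `Name` column of quant.sf, which is exactly what the tximport error above is complaining about.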