I am trying to get my data into a workable data frame so that I can ultimately run DESeq2 at the gene level. I started with my sequencing samples and ran Trinity for de novo assembly, which gives a FASTA file. I then ran Salmon to get transcript-level abundances against that FASTA file, which outputs a quant.sf file per sample. Next, I am trying to link transcript names to gene names using the Xenopus_tropicalis GTF file (which ultimately gets put into the tx2gene file). Then I run tximport to try to come up with gene-level estimates, and this is where the problem starts.
At this point, my tx2gene file has two columns (GENEID and TXNAME), and the contents of both look as you would expect for Xenopus tropicalis (ENSXETG... and ENSXETT...). At the end of the import, I get the following error:
None of the transcripts in the quantification files are present in the first column of tx2gene. Check to see that you are using the same annotation for both.
Example IDs (file): [TRINITY_DN88461_c0_g4_i1, TRINITY_DN88461_c0_g5_i1, TRINITY_DN88461_c0_g6_i1, ...]
Example IDs (tx2gene): [ENSXETG00000000002, ENSXETG00000000003, ENSXETG00000000004, ...]
This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'
I now realize that my de novo assembly uses the transcript names that Trinity assigns (TRINITY_DN88461...).
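To make the mismatch concrete, here is a minimal Python sketch using only the example IDs from the error message. It assumes the usual Trinity naming scheme, in which a transcript ID ends in an isoform suffix `_i<N>` and the part before it (ending in `_g<N>`) is Trinity's own gene-level identifier. The helper names `trinity_gene_id` and `shared_ids` are hypothetical, made up for this illustration. Note also that the error message reports ENSXETG (gene) IDs in the first column of tx2gene, whereas tximport looks for transcript IDs there, so the column order appears to be reversed as well.

```python
import re

# Assumption: standard Trinity IDs such as TRINITY_DN88461_c0_g4_i1, where the
# trailing "_i<N>" marks the isoform and the rest is Trinity's "gene" ID.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def trinity_gene_id(tx_id):
    """Hypothetical helper: drop the trailing _i<N> from a Trinity transcript ID."""
    gene_id, n_subs = ISOFORM_SUFFIX.subn("", tx_id)
    if n_subs == 0:
        raise ValueError(f"not a Trinity-style transcript ID: {tx_id}")
    return gene_id


def shared_ids(quant_ids, tx2gene_first_column):
    """Hypothetical helper: IDs found in both quant.sf and tx2gene column 1."""
    return set(quant_ids) & set(tx2gene_first_column)


# Example IDs copied from the error message above.
quant_ids = [
    "TRINITY_DN88461_c0_g4_i1",
    "TRINITY_DN88461_c0_g5_i1",
    "TRINITY_DN88461_c0_g6_i1",
]
ensembl_first_column = [
    "ENSXETG00000000002",
    "ENSXETG00000000003",
    "ENSXETG00000000004",
]

# No overlap at all, which is what the tximport error is reporting.
print(shared_ids(quant_ids, ensembl_first_column))  # prints set()

# A Trinity-based mapping, transcript ID first and gene ID second.
tx2gene_rows = [(tx, trinity_gene_id(tx)) for tx in quant_ids]
for tx, gene in tx2gene_rows:
    print(f"{tx}\t{gene}")
```

A table built this way groups isoforms by Trinity's own gene clusters, not by Xenopus tropicalis genes. Linking those clusters to ENSXETG identifiers would still need a separate annotation step.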
So my question is: do I need to somehow annotate the de novo assembled transcriptome and use that instead of the Xenopus tropicalis GTF? If so, how do I go about obtaining a GTF file from a de novo transcriptome?