Problem with gene names in tximport
3.8 years ago

I am trying to import quant.sf files output by salmon into R using tximport and a transcript-to-gene (tx2g) table from BUSpaRse::tr2g_ensembl(). The transcript IDs are all Ensembl IDs and the gene IDs are gene names, but the tximport output has MGI gene names as rownames, and these don't seem to be present in any of the files used.

It isn't too much hassle to subsequently change the gene names, but it would be nice to understand why this is happening. I have attached images below (quant.sf, txi output, tx2g).

[Images: quant.sf file, txi output, tx2g df]

tximport RNA-Seq rna-seq R • 1.8k views

Do you mean that rownames(txi$abundance) are different from tx2g$GENEID? Are you sure?


I do not see anything unusual; the order of tx2g is simply different from that of quant.sf. You should check things systematically by comparing the lists with code, not by eye. tximport is not inventing new names or pulling data from databases, though; it simply takes what you give it, so you're fine.
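As a rough sketch of what "comparing the lists with code" could look like, here is a base R example on made-up data. The gene and transcript names are toy values, and the TXNAME column name is an assumption (GENEID is the column mentioned above); swap in rownames(txi$abundance) and your own tx2g table.

```r
# Toy stand-ins (made-up values). In practice, use rownames(txi$abundance)
# and the gene column of your own tx2g table.
txi_genes <- c("GeneA", "GeneB", "GeneC")
tx2g <- data.frame(
  TXNAME = c("tx3", "tx1", "tx2", "tx4"),          # assumed column name
  GENEID = c("GeneC", "GeneA", "GeneA", "GeneB"),  # deliberately in a different order
  stringsAsFactors = FALSE
)

# Names in the tximport output that are absent from the mapping
missing_from_tx2g <- setdiff(txi_genes, tx2g$GENEID)

# Names in the mapping that never appear in the tximport output
unused_in_txi <- setdiff(unique(tx2g$GENEID), txi_genes)

# TRUE when both contain the same set of names, whatever the order
same_set <- setequal(txi_genes, tx2g$GENEID)

print(missing_from_tx2g)
print(unused_in_txi)
print(same_set)
```

If both setdiff() results are empty, the names are the same and only the row order differs.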


Hi,

Where did you get the tx2g from? When you submit a list of Ensembl gene IDs to Ensembl, it returns the same gene IDs with their common gene names, but not necessarily in the order in which they were submitted.

So that is why the gene IDs do not match between files: they are not in the same order. You might also have queried genes without a common gene name, i.e., without annotation, in which case Ensembl will not return anything. Be careful, therefore, because the sizes of your gene lists probably differ, i.e., the number of Ensembl gene IDs > the number of common gene names retrieved.
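To illustrate the ordering and size issue, a minimal base R sketch follows. The IDs, names, and column names (gene_id, gene_name) are made up for the example and are not what Ensembl or any package actually returns. The idea is to align by ID with match() rather than by row position, so reordered or missing entries become visible.

```r
# Made-up IDs submitted in a known order
submitted <- c("ID_A", "ID_B", "ID_C")

# Made-up result: different order, and ID_B came back without a name
returned <- data.frame(
  gene_id   = c("ID_C", "ID_A"),
  gene_name = c("GeneC", "GeneA"),
  stringsAsFactors = FALSE
)

# Align the returned names to the submitted order by ID, not by position
aligned_names <- returned$gene_name[match(submitted, returned$gene_id)]

# IDs that were submitted but never returned
not_returned <- setdiff(submitted, returned$gene_id)

print(aligned_names)
print(not_returned)
print(length(submitted) - nrow(returned))  # size difference between the lists
```

Entries that come back as NA in aligned_names are the ones to look at before trusting any position-based pairing.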

I hope this helps.

António


Sorry, I think my previous comment is not related to the problem that you're facing.

As far as I know, Salmon quant.sf files quantify transcripts, and therefore their identifiers are transcript IDs such as ENSMUST00000193812.1. To use tximport, you need to provide the Salmon files (quant.sf) as well as the tx2gene parameter. According to the documentation, this is:

a two-column data.frame linking transcript id (column 1) to gene id (column 2).

In your tx2g file (assuming that this is what you pass as the tx2gene parameter) you map Ensembl gene IDs to common gene names, not Ensembl transcript IDs to gene IDs. I believe that's why it is not working as you expect.
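One way to test this is to check whether the first column of the table really contains the identifiers found in the Name column of quant.sf. The sketch below uses toy values (only ENSMUST00000193812.1 comes from this thread; the second ID, the gene names, and the TXNAME/GENEID column names are assumptions) and also tries the match with the version suffix removed.

```r
# Toy stand-in for the Name column of a quant.sf file
quant_names <- c("ENSMUST00000193812.1", "ENSMUST00000000001.1")

# Toy tx2gene table: column 1 = transcript ID, column 2 = gene ID
tx2g <- data.frame(
  TXNAME = c("ENSMUST00000193812", "ENSMUST00000000001"),  # no version suffix
  GENEID = c("GeneA", "GeneB"),
  stringsAsFactors = FALSE
)

# Fraction of quant.sf IDs found in column 1 exactly as written
frac_exact <- mean(quant_names %in% tx2g[[1]])

# Same check after dropping a trailing ".<digits>" version suffix
quant_noversion <- sub("\\.[0-9]+$", "", quant_names)
frac_noversion <- mean(quant_noversion %in% tx2g[[1]])

print(frac_exact)
print(frac_noversion)

# If only the versionless IDs match, tximport can be asked to ignore versions:
# txi <- tximport(files, type = "salmon", tx2gene = tx2g, ignoreTxVersion = TRUE)
```

A fraction near zero in both checks would suggest that column 1 does not hold transcript IDs at all.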

I hope this helps.

António
