Question: Problem with gene names in tximport
Asked 9 days ago by dioscorea.bulbifera:

I am trying to import quant.sf files output by salmon into R using tximport and a transcript-to-gene (tx2g) file from BUSpaRse::tr2g_ensembl(). The transcript IDs are all Ensembl IDs and the gene IDs are gene names, but the output of tximport has MGI gene names as rownames, which don't seem to be present in any of the files used.

It isn't too much hassle to subsequently change the gene names, but it would be nice to understand why this is happening. I have attached images below (quant.sf, txi output, tx2g).

[Images: quant.sf file, txi output, tx2g df]

rna-seq R tximport • 68 views

You mean rownames(txi$abundance) are different from tx2g$GENEID? Are you sure?

written 9 days ago by Asaf (8.1k)

I do not see anything unusual; the order of tx2g is simply different from that of quant.sf. You should check things systematically by comparing the lists with code, not by eye. tximport does not invent new names or pull data from databases; it simply takes what you give it, so you're fine.
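A rough sketch of such a check, using base R only. The two vectors are toy stand-ins (assumptions, not the poster's data): in a real session the first would be something like rownames(txi$abundance) and the second the gene column of tx2g.

```r
# Toy stand-ins for the real objects (illustrative values only).
txi_genes  <- c("Xkr4", "Gm1992", "Rp1")                   # e.g. rownames(txi$abundance)
tx2g_genes <- c("Rp1", "Xkr4", "Gm1992", "Sox17", "Rp1")   # e.g. the gene column of tx2g

# Names in the tximport output that are absent from tx2g (expected to be empty).
missing_from_tx2g <- setdiff(txi_genes, tx2g_genes)

# Names in tx2g that never show up in the output.
unused_in_txi <- setdiff(unique(tx2g_genes), txi_genes)

# TRUE when every output row name can be traced back to tx2g, regardless of order.
all_traceable <- all(txi_genes %in% tx2g_genes)
```

If missing_from_tx2g comes back empty, every row name in the output was supplied by tx2g, and only the ordering differs.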

written 9 days ago by ATpoint (36k)

Hi,

Where did you get the tx2g? When you submit a list of Ensembl gene IDs to Ensembl, it returns the same gene IDs together with their common gene names, but not necessarily in the order in which they were submitted.

That is why the gene IDs do not line up between files: they are not in the same order. You might also have queried genes without a common gene name, i.e., without annotation, in which case Ensembl will not return anything for them. So be very careful, because your gene lists probably differ in size, i.e., the number of Ensembl gene IDs > the number of common gene names retrieved.
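To make the ordering point concrete, here is a minimal base R sketch of an order-independent lookup that also flags IDs without a name. The table and IDs are placeholders invented for illustration, and the column names gene_id and gene_name are assumptions rather than what any particular tool returns.

```r
# Hypothetical annotation table: returned in a different order than the query,
# and with one gene lacking a common name. All values are placeholders.
annot <- data.frame(
  gene_id   = c("geneC", "geneA", "geneB"),
  gene_name = c("NameC", "NameA", NA),
  stringsAsFactors = FALSE
)
query_ids <- c("geneA", "geneB", "geneC")

# match() finds each query ID's row in annot, so the result follows the query order.
idx <- match(query_ids, annot$gene_id)
query_names <- annot$gene_name[idx]

# IDs that have no row at all, or a row without a usable name.
no_name <- query_ids[is.na(query_names) | query_names == ""]
```

Looking names up by ID with match() avoids relying on the two tables sharing a row order, and no_name shows directly whether the lists differ in size.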

I hope this helps.

António

modified 9 days ago • written 9 days ago by antonioggsousa (340)

Sorry, I think my previous comment is not related to the problem that you're facing.

As far as I know, Salmon quant.sf files quantify transcripts, and therefore their identifiers are transcript IDs such as ENSMUST00000193812.1. To pass data to tximport, you need to provide the Salmon files (quant.sf) as well as the tx2gene parameter. According to the documentation, this should be:

a two-column data.frame linking transcript id (column 1) to gene id (column 2).

In your tx2g file (assuming that you're passing it as the tx2gene parameter) you have Ensembl gene IDs mapped to common gene names, not Ensembl transcript IDs mapped to gene IDs. I believe that's why it is not working as you expect.
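One way to test this, sketched in base R with toy data: check how many of the transcript IDs from the Name column of quant.sf appear in column 1 of the tx2gene table. The table, its column names TXNAME and GENEID, the second transcript ID, and the gene IDs are all placeholders for illustration; only ENSMUST00000193812.1 is taken from the thread.

```r
# Toy stand-ins: IDs as they might appear in quant.sf's Name column,
# and a two-column tx2gene table (transcript ID first, gene ID second).
quant_names <- c("ENSMUST00000193812.1", "ENSMUST00000000001.1")
tx2gene <- data.frame(
  TXNAME = c("ENSMUST00000193812", "ENSMUST00000000001"),
  GENEID = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Fraction of quantified transcripts found in column 1 of tx2gene, as given.
frac_exact <- mean(quant_names %in% tx2gene[[1]])

# Same check after dropping a trailing ".<version>" from the quant.sf IDs.
quant_noversion <- sub("\\.[0-9]+$", "", quant_names)
frac_noversion <- mean(quant_noversion %in% tx2gene[[1]])
```

A fraction near zero in both checks would suggest that column 1 does not hold transcript IDs at all. A low exact fraction with a high version-stripped fraction would instead point to a version-suffix mismatch; tximport has an ignoreTxVersion argument meant for that case, though it is worth confirming its behavior in the documentation of your installed version.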

I hope this helps.

António

modified 9 days ago • written 9 days ago by antonioggsousa (340)