Question: How does tximport work with GENCODE transcripts?
19 months ago by
Sharon460 wrote:

Hi All

How to use tximport with gencode?

I used GENCODE transcripts for mouse, which look like this:


The Salmon quantification looks like this:

Name    Length  EffectiveLength TPM     NumReads
ENSMUST00000193812.1|ENSMUSG00000102693.1|OTTMUSG00000049935.1|OTTMUST00000127109.1|RP23-271O17.1-001|RP23-271O17.1|1070|TEC|   1070    489.786 0.536463        7.4771
ENSMUST00000082908.1|ENSMUSG00000064842.1|-|-|Gm26206-201|Gm26206|110|snRNA|    110     28      0       0

I use this code to generate tx2gene:

library(TxDb.Mmusculus.UCSC.mm10.ensGene)
txdb <- TxDb.Mmusculus.UCSC.mm10.ensGene
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k,  columns = "TXNAME", keytype = "GENEID") 
tx2gene <- df[, 2:1]
write.csv(tx2gene, 'MouseEnstx2gene.csv')

The MouseEnstx2gene.csv looks like this, and this causes problems in further analysis:


I feel there is a tiny thing I am missing here that could fix this, as it is not in the same format as the Salmon quantification! Any hint? I just tried ignoreAfterBar from tximport and it doesn't work!

Error in tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE) : 
  unused argument (ignoreAfterBar = TRUE)


Kevin Blighe53k wrote:

Hi Sharon,

I'm not sure that you even need the ignoreAfterBar parameter. If using the tx2gene parameter, though, then the first column of your tx2gene object has to be the exact transcript name as in the Salmon files. The second column, then, is what you want to convert these to.
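To see why a TxDb-derived table does not line up with these Salmon names, a quick check like the sketch below may help. The two names are copied from the Salmon output in the question; the tx2gene_txdb table is made up for illustration and assumes the TxDb transcript names are bare Ensembl IDs without version suffixes.

```r
# Transcript names copied from the Salmon output in the question.
salmon_names <- c(
  "ENSMUST00000193812.1|ENSMUSG00000102693.1|OTTMUSG00000049935.1|OTTMUST00000127109.1|RP23-271O17.1-001|RP23-271O17.1|1070|TEC|",
  "ENSMUST00000082908.1|ENSMUSG00000064842.1|-|-|Gm26206-201|Gm26206|110|snRNA|"
)

# Hypothetical TxDb-style table: bare transcript and gene IDs (assumed format).
tx2gene_txdb <- data.frame(
  TXNAME = c("ENSMUST00000193812", "ENSMUST00000082908"),
  GENEID = c("ENSMUSG00000102693", "ENSMUSG00000064842"),
  stringsAsFactors = FALSE
)

# The full pipe-delimited names are not present in the first column.
matched_raw <- salmon_names %in% tx2gene_txdb$TXNAME

# Keep only the text before the first "|", then drop the ".<version>" suffix.
tx_versioned <- sub("\\|.*$", "", salmon_names)
tx_bare <- sub("\\.[0-9]+$", "", tx_versioned)

# After trimming, the IDs can be compared against the table again.
matched_bare <- tx_bare %in% tx2gene_txdb$TXNAME

print(data.frame(tx_bare, matched_raw, matched_bare))
```

Newer tximport releases have ignoreAfterBar and ignoreTxVersion arguments meant for this kind of trimming, so the "unused argument" error in the question may just mean the installed release is older. Otherwise, building tx2gene directly from the Salmon names sidesteps the mismatch.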

For example, if you save the Salmon transcripts to a character vector (here, salmonCounts is just any file output by Salmon):

transcriptsSalmon <- as.character(salmonCounts$Name)

head(transcriptsSalmon, 10)                                                                                                                                          
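In case it is unclear where salmonCounts comes from, here is a minimal sketch of one way to load it. It assumes a tab-separated Salmon quantification file with the columns shown in the question; the temporary file only stands in for a real file so the example runs on its own.

```r
# Write the two rows from the question to a temporary file so the example
# runs without a real Salmon output directory (stand-in for a real file).
quant_file <- tempfile(fileext = ".sf")
writeLines(c(
  "Name\tLength\tEffectiveLength\tTPM\tNumReads",
  "ENSMUST00000193812.1|ENSMUSG00000102693.1|OTTMUSG00000049935.1|OTTMUST00000127109.1|RP23-271O17.1-001|RP23-271O17.1|1070|TEC|\t1070\t489.786\t0.536463\t7.4771",
  "ENSMUST00000082908.1|ENSMUSG00000064842.1|-|-|Gm26206-201|Gm26206|110|snRNA|\t110\t28\t0\t0"
), quant_file)

# Read the tab-separated table; the Name column holds the transcript names.
salmonCounts <- read.delim(quant_file, stringsAsFactors = FALSE)
transcriptsSalmon <- as.character(salmonCounts$Name)

head(transcriptsSalmon, 10)
```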

We can then parse this and extract whatever information we want, information which will eventually be used as the rownames of our future txi object created by tximport via tx2gene:

tx2gene <- data.frame(transcriptsSalmon, do.call(rbind, strsplit(transcriptsSalmon, "\\|"))[,6])
colnames(tx2gene) <- c("transcript_id", "gene_id")
head(tx2gene, 10)

transcript_id                                                                          gene_id
ENST00000456328.2|...|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|              DDX11L1
ENST00000450305.2|...|RP11-34P13.1-001|DDX11L1|632|transcribed_unprocessed_pseudogene| DDX11L1
ENST00000488147.1|...|RP11-34P13.2-001|WASH7P|1351|unprocessed_pseudogene|             WASH7P
ENST00000619216.1|...|MIR6859-1-201|MIR6859-1|68|miRNA|                                MIR6859-1
ENST00000473358.1|...|RP11-34P13.3-001|RP11-34P13.3|712|lincRNA|                       RP11-34P13.3
ENST00000469289.1|...|RP11-34P13.3-002|RP11-34P13.3|535|lincRNA|                       RP11-34P13.3
ENST00000607096.1|...|MIR1302-2-201|MIR1302-2|138|miRNA|                               MIR1302-2
ENST00000417324.1|...|RP11-34P13.4-001|FAM138A|1187|lincRNA|                           FAM138A
ENST00000461467.1|...|RP11-34P13.4-002|FAM138A|590|lincRNA|                            FAM138A
ENST00000606857.1|...|RP11-34P13.17-001|OR4G4P|840|unprocessed_pseudogene|             OR4G4P

The pitfall of using just the gene name is that it is not unique, which can cause issues downstream. You could also merge fields:

tx2gene <- data.frame(transcriptsSalmon,
  paste(do.call(rbind, strsplit(transcriptsSalmon, "\\|"))[,1], do.call(rbind, strsplit(transcriptsSalmon, "\\|"))[,6], sep="_"))
colnames(tx2gene) <- c("transcript_id", "gene_id")
head(tx2gene$gene_id, 20)

 [1] ENST00000456328.2_DDX11L1      ENST00000450305.2_DDX11L1     
 [3] ENST00000488147.1_WASH7P       ENST00000619216.1_MIR6859-1   
 [5] ENST00000473358.1_RP11-34P13.3 ENST00000469289.1_RP11-34P13.3
 [7] ENST00000607096.1_MIR1302-2    ENST00000417324.1_FAM138A     
 [9] ENST00000461467.1_FAM138A      ENST00000606857.1_OR4G4P      
[11] ENST00000642116.1_OR4G11P      ENST00000492842.2_OR4G11P     
[13] ENST00000641515.2_OR4F5        ENST00000335137.4_OR4F5       
[15] ENST00000466430.5_RP11-34P13.7 ENST00000477740.5_RP11-34P13.7
[17] ENST00000471248.1_RP11-34P13.7 ENST00000610542.1_RP11-34P13.7
[19] ENST00000453576.2_RP11-34P13.7 ENST00000495576.1_RP11-34P13.8
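Another variant along the same lines, sketched here with the two mouse names from the question: take the second field as the gene identifier. This assumes the second field is the Ensembl gene ID, which is how the Salmon names in the question appear to be laid out, and that every name splits into the same number of fields.

```r
# Transcript names copied from the Salmon output in the question.
transcriptsSalmon <- c(
  "ENSMUST00000193812.1|ENSMUSG00000102693.1|OTTMUSG00000049935.1|OTTMUST00000127109.1|RP23-271O17.1-001|RP23-271O17.1|1070|TEC|",
  "ENSMUST00000082908.1|ENSMUSG00000064842.1|-|-|Gm26206-201|Gm26206|110|snRNA|"
)

# Split each name on the literal "|" character.
parts <- strsplit(transcriptsSalmon, "|", fixed = TRUE)

# rbind() only gives a clean matrix if every name has the same field count.
stopifnot(length(unique(lengths(parts))) == 1)
fields <- do.call(rbind, parts)

# Column 1: exact Salmon name; column 2: assumed Ensembl gene ID (field 2).
tx2gene <- data.frame(
  transcript_id = transcriptsSalmon,
  gene_id = fields[, 2],
  stringsAsFactors = FALSE
)

tx2gene$gene_id
```

The gene names in field 6 could be kept in a separate lookup table and attached to the results afterwards.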

Does that make sense?


modified 19 months ago • written 19 months ago by Kevin Blighe53k

A comprehensive answer from you, as usual. Nice to have you around, Sir!

written 19 months ago by ATpoint28k

Thanks bro. Have to empty the contents of my head before I'm too old to remember anything!

written 19 months ago by Kevin Blighe53k

Hi Kevin, thanks so much and sorry for the late reply, I was away. Yes, this makes sense. I understood where the problem came from but did not know how to solve it, as I am still learning R. I will try that. Thanks so much :)

written 19 months ago by Sharon460

No problem. Trust that you are not working too much this weekend.

written 19 months ago by Kevin Blighe53k

work doesn't leave me =D I just switched to something else :) :)

written 19 months ago by Sharon460