Analyzing RNA-Seq data with duplicate Ensembl IDs using DESeq2: should tximport be used?
3.1 years ago
Ridha

Greetings! I hope everyone is doing well.

I have a question about duplicate Ensembl IDs in RNA-Seq data. I am using DESeq2 to analyze raw counts from a GEO dataset, which I imported with read.table rather than tximport.

From my basic understanding of the RNA-Seq workflow, to prepare the data for DESeq2 (DESeqDataSetFromMatrix), the row names of the count data should be the gene/transcript identifiers (e.g. gene names or Ensembl gene IDs). However, when I try to set the Ensembl IDs as row names, like this:


I get the following error:

Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed

When I check for duplicates


I see that there are 30 duplicates in my gene_id column (Ensembl IDs).
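The error and the duplicate check can be reproduced on a small table. This is a hedged base-R sketch, not the code from the post: only the column name gene_id comes from the question, while the IDs and counts are made up for illustration.

```r
# Hypothetical toy count table; only the column name gene_id comes from the post
counts <- data.frame(
  gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000003"),
  sample1 = c(10L, 0L, 4L),
  sample2 = c(12L, 1L, 3L)
)

# Assigning non-unique row names to a data.frame fails; capture the error message
res <- tryCatch({
  rownames(counts) <- counts$gene_id
  "ok"
}, error = function(e) conditionMessage(e))
print(res)

# Count the rows whose ID already appeared earlier, and list the affected IDs
n_dup <- sum(duplicated(counts$gene_id))
dup_ids <- unique(counts$gene_id[duplicated(counts$gene_id)])
print(n_dup)
print(dup_ids)
```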

I then went ahead and removed the duplicates.


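Whether dropping duplicates is safe depends on what the duplicated rows contain. The base-R sketch below, again on a hypothetical toy table, contrasts keeping only the first row per ID (what `!duplicated()` does) with summing rows that share an ID via `rowsum()`. The summing option is shown only for comparison; it is not something described in the post.

```r
# Hypothetical toy count table with one duplicated Ensembl ID
counts <- data.frame(
  gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000003"),
  sample1 = c(10L, 0L, 4L),
  sample2 = c(12L, 1L, 3L)
)
count_cols <- c("sample1", "sample2")

# Option A: keep the first row per ID; later rows (here 4 and 3) are discarded
dedup <- counts[!duplicated(counts$gene_id), ]
rownames(dedup) <- dedup$gene_id
mat_first <- as.matrix(dedup[, count_cols])

# Option B (alternative for comparison): sum all rows sharing an ID.
# rowsum() groups by the second argument and sorts the groups by default.
mat_sum <- rowsum(as.matrix(counts[, count_cols]), group = counts$gene_id)

print(mat_first)
print(mat_sum)
```

In the toy data, keeping the first row gives 10 reads for ENSG00000000003 in sample1, while summing gives 14, so with 30 duplicated IDs it is worth inspecting those rows before choosing either approach.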
But my question now is: is what I have done correct? The data come from the GEO database, and the description of how the raw counts were prepared reads as follows:

Raw sequencing data was demultiplexed by bcl2fastq v.2.20. Raw reads obtained from RNA-Seq were aligned to the transcriptome using STAR (version 2.5.0) (Dobin A et al., 2013) / RSEM (version 1.2.25) (Li B and Dewey CN, 2011) with default parameters, using a custom human GRCh38 transcriptome reference downloaded from, containing all protein coding and long non-coding RNA genes based on human GENCODE version 33 annotation.

Based on this description, I understood that the file does not provide gene counts but transcript counts, since the reads were aligned to the transcriptome. Therefore, I think I should use tximport to import the data. Is that correct?
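Independently of the GEO description, the table itself can hint at what it contains. This is a hedged base-R sketch using a made-up stand-in for the table read with read.table. It assumes the IDs follow the usual Ensembl/GENCODE prefixes (ENSG for genes, ENST for transcripts), and it checks for non-integer values, since RSEM reports expected counts that can be fractional while DESeqDataSetFromMatrix requires integer counts.

```r
# Hypothetical stand-in for the table read with read.table()
counts <- data.frame(
  gene_id = c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000000419.14"),
  sample1 = c(10, 0.5, 4),
  sample2 = c(12, 1, 3)
)

# Tabulate ID prefixes: "ENSG" suggests gene-level rows, "ENST" transcript-level rows
id_prefix <- table(substr(counts$gene_id, 1, 4))
print(id_prefix)

# RSEM expected counts may be fractional; DESeqDataSetFromMatrix needs integers
vals <- as.matrix(counts[, -1])
has_fractional <- any(vals != round(vals))
print(has_fractional)
```

ENSG prefixes with fractional values would fit RSEM gene-level expected counts, while ENST prefixes would point to transcript-level output.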

Thank you very much in advance for your help! Best, Ridha

Tags: RNA-Seq, alignment, R
