
Analyzing RNA-Seq data with duplicate Ensembl IDs using DESeq2: should tximport be used?

3.1 years ago

Ridha

Greetings! I hope everyone is doing well.

I have a question about duplicate Ensembl IDs in RNA-Seq data. I am using DESeq2 to analyze raw counts from a GEO dataset, which I imported with read.table rather than tximport.

From my basic understanding of the RNA-Seq workflow, to prepare the data for DESeq2 (DESeqDataSetFromMatrix), the row names of the count data should be the gene/transcript identifiers (e.g. gene names or Ensembl gene IDs). However, when I try to set the Ensembl IDs as row names, like this:

rownames(data_sharna) <- data_sharna$gene_id

I get the following error: Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed. When I check for duplicates with

sum(duplicated(data_sharna$gene_id))

I find that there are 30 duplicated values in my gene_id column (Ensembl IDs).
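Before deciding whether dropping rows is safe, it can help to look at every copy of each duplicated ID and check whether the copies carry identical counts. The sketch below uses a small made-up table in place of `data_sharna` (the IDs `ENSG_A`/`ENSG_B` and all counts are hypothetical) and assumes every column other than `gene_id` is a count column. One possible cause with GENCODE-based references, offered here only as an unverified guess, is chrY pseudoautosomal copies whose IDs carry a `_PAR_Y` suffix and collide with the chrX ID once that suffix is stripped; the check itself does not depend on that explanation.

```r
# Hypothetical toy data standing in for data_sharna: ENSG_A appears twice
data_sharna <- data.frame(
  gene_id = c("ENSG_A", "ENSG_A", "ENSG_B"),
  sample1 = c(120, 0, 45),
  sample2 = c(98, 0, 51)
)
count_cols <- setdiff(names(data_sharna), "gene_id")

# IDs occurring more than once, then ALL rows for those IDs (not only the later copies)
dup_ids  <- unique(data_sharna$gene_id[duplicated(data_sharna$gene_id)])
dup_rows <- data_sharna[data_sharna$gene_id %in% dup_ids, ]
dup_rows

# For each duplicated ID: TRUE if all its rows have identical counts,
# FALSE if they differ (dropping a row would then discard data)
same_counts <- vapply(dup_ids, function(id) {
  block <- data_sharna[data_sharna$gene_id == id, count_cols, drop = FALSE]
  nrow(unique(block)) == 1L
}, logical(1))
same_counts
```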

I then removed the duplicates:

data_sharna <- data_sharna[!duplicated(data_sharna$gene_id), ]
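If the duplicated rows hold different counts, keeping only the first one silently discards the others. A minimal sketch of an alternative, assuming the duplicates are genuinely separate count rows for the same gene, is to sum them with base R's `rowsum()`; if they are exact copies, summing would double-count, so it is worth checking first. The toy table and IDs below are hypothetical.

```r
# Hypothetical toy table standing in for data_sharna
data_sharna <- data.frame(
  gene_id = c("ENSG_A", "ENSG_A", "ENSG_B"),
  sample1 = c(120, 3, 45),
  sample2 = c(98, 1, 51)
)
count_cols <- setdiff(names(data_sharna), "gene_id")

# Option 1 (the approach in the post): keep the first row per ID, drop the rest
kept <- data_sharna[!duplicated(data_sharna$gene_id), ]
kept_mat <- as.matrix(kept[, count_cols])
rownames(kept_mat) <- kept$gene_id

# Option 2 (alternative): add up all rows sharing an ID;
# rowsum() returns a matrix whose row names are the unique, sorted group values
summed_mat <- rowsum(as.matrix(data_sharna[, count_cols]), group = data_sharna$gene_id)

kept_mat
summed_mat
```

Either matrix ends up with unique row names, so it could be passed as `countData` to `DESeqDataSetFromMatrix()`; the difference is only in what happens to the extra rows.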

My question is whether this is the correct thing to do. The data come from the GEO database, and the description of how the raw counts were prepared reads:

Raw sequencing data was demultiplexed by bcl2fastq v.2.20. Raw reads obtained from RNA-Seq were aligned to the transcriptome using STAR (version 2.5.0) (Dobin A et al., 2013) / RSEM (version 1.2.25) (Li B and Dewey CN, 2011) with default parameters, using a custom human GRCh38 transcriptome reference downloaded from http://www.gencodegenes.org, containing all protein coding and long non-coding RNA genes based on human GENCODE version 33 annotation.

Based on this description, my understanding is that the provided values are transcript counts rather than gene counts, because the reads were aligned to the transcriptome. Does that mean I should use tximport to import the data?
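Aligning to the transcriptome does not by itself make the table transcript-level: RSEM writes both transcript-level (`*.isoforms.results`) and gene-level (`*.genes.results`) estimates, the gene values being sums over each gene's transcripts. A quick hint is the ID prefix, since human Ensembl gene IDs start with `ENSG` and transcript IDs with `ENST`. The sketch below, on a hypothetical toy table, checks that prefix and rounds RSEM's possibly fractional expected counts to integers, which `DESeqDataSetFromMatrix()` requires. The commented tximport lines describe a different, assumed setup (per-sample RSEM files, with hypothetical `files` and `colData` objects), not something a merged GEO matrix necessarily supports.

```r
# Hypothetical toy table shaped like a merged RSEM gene-level export (values made up)
data_sharna <- data.frame(
  gene_id = c("ENSG_A.1", "ENSG_B.3"),  # placeholder IDs, not real Ensembl genes
  sample1 = c(120.47, 0.00),
  sample2 = c(97.53, 12.00)
)

# Human Ensembl gene IDs start with "ENSG", transcript IDs with "ENST"
prefixes <- unique(substr(data_sharna$gene_id, 1, 4))
level <- if (all(prefixes == "ENSG")) {
  "gene"
} else if (all(prefixes == "ENST")) {
  "transcript"
} else {
  "mixed or unknown"
}
level

# RSEM expected counts may be fractional; round to integers for DESeqDataSetFromMatrix()
count_cols <- setdiff(names(data_sharna), "gene_id")
count_mat <- round(as.matrix(data_sharna[, count_cols]))
storage.mode(count_mat) <- "integer"
rownames(count_mat) <- data_sharna$gene_id
count_mat

# Alternative (assumption): with per-sample RSEM *.genes.results files instead of a
# merged GEO table, tximport could read them directly, e.g.
#   txi <- tximport::tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
#   txi$length[txi$length == 0] <- 1  # common workaround for zero-length genes
#   dds <- DESeq2::DESeqDataSetFromTximport(txi, colData, design = ~ condition)
```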

Thank you very much in advance for your help! Best, Ridha

Tags: RNA-Seq, alignment, R

ADD COMMENT • link 3.1 years ago by Ridha ▴ 130