Analyzing RNA-Seq data with duplicate Ensembl IDs using DESeq2: should tximport be used?
3.1 years ago
Ridha ▴ 130

Greetings! I hope everyone is doing well.

I have a question regarding duplicate Ensembl IDs in RNA-Seq data. I am using DESeq2 to analyze raw counts from a dataset in the GEO database. I imported the dataset using read.table, not tximport.

From my basic understanding of the RNA-Seq workflow, to prepare the data for DESeq2 (DESeqDataSetFromMatrix), the row names of the count data should be the gene or transcript identifiers (e.g. gene names or Ensembl gene IDs). However, when I try to set the Ensembl IDs as row names, like this:

rownames(data_sharna)<-data_sharna$gene_id

I get the following error: Error in rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed. When I check for duplicates:

sum(duplicated(data_sharna$gene_id))

I find 30 duplicates in my gene_id column (the Ensembl IDs).
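Before dropping anything, it may help to look at which IDs repeat and whether the repeated rows carry the same counts. The snippet below is a minimal sketch in base R; the toy table, its IDs, and the sample_1/sample_2 column names are invented for illustration and only stand in for the real GEO table.

```r
# Toy stand-in for the GEO count table; IDs and values are made up.
data_sharna <- data.frame(
  gene_id  = c("ENSG00000000003.15", "ENSG00000000005.6",
               "ENSG00000000003.15", "ENSG00000000419.12"),
  sample_1 = c(10, 0, 10, 25),
  sample_2 = c(12, 1, 14, 30),
  stringsAsFactors = FALSE
)

# IDs that occur more than once (each listed once)
dup_ids <- unique(data_sharna$gene_id[duplicated(data_sharna$gene_id)])

# Every row carrying one of those IDs, sorted so copies sit together
dup_rows <- data_sharna[data_sharna$gene_id %in% dup_ids, ]
dup_rows <- dup_rows[order(dup_rows$gene_id), ]

# Rows that exactly repeat an earlier row (same ID and same counts)
n_exact_copies <- sum(duplicated(data_sharna))

print(dup_rows)
print(n_exact_copies)
```

If the repeated rows turn out to have different counts, as in this toy table, then keeping only the first row discards data rather than removing a redundant copy.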

I went on and removed duplicates

data_sharna<-data_sharna[!duplicated(data_sharna$gene_id),]
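For comparison, here is a rough sketch of two ways to get unique row names: keeping the first row per ID (as above) and summing the counts of rows that share an ID. Whether summing is appropriate depends on why the IDs are duplicated, which the GEO description does not say, so treat this as an illustration only. The toy table and its sample_1/sample_2 columns are hypothetical.

```r
# Toy stand-in for the GEO count table; IDs and values are made up.
data_sharna <- data.frame(
  gene_id  = c("ENSG00000000003.15", "ENSG00000000005.6",
               "ENSG00000000003.15", "ENSG00000000419.12"),
  sample_1 = c(10, 0, 10, 25),
  sample_2 = c(12, 1, 14, 30),
  stringsAsFactors = FALSE
)

# Every column except the ID column is treated as a sample
sample_cols <- setdiff(names(data_sharna), "gene_id")

# Option A: keep the first row per ID and drop later ones
kept_first <- data_sharna[!duplicated(data_sharna$gene_id), ]

# Option B: add up the counts of rows that share an ID.
# rowsum() returns a matrix whose row names are the unique IDs.
summed <- rowsum(as.matrix(data_sharna[, sample_cols]),
                 group = data_sharna$gene_id)

print(kept_first)
print(summed)
```

A matrix like `summed` has unique row names, so it has the shape DESeqDataSetFromMatrix takes as count data; note that DESeq2 expects integer counts there.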

But now my question is: is what I have done correct? The data I am using are from the GEO database, and the description of how the raw counts were prepared reads as follows:

Raw sequencing data was demultiplexed by bcl2fastq v.2.20. Raw reads obtained from RNA-Seq were aligned to the transcriptome using STAR (version 2.5.0) (Dobin A et al., 2013) / RSEM (version 1.2.25) (Li B and Dewey CN, 2011) with default parameters, using a custom human GRCh38 transcriptome reference downloaded from http://www.gencodegenes.org, containing all protein coding and long non-coding RNA genes based on human GENCODE version 33 annotation.

So, based on this description, I understood that what is provided is not gene counts but transcript counts, since the reads were aligned to the transcriptome. Therefore, I should use tximport to import the data. Is that correct?
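One way to check what the table actually contains is to look at the ID prefixes and at whether the values are whole numbers. Human Ensembl gene IDs start with ENSG and transcript IDs with ENST, and RSEM expected counts are generally not integers. The sketch below uses base R on a made-up table (the IDs, values, and sample_1/sample_2 columns are hypothetical), so it only shows the kind of check, not the result for the real dataset.

```r
# Toy stand-in for the GEO count table; IDs and values are made up.
data_sharna <- data.frame(
  gene_id  = c("ENSG00000000003.15", "ENSG00000000005.6",
               "ENSG00000000419.12", "ENSG00000000457.14"),
  sample_1 = c(10.5, 0, 25, 7.25),
  sample_2 = c(12, 1.75, 30, 9),
  stringsAsFactors = FALSE
)

# "ENSG" suggests gene-level rows, "ENST" suggests transcript-level rows
id_prefix <- substr(data_sharna$gene_id, 1, 4)
prefix_counts <- table(id_prefix)

# Non-integer values hint at estimated counts rather than raw read counts
sample_cols <- setdiff(names(data_sharna), "gene_id")
counts <- as.matrix(data_sharna[, sample_cols])
all_integer <- all(counts == round(counts))

print(prefix_counts)
print(all_integer)
```

In this toy case every ID is gene-level and some values are fractional, which is the pattern one might see in a gene-level table of estimated counts.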

Thank you very much in advance for your help! Best, Ridha

RNA-Seq alignment R • 903 views