differential expression of human sample showing 63000 genes in output
0
0
Entering edit mode
7.3 years ago
Kritika ▴ 260

Hello all Recently i got human rna seq sample for differential expression analysis I used htseq-count and cufflink for abundance estimation and Deseq and cuffdiff for differential expression analysis . In a output of both deseq and cuffdiff i am getting 63000 genes which is not possible when human have 30000 genes in their genome. I used reference of GrCh38 and gtf (GRCh38) downloaded from ensemble. I am not able to get where i am getting wrong. Please help!!!!!!!!!!!

RNA-Seq human trancriptome cuffdiff • 1.8k views
ADD COMMENT
0
Entering edit mode

63000 genes or transcripts? If you provide details on how you ran each step of your pipeline it will be easier to determine what went wrong. For example, if yoou received input files as BAM files, and they were aligned to references with a different order of chromosomes, I would not be surprised if every gene appears to be differentially expressed.

ADD REPLY
0
Entering edit mode

Hi I ran the accepted.bam files which i got from tophat i kept --b2 very sensitive. This bam file is not sorted.

ADD REPLY
0
Entering edit mode

Is your data paired end? The htseq FAQ states

For paired-end data, the alignment have to be sorted either by read name or by alignment position.

ADD REPLY
0
Entering edit mode

yes its pairend data

ADD REPLY
0
Entering edit mode

but then after running on sorted bam file on cufflink i am getting same result

ADD REPLY
0
Entering edit mode

Can you post commands you used for htseq-count and DESeq2?

ADD REPLY
0
Entering edit mode
htseq-count -f bam -s no -t exon -i gene_id -o htseq_sample1_samout accepted_hits.bam hg19.gtf > count_list1
htseq-count  -f bam -s no -t exon -i gene_id -o htseq_sample2_samout accepted_hits.bam hg19.gtf

deseqcommand

count_1_2 <- subset(count_list,select=c("Control","Test"))
conditions <- factor(c("Control","Test"))
sample1_2 <- newCountDataSet(count_1_2 ,conditions)
sample1_2 <- estimateSizeFactors(sample1_2)
sample1_2 <- estimateDispersions(sample1_2 , method="blind",sharingMode = "fit-only" , fitType= "local") 
result_1_2  <- nbinomTest(sample1_2,"Control","Test")
head(result_1_2 )
write.csv(result_1_2 ,"~/Desktop/RNASeq/deseq_results/sample1_2.csv")
ADD REPLY
0
Entering edit mode

But this time i used ref and gtf of hg19 For both the version initially i was getting 63000 genes. This time i only filtered protien coding genes in hg19.gtf and used that i got 27000 genes for all samples. Not sure whether i am correct or not

ADD REPLY
0
Entering edit mode

Your R code looks incomplete, did you use the DESeqDataSetFromHTSeq function?

ADD REPLY

Login before adding your answer.

Traffic: 2398 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6