Question

DESeq Analysis With Two Samples Without Replicates: Most padj Equal to 1 or NA

0

10.4 years ago

xiaojuhu13 ▴ 150

I have only two samples, without replicates, for a DESeq analysis, but the results look abnormal: most FDR values equal 1.

> library(DESeq)
> counts <- read.table(file="48_50_1", header=TRUE, row.names=1)
> my.design <- data.frame(row.names=colnames(counts), condition=c("L","H"))
> conds <- factor(my.design$condition)
> cds <- newCountDataSet(counts, conds)
> cds <- estimateSizeFactors(cds)
> sizeFactors(cds)
      low      high 
0.9225312 1.0839742 
> # No replicates: estimate dispersion across both samples, ignoring the condition labels
> cds <- estimateDispersions(cds, method="blind", sharingMode="fit-only")
> # Store the test results separately instead of overwriting the CountDataSet
> res <- nbinomTest(cds, "L", "H")
> head(res)
     id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj
1   23B        0         0         0        NaN            NaN   NA   NA
2 5HT2A        0         0         0        NaN            NaN   NA   NA
3  A1BG        0         0         0        NaN            NaN   NA   NA
4  A1CF        0         0         0        NaN            NaN   NA   NA
5   A2M        0         0         0        NaN            NaN   NA   NA
6 A2ML1        0         0         0        NaN            NaN   NA   NA

After removing the genes with zero counts, 332 genes remain in total, and only 6 of them have a padj that is not equal to 1.
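The filtering step described above can be sketched in base R. This is only an illustration: the count values and the gene names `geneC`, `geneD`, and `geneE` are made up, and the real table would come from the `read.table` call shown earlier.

```r
# Toy count matrix standing in for the real data (all values are invented).
counts <- data.frame(
  low  = c(0, 0, 12, 150, 3),
  high = c(0, 0, 30, 140, 0),
  row.names = c("23B", "5HT2A", "geneC", "geneD", "geneE")
)

# Keep only genes with at least one read in at least one sample.
expressed <- counts[rowSums(counts) > 0, ]

# Report how many genes survive the filter.
n_total <- nrow(counts)
n_kept  <- nrow(expressed)
cat(n_kept, "of", n_total, "genes have non-zero counts\n")
```

Filtering before building the count data set also shows immediately how many genes have any coverage at all, which is a useful sanity check on the alignments.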

deseq • 9.9k views

updated 19 months ago by rutuja.digraskar • 0 • written 10.4 years ago by xiaojuhu13 ▴ 150

1

As with your other question, "Edger Results Without Replicates, Fdr Looks Unnormal": why do you find this unusual? Without replicates, you have almost no power to detect anything.

Reply • 10.4 years ago by Devon Ryan 104k

0

Yes, after removing the genes with pval = NA, only 332 were left, out of more than 20,000 genes in total.

Reply • 10.4 years ago by xiaojuhu13 ▴ 150

0

That alone seems a bit odd; I've never had a library cover so few genes. You might look at the alignments to see if they're wonky.

Reply • 10.4 years ago by Devon Ryan 104k

0

Where your output shows NA, you'll also see that the fold change values are NaN (Not a Number) and your base means are 0. NaN is what floating-point arithmetic returns for an undefined operation. Here the fold change is presumably the ratio of two base means that are both zero, and 0/0 is NaN; it is not an overflow or underflow problem. Given the base means of zero, I would assume those are all genes for which you simply have no read coverage.

As suggested, I suspect something wonky is going on with your dataset. Also, of course, there will be a power issue because of the lack of replicates, so you may not want to invest too much in the p-values; you'll just have lots of potential false positives in your dataset.
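As a quick check of where the NaN values in the table come from, here is a minimal sketch in base R. It assumes the fold change is the ratio of the two base means (baseMeanB / baseMeanA), which is what the output columns suggest; the variable names are hypothetical.

```r
# A gene with zero mean in both samples: 0 / 0 is undefined, so R returns NaN.
base_mean_a <- 0
base_mean_b <- 0
fold_change <- base_mean_b / base_mean_a
log2_fc     <- log2(fold_change)   # NaN propagates through log2

# A gene seen in only one sample gives Inf or 0 instead of NaN.
fc_only_b <- 5 / 0
fc_only_a <- 0 / 5

print(c(fold_change, log2_fc, fc_only_b, fc_only_a))
print(is.nan(fold_change))
```

So rows with NaN fold changes and NA p-values are simply genes with no reads in either sample, and they can be dropped before testing.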

Reply • 10.4 years ago by DG 7.3k

score 1 · Answer 1 · 2013-11-26

1

10.4 years ago

swbarnes2 14k

If you have no replicates, is it even worth using fancy software like DESeq? Wouldn't you just be looking at ratios? You can do that yourself in Excel.
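The same ratio calculation is also easy to do in R rather than Excel. The sketch below is only an illustration: the counts and gene names are invented, the size factors are the ones printed in the question, and the pseudocount of 1 is an assumption added here to avoid dividing by zero.

```r
# Toy counts for two unreplicated samples (values are invented).
counts <- data.frame(
  low  = c(12, 150, 3, 0),
  high = c(30, 140, 0, 8),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Size factors as reported in the question's output.
size_factors <- c(low = 0.9225312, high = 1.0839742)

# Normalize: divide each sample's column by its size factor.
norm_counts <- sweep(as.matrix(counts), 2, size_factors[colnames(counts)], "/")

# log2 ratio of high over low, with a pseudocount (an assumption) so that
# genes with zero counts in one sample still get a finite value.
pseudocount <- 1
log2_fc <- log2((norm_counts[, "high"] + pseudocount) /
                (norm_counts[, "low"]  + pseudocount))

# Rank genes by the size of the change; this is a ranking, not a test.
ranked <- log2_fc[order(abs(log2_fc), decreasing = TRUE)]
print(round(ranked, 2))
```

This gives a ranked list of candidate genes but no p-values, which matches what unreplicated data can actually support.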

score 0 · Answer 2 · 2022-09-30

0

19 months ago

rutuja.digraskar • 0

NOISeq gave me good results, with fold change and expression difference, for data without replicates. I used the following tutorial: https://jiankaiwang.gitbooks.io/bioinfo-and-combio/content/ngs/noiseq_differential_expression_in_rna-seq.html