Question: DESeq Analysis With Two Samples Without Replicates, Most Padj Equal To 1 And NA
xiaojuhu13 (China) wrote 5.7 years ago:

I have only two samples, without replicates, for the DESeq analysis, but the results look abnormal: most FDR values equal 1.

> counts = read.table(file="48_50_1", header=T, row.names=1)
> my.design<-data.frame(row.names=colnames(counts),condition=c("L","H"))
> conds <- factor(my.design$condition)
> cds <- newCountDataSet( counts, conds )
> cds <- estimateSizeFactors( cds )
> sizeFactors( cds )
      low      high 
0.9225312 1.0839742 
> cds<-estimateDispersions(cds, method='blind',sharingMode='fit-only')
> cds<-nbinomTest(cds,"L","H")
> head(cds)
     id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj
1   23B        0         0         0        NaN            NaN   NA   NA
2 5HT2A        0         0         0        NaN            NaN   NA   NA
3  A1BG        0         0         0        NaN            NaN   NA   NA
4  A1CF        0         0         0        NaN            NaN   NA   NA
5   A2M        0         0         0        NaN            NaN   NA   NA
6 A2ML1        0         0         0        NaN            NaN   NA   NA

After removing the genes with zero counts, only 6 gene IDs have a padj that is not equal to 1, out of 332 gene IDs in total.


As with your question "Edger Results Without Replicates, Fdr Looks Unnormal", why do you find this unusual? Without replicates, you have almost no power to detect anything.
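
To illustrate how low power turns into a column of padj values equal to 1, here is a minimal sketch in base R. The p-values are made up for illustration (they are not from the poster's data), and it assumes the adjustment is Benjamini-Hochberg, applied here with `p.adjust`.

```r
# Hypothetical raw p-values for 10 genes. None is very small, which is what
# tends to happen when the dispersion cannot be estimated from replicates.
pval <- c(0.04, 0.20, 0.35, 0.50, 0.62, 0.71, 0.80, 0.88, 0.95, 1.00)

# Benjamini-Hochberg adjustment: each sorted p-value is scaled by n / rank,
# capped at 1, and made monotone from the largest p-value downwards.
padj <- p.adjust(pval, method = "BH")

print(data.frame(pval = pval, padj = round(padj, 3)))

# Number of genes that would pass a 10% FDR cut-off
n_significant <- sum(padj < 0.1)
print(n_significant)
```

Even the smallest raw p-value here does not survive the correction, and every other gene is pushed to the cap of 1.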

Reply written 5.7 years ago by Devon Ryan

Yeah, after removing the rows with pval=NA, only 332 were left. The total is more than 20,000 genes.

Reply written 5.7 years ago by xiaojuhu13

That alone seems a bit odd; I've never had a library cover so few genes. You might look at the alignments to see if they're wonky.
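
A quick way to put a number on that before digging into the alignments is to count how many genes have any reads at all. This is only a sketch in base R on a toy table: the counts and the names `GENE_A`, `GENE_B` and `GENE_C` are invented, and the `low`/`high` column names are assumed from the size-factor output in the question.

```r
# Toy count table (hypothetical values), one row per gene, one column per sample.
counts <- data.frame(
  low  = c(0, 0, 12, 0, 250, 3),
  high = c(0, 0, 30, 0, 240, 0),
  row.names = c("23B", "5HT2A", "GENE_A", "A1BG", "GENE_B", "GENE_C")
)

# Library size per sample: very small totals point at an alignment or counting problem.
library_sizes <- colSums(counts)
print(library_sizes)

# Genes with at least one read in any sample
covered <- rowSums(counts) > 0
n_covered <- sum(covered)
frac_covered <- mean(covered)
print(c(genes_with_reads = n_covered, fraction = frac_covered))

# Keep only the covered genes for downstream testing
counts_filtered <- counts[covered, , drop = FALSE]
```

On real data, something like 332 covered genes out of more than 20,000 would show up here as a very small fraction, which is the part worth investigating.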

Reply written 5.7 years ago by Devon Ryan

In the rows with NA that you are showing, you'll also see that your fold change values are NaN (Not a Number) and your base means are 0. NaN is what floating-point arithmetic returns for an undefined operation such as 0/0, and that is exactly what the fold change (baseMeanB divided by baseMeanA) is when both base means are 0. Given the base means of zero, I would assume those are all genes for which you simply have no read coverage.
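
For reference, this is how a fold change between two zero base means behaves in plain R. It is a small sketch of the floating-point arithmetic only, not of DESeq's internals; the variable names simply mirror the columns in the output above.

```r
baseMeanA <- 0
baseMeanB <- 0

# 0/0 is undefined, so R returns NaN
foldChange <- baseMeanB / baseMeanA
print(is.nan(foldChange))

# Any further arithmetic on NaN stays NaN
log2FoldChange <- log2(foldChange)
print(is.nan(log2FoldChange))

# For contrast: a gene with reads in only one sample gives Inf, not NaN
one_sided <- 5 / 0
print(is.infinite(one_sided))
```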

I suspect something wonky is going on with your dataset, as suggested. Also, of course, there will be a power issue because of the lack of replicates, so you may not want to invest too much in the p-values; you'll just have lots of potential false positives in your dataset.

Reply written 5.7 years ago by Dan Gaston
swbarnes2 (United States) wrote 5.7 years ago:

If you have no replicates, is it even worth using fancy software like DESeq? Wouldn't you just be looking at ratios? You can do that yourself in Excel.
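
The same ratio calculation can also be done in a few lines of base R instead of Excel. The sketch below uses an invented count table, a median-of-ratios style size factor (my understanding of what `estimateSizeFactors` does, so treat that as an assumption) and a pseudocount of 1, which is an arbitrary choice to keep zero counts from producing infinite ratios.

```r
# Toy count table (hypothetical values), columns named as in the question.
counts <- as.matrix(data.frame(
  low  = c(12, 250, 3, 40),
  high = c(30, 240, 0, 85),
  row.names = c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
))

# Size factors: per-sample median of log ratios to the per-gene geometric mean,
# using only genes with non-zero counts in every sample.
log_counts <- log(counts)
usable <- apply(is.finite(log_counts), 1, all)
log_geo_mean <- rowMeans(log_counts[usable, , drop = FALSE])
size_factors <- exp(apply(log_counts[usable, , drop = FALSE] - log_geo_mean, 2, median))

# Normalised counts: divide each column by its size factor
norm_counts <- sweep(counts, 2, size_factors, "/")

# log2 ratio high vs low, with a pseudocount to avoid dividing by zero
pseudocount <- 1
log2_ratio <- log2((norm_counts[, "high"] + pseudocount) /
                   (norm_counts[, "low"] + pseudocount))

# Rank genes by absolute log2 ratio
ranked <- log2_ratio[order(abs(log2_ratio), decreasing = TRUE)]
print(round(ranked, 2))
```

This gives a ranking by effect size only; without replicates there is nothing in it that tells you which of these ratios is larger than ordinary sample-to-sample variation.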
