Question

DESeq sizeFactors Help

12.6 years ago

Sara ▴ 130

Hello,

I am trying to run a differential expression analysis with DESeq on two samples from two different conditions, without replicates.

I ran the DESeq script, and when I supply the library sizes of my samples, the size factor is 1 for both conditions:

libsizes <- c(Cond1 = 80040653, Cond2 = 360265740)
cds <- estimateSizeFactors(cds)
sizeFactors(cds)

    Cond1 Cond2 
       1    1

I don't understand this estimate.

When I don't supply libsizes, it becomes:

 sizeFactors(cds)
     Cond1 Cond2 
1.7320508 0.5773503

Could you please explain this behaviour to me?

Thanks in advance, Sara

deseq • 16k views

updated 12.6 years ago by Damian Kao 16k • written 12.6 years ago by Sara ▴ 130

You probably assigned the sizes first and then ran the estimate, which may simply have scaled your assigned sizes back to 1. I think the right way is to estimate the factors directly from the 'cds' object without assigning any libsizes.

12.6 years ago by Vitis ★ 2.5k

Hi Jeremy, thanks for your comment. Could you please tell me what the sizeFactors function does? Is it for normalizing the data (scale normalization)?

12.6 years ago by Sara ▴ 130

score 11 · Answer 1 · 2011-11-15

The author of DESeq wrote a post on SeqAnswers a while back about how size factors work in DESeq. I'll try to find the post, but basically it normalizes the data sets as follows:

-For each gene, take the geometric mean of its counts across all conditions and use that as the reference expression data set.

-For each condition, compute the quotient of each gene's expression value to its reference expression.

-The median of each condition's quotient list is the normalization factor (size factor) for that data set.

edit*

Here is the post: http://seqanswers.com/forums/showpost.php?p=16468&postcount=13
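The three steps listed above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the DESeq source: the count matrix is made-up toy data and `median_ratio_size_factors` is a hypothetical helper name. Genes with a zero count in any sample are skipped here, because their geometric mean is zero and the quotient is undefined.

```r
# Toy count matrix (made-up values): rows are genes, columns are samples.
counts <- matrix(
  c(10,  30,
    20,  60,
    40, 120,
     5,  20),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("Cond1", "Cond2"))
)

# Hypothetical helper implementing the median-of-quotients idea.
median_ratio_size_factors <- function(counts) {
  # Step 1: per-gene geometric mean across samples, kept on the log scale.
  log_ref <- rowMeans(log(counts))
  # A zero count gives log_ref = -Inf; such genes are left out.
  keep <- is.finite(log_ref)
  # Steps 2 and 3: per sample, the median of count / reference.
  apply(counts[keep, , drop = FALSE], 2, function(k) {
    exp(median(log(k) - log_ref[keep]))
  })
}

sf <- median_ratio_size_factors(counts)
print(sf)
```

With only two samples the two factors come out as reciprocals of each other whenever the same gene sits at the median in both, which is why the values in the question (1.7320508 and 0.5773503) multiply to 1.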

score 0 · Answer 2 · 2011-09-20

12.6 years ago

Jeremy Leipzig 22k

I don't see where in the documentation it says libsizes is a magic word.

Either call estimateSizeFactors to derive the estimate from your count data, or set the size factors manually with sizeFactors.

Now, why are there so many Saras?
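As a rough sketch of the manual route, the library sizes from the question could be turned into size factors in base R. Note that scaling by library size is a different technique from what estimateSizeFactors computes, and rescaling to a geometric mean of 1 is an assumption made here for comparability, not something the thread prescribes. The variable name `manual_sf` is illustrative.

```r
# Library sizes quoted in the question.
libsizes <- c(Cond1 = 80040653, Cond2 = 360265740)

# Assumption: rescale so the factors have a geometric mean of 1,
# which keeps them on a scale similar to estimated size factors.
manual_sf <- libsizes / exp(mean(log(libsizes)))
print(manual_sf)

# With the DESeq package loaded and `cds` being your CountDataSet,
# the factors would be assigned instead of estimated:
# sizeFactors(cds) <- manual_sf
```

Calling estimateSizeFactors afterwards would replace these values with the count-based estimate, so only one of the two routes should be used.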

Because Sara is a very popular name amongst bioinformaticians LOL

12.5 years ago by Pasta ★ 1.3k

could be some OpenID flaking - good catch - merged the Sara-s - There Can Be Only One

12.6 years ago by Istvan Albert 100k