
Hello,

I have data in the form of a data frame that I downloaded from the GTEx Portal. It contains the RNA-Seq gene read counts used in their study.

`> dim(expr.df)`

[1] 55993 2921

`> expr.df[1:10,1:2]`

                    GTEX-N7MS-0007-SM-2D7W1 GTEX-N7MS-0008-SM-4E3JI
    ENSG00000223972                       0                       0
    ENSG00000227232                     158                     166
    ENSG00000243485                       0                       0
    ENSG00000237613                       0                       0
    ENSG00000268020                       0                       0
    ENSG00000240361                       0                       0
    ENSG00000186092                       0                       0
    ENSG00000238009                      17                       2
    ENSG00000233750                      35                       0
    ENSG00000237683                    8489                      34

I checked the sample information file, and it contains no information about conditions. I want to normalize the raw counts using DESeq's getVarianceStabilizedData() function. However, this function takes a CountDataSet object as input, so I tried to create one like this:

`> cds <- newCountDataSet(countData = as.matrix(expr.df))`

Error in is(conditions, "matrix") : argument "conditions" is missing, with no default

It throws an error asking me to specify the conditions. However, there are no conditions in this dataset. How can I normalize these values?

I think you're getting into variance stabilization there, which is a separate step from normalization. You just want to normalize for the number of reads sequenced in each sample, right?

I believe DESeq still uses a median-based normalization (each sample's size factor is the median of its per-gene ratios to a reference).
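To make that concrete, here is a minimal base-R sketch of a median-of-ratios size factor calculation, which as far as I know is the idea behind DESeq's size factors. It does not call DESeq itself; the toy matrix and the names `counts`, `size_factor`, `size_factors` and `normalized` are made up for illustration.

```r
# Toy count matrix (genes x samples); the values are invented for illustration.
counts <- matrix(
  c(100, 200,
     50, 100,
     30,  60,
      0,   5),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("sampleA", "sampleB"))
)

# Per-gene log geometric mean across samples.
# A gene with a zero count in any sample gets -Inf and is skipped below.
log_geo_means <- rowMeans(log(counts))

# Size factor of one sample: median over usable genes of count / geometric mean.
size_factor <- function(sample_counts) {
  usable <- is.finite(log_geo_means) & sample_counts > 0
  exp(median(log(sample_counts[usable]) - log_geo_means[usable]))
}
size_factors <- apply(counts, 2, size_factor)

# Divide each sample (column) by its size factor to get normalized counts.
normalized <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(normalized)
```

In this toy example sampleB has exactly twice the counts of sampleA for the genes that are used, so after normalization those genes end up with equal values in both columns.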

I don't know the commands in DESeq, but if you want to do it by hand, here is the basic process:

Make a scatter plot of two of your samples, e.g. GTEX-N7MS-0007-SM-2D7W1 on the x-axis and GTEX-N7MS-0008-SM-4E3JI on the y-axis.

a = the median count value in GTEX-N7MS-0007-SM-2D7W1

b = the median count value in GTEX-N7MS-0008-SM-4E3JI

Your slope, which is also the median normalization factor, is b/a.

Plot the line with that slope through your data (y-intercept = 0) and see if it fits.

Sometimes it works, sometimes you need to use something different.
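The by-hand steps above can be sketched in base R roughly like this, using the ten rows shown in the question. One thing I am assuming that is not in the steps: with this many zero counts the plain median is 0, so the sketch only uses genes detected in both samples. The variable names (`x`, `y`, `keep`, `slope`, `y_scaled`) are just illustrative.

```r
# Counts copied from the ten rows printed in the question.
x <- c(0, 158, 0, 0, 0, 0, 0, 17, 35, 8489)  # GTEX-N7MS-0007-SM-2D7W1
y <- c(0, 166, 0, 0, 0, 0, 0,  2,  0,   34)  # GTEX-N7MS-0008-SM-4E3JI

# Assumption: restrict to genes with nonzero counts in both samples,
# otherwise both medians are 0 and b/a is undefined.
keep <- x > 0 & y > 0

a <- median(x[keep])  # median count in the x-axis sample
b <- median(y[keep])  # median count in the y-axis sample
slope <- b / a        # median normalization factor

# Put the y-axis sample on the scale of the x-axis sample.
y_scaled <- y / slope

# Draw the scatter plot and the fitted line only in an interactive session.
if (interactive()) {
  plot(x, y,
       xlab = "GTEX-N7MS-0007-SM-2D7W1",
       ylab = "GTEX-N7MS-0008-SM-4E3JI")
  abline(a = 0, b = slope)  # line through the origin with slope b/a
}

print(c(a = a, b = b, slope = slope))
```

With only ten genes this is obviously not a meaningful fit; on the full table you would run the same thing on whole columns of `expr.df` and look at whether the line follows the bulk of the points.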
