Question

DEG but counts in only one sample!

3.1 years ago

andreiareis1987 ▴ 40

Hi there!

I am doing RNA-seq data analysis with yeast samples. I have the following conditions: A_1, A_200, B_1 and B_200, for a total of 18 samples. After filtering out the genes with no counts and removing the genes that have counts in only one sample, I end up with 6698 genes. I ran the default DESeq2 normalization:

dds <- DESeqDataSetFromMatrix(countData = table, colData = data, design = ~ Cell_Gen)

dds <- DESeq(dds)

And I performed the statistical test to find the DEGs:

results(dds, name = "Cell_Gen_A_200_vs_A_1", pAdjustMethod = "fdr", lfcThreshold = 1, alpha = 0.05)

In this particular comparison I got several DEGs, but one of them has counts in only one sample (out of all the samples in the conditions being compared):

       28A 28C 29A 29C 30A 31A 32A  33A 34A  34C 37A 37C  38C 41A 41C 45C 68C
GeneX   0   0   0   0   0   0   0 3680   0 3203   0   0 3710   0   0   0   0

In this case only sample 33A has counts; the other samples ending in A have none, so I wonder why this gene is called a DEG.

                log2FC   pvalue    padj  
   GeneX      24.445038 7.98e-08 7.61e-05

For other comparisons I got other genes in the same situation, with counts in only one sample.

I would very much appreciate it if anyone could clarify this for me.

All the best, Andreia

deseq2 rnaseq degs yeast • 1.9k views

ADD COMMENT • link updated 3.1 years ago by Istvan Albert 100k • written 3.1 years ago by andreiareis1987 ▴ 40

Since we can't see the sample data sheet, we don't know which condition each of those samples belongs to, but it's perfectly fine to have a DEG with no expression in one condition and tons of expression in another condition.

ADD REPLY • link 3.1 years ago by rpolicastro 13k

OK, so the samples are 28A, 29A, 30A (condition A) versus 31A, 32A, 33A, 34A, 37A, 41A (condition B). My question is about having expression in only one sample, in this case 33A. I thought the filtering step in DESeq2 removed such "outliers"... or is it not an outlier? I am confused :/

ADD REPLY • link 3.1 years ago by andreiareis1987 ▴ 40

So, I figured out why the filter is not being applied: the threshold for Cook's distance outliers is NA! As I read in another post on this forum, my phenotype variable has 4 groups, but one of them has only 2 samples (I had an outlier sample...), and the Cook's filter is only applied with at least 3 samples. Given this issue, I am trying to understand the best approach to check for and remove DEGs that are not really DEGs. Can anyone help me? Thanks
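
One simple manual check, as a sketch only: flag any gene whose counts in the samples being compared come from at most one sample, and inspect those rows by hand before trusting the DEG call. The helper name `driven_by_few_samples` and the `max_nonzero` cutoff are hypothetical, and this is an ad hoc filter in base R, not DESeq2's own Cook's distance filter.

```r
# Raw counts for GeneX, copied from the table in the question
samples <- c("28A", "28C", "29A", "29C", "30A", "31A", "32A", "33A", "34A",
             "34C", "37A", "37C", "38C", "41A", "41C", "45C", "68C")
counts <- matrix(
  c(0, 0, 0, 0, 0, 0, 0, 3680, 0, 3203, 0, 0, 3710, 0, 0, 0, 0),
  nrow = 1,
  dimnames = list("GeneX", samples)
)

# Samples in the comparison, as listed in the reply above
cond_a <- c("28A", "29A", "30A")
cond_b <- c("31A", "32A", "33A", "34A", "37A", "41A")

# Hypothetical helper: TRUE for genes where at most `max_nonzero`
# of the compared samples have a nonzero count
driven_by_few_samples <- function(mat, compared, max_nonzero = 1) {
  rowSums(mat[, compared, drop = FALSE] > 0) <= max_nonzero
}

flagged <- driven_by_few_samples(counts, c(cond_a, cond_b))
print(flagged)
```

Genes flagged this way are candidates for a closer look (for example, plotting their counts per sample), not automatic removals; the cutoff of one sample is an assumption and may need adjusting for other designs.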

ADD REPLY • link 3.1 years ago by andreiareis1987 ▴ 40

score 1 · Answer 1 · 2021-03-24

3.1 years ago

Istvan Albert 100k

It is not clear from the above what your design is, but the values look odd. You have a bunch of zeros and three values around 3000:

       28A 28C 29A 29C 30A 31A 32A  33A 34A  34C 37A 37C  38C 41A 41C 45C 68C
GeneX   0   0   0   0   0   0   0 3680   0 3203   0   0 3710   0   0   0   0

GeneX should not come up as differentially expressed unless your condition selects only the columns with the 3K values. Otherwise, the spread ought to be so large as to make the p-values non-significant.

You may not be applying the methods correctly.

ADD COMMENT • link 3.1 years ago by Istvan Albert 100k

Hi, thanks for your reply. These are raw counts, not normalized.

ADD REPLY • link 3.1 years ago by andreiareis1987 ▴ 40

Normalization won't change the zeros into anything other than zero, and 3000 won't become comparably small either.

The main point stands: the data look unnatural, and it is hard to see how this row would come up as differentially expressed.
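
To put a rough number on the spread, here is a small base R sketch using the sample grouping given earlier in the thread. It is a plain descriptive summary, not DESeq2's negative binomial model, and the names `summarise_group`, `group_summary` and `cv_b` are made up for illustration.

```r
# Raw GeneX counts for the two groups in the comparison, copied from the thread
cond_a <- c("28A" = 0, "29A" = 0, "30A" = 0)
cond_b <- c("31A" = 0, "32A" = 0, "33A" = 3680, "34A" = 0, "37A" = 0, "41A" = 0)

# Descriptive per-group summary (illustrative only, not DESeq2's test)
summarise_group <- function(x) {
  c(mean = mean(x), sd = sd(x), n_nonzero = sum(x > 0))
}

group_summary <- rbind(
  cond_a = summarise_group(cond_a),
  cond_b = summarise_group(cond_b)
)

# Coefficient of variation in the group that contains the single nonzero sample
cv_b <- sd(cond_b) / mean(cond_b)

print(group_summary)
print(cv_b)
```

With a single nonzero value among six samples, the standard deviation in that group is larger than its mean, which is the kind of spread that would normally be expected to weaken the evidence for differential expression.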

ADD REPLY • link 3.1 years ago by Istvan Albert 100k