Understanding the output of Negative Binomial in DESeq2
3.1 years ago
synat.keam ▴ 100

Dear Seniors,

I hope you are all doing well. I am very new to RNA-seq and DESeq2. I understand that the negative binomial (Gamma-Poisson) distribution is used to model RNA-seq count data. Each gene is fitted across the two conditions, which gives the baseMean and log2 fold change, and then, if I am not wrong, the Wald test is used to check whether the coefficient equals zero, which determines whether the log2 fold change is significant. So far I am more familiar with linear models than with generalized linear models, although I have done a bit of Poisson regression before.

I have attached the output of the model after fitting with DESeq2. In my output, five genes are upregulated and two genes are downregulated. Is zero the default log2 fold change threshold for calling a gene up or down?

Does this mean that there are seven genes in total with a significant adjusted p-value? I recall that when a multiple linear regression is fitted, we get an overall p-value, so we can quickly tell whether at least one coefficient is not equal to zero (p-value < 0.05), or I can just skim through the output to see how many coefficients are significant. However, with the negative binomial model in DESeq2, I could not find an overall p-value at all.

Also, the output does not list all the genes and adjusted p-values, because so many genes were fitted. How can I find out which genes have a significant log2 fold change from the output? I hope you do not mind my questions, as I am very new to RNA-seq experiments and their analysis.

Additionally, I understand how to interpret a volcano plot. However, I am wondering whether all genes are used in the volcano plot. I have attached the plot; there do not seem to be many points, so I am assuming only some genes were used to construct it. Am I right? Do you think the plot looks all right? Sorry for asking, and I look forward to hearing from the author and seniors at your earliest convenience.

Kind Regards,

Synat

DESe2 RNASeq • 2.1k views
3.1 years ago
ATpoint 85k

I have attached the output of the model after fitting with DESeq2. In my output, five genes are upregulated and two genes are downregulated. Is zero the default log2 fold change threshold for calling a gene up or down?

Yes, the default Null hypothesis to test against is zero. If genes are below the FDR threshold (0.1 by default) and log2FoldChange > 0 then we call this "upregulated", and "downregulated" if < 0. You have 5 up- and 2 downregulated genes at the default FDR cutoff of 0.1. You are free to set this to 0.05 or any value you feel good with.
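To make the rule above concrete, here is a minimal sketch in plain Python of how the up/down counts follow from the adjusted p-value cutoff and the sign of the log2 fold change. This is not DESeq2 code; the gene names, the values, and the helper names `classify` and `count_calls` are all made up for illustration.

```python
# Hypothetical results rows: (gene, log2FoldChange, padj). All values are invented.
results = [
    ("geneA", 2.1, 0.004),
    ("geneB", -1.3, 0.08),
    ("geneC", 0.4, 0.62),
    ("geneD", 3.0, None),   # padj can be missing (NA), e.g. for filtered genes
    ("geneE", -0.2, 0.09),
]


def classify(log2fc, padj, alpha=0.1):
    """Label a gene 'up', 'down' or 'ns' (not significant) at FDR cutoff alpha."""
    if padj is None or padj >= alpha:
        return "ns"
    if log2fc > 0:
        return "up"
    if log2fc < 0:
        return "down"
    return "ns"


def count_calls(rows, alpha=0.1):
    """Count how many genes fall into each label."""
    counts = {"up": 0, "down": 0, "ns": 0}
    for _gene, log2fc, padj in rows:
        counts[classify(log2fc, padj, alpha)] += 1
    return counts


print(count_calls(results))              # cutoff 0.1
print(count_calls(results, alpha=0.05))  # stricter cutoff
```

With these invented numbers, tightening the cutoff from 0.1 to 0.05 drops the two borderline genes, which is the kind of change you should expect when you pick a different FDR threshold.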

Does this mean that there are seven genes in total with a significant adjusted p-value? I recall that when a multiple linear regression is fitted, we get an overall p-value, so we can quickly tell whether at least one coefficient is not equal to zero (p-value < 0.05), or I can just skim through the output to see how many coefficients are significant. However, with the negative binomial model in DESeq2, I could not find an overall p-value at all.

The results table contains the coef/name/contrast you specified. If none was specified then afaik the last element of resultsNames(dds) is used. The manual extensively describes how to set coefs and contrasts, please read it.

Additionally, I understand how to interpret a volcano plot. However, I am wondering whether all genes are used in the volcano plot. I have attached the plot; there do not seem to be many points, so I am assuming only some genes were used to construct it. Am I right? Do you think the plot looks all right?

All the plot shows is that you have barely any significant genes. You should adjust the x-axis, as it is overly wide.


You should look carefully at the values for those genes with extremely high log fold changes; if it's caused by most samples having zero expression, and a few samples having a little expression, that might be an artifact. And if that leaves you with almost no changed genes, well, sometimes that is the ground truth of your experiment.
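One way to do that check is sketched below in plain Python, assuming you have exported normalized counts per gene. The counts, the gene names, the helper `mostly_zero` and the 0.34 threshold are all hypothetical choices for illustration, not a DESeq2 default.

```python
# Hypothetical normalized counts per gene across six samples (invented values).
counts = {
    "geneA": [0, 0, 0, 0, 0, 7],
    "geneB": [120, 98, 110, 410, 395, 430],
    "geneC": [0, 0, 1, 0, 3, 0],
}


def mostly_zero(values, max_nonzero_fraction=0.34):
    """True if only a small fraction of samples have any counts at all."""
    nonzero = sum(1 for v in values if v > 0)
    return nonzero / len(values) <= max_nonzero_fraction


# Genes whose fold change may be driven by a handful of non-zero samples.
suspicious = sorted(gene for gene, values in counts.items() if mostly_zero(values))
print(suspicious)
```

Genes flagged this way are worth inspecting by eye before you trust a very large log fold change.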

3.1 years ago

Is zero the default log2 fold change threshold for calling a gene up or down?

Yes, by default, the null hypothesis is that the log2FoldChange is zero.

Does this mean that there are seven genes in total with a significant adjusted p-value?

Yes, you are interpreting your results correctly that you have 7 genes in total where the log2FoldChange is significantly different from 0.

I recall that when a multiple linear regression is fitted, we get an overall p-value, so we can quickly tell whether at least one coefficient is not equal to zero (p-value < 0.05), or I can just skim through the output to see how many coefficients are significant.

This sort of depends on how you have set the design in DESeq2. In a basic differential expression analysis, where you just have 2 groups that you wish to compare, DESeq2 will fit 36,955 independent negative binomial models, each with only a single coefficient (actually two coefficients, including the intercept). The p-value that is reported for each gene is the p-value for the null hypothesis that the value of that single coefficient for that gene is zero.
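As a rough illustration of what that per-gene p-value means, here is a small Python sketch of a Wald test: the estimated coefficient is divided by its standard error and the result is compared to a standard normal distribution. The numbers and the function name `wald_test` are made up, and this shows only the final test step, not the model fitting that DESeq2 does.

```python
import math


def wald_test(log2fc, lfc_se):
    """Return (z statistic, two-sided p-value) for H0: the coefficient is zero."""
    z = log2fc / lfc_se
    # Two-sided tail probability of a standard normal: P(|Z| >= |z|)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical gene: log2 fold change of 1.5 with a standard error of 0.5.
z, p = wald_test(1.5, 0.5)
print(round(z, 2), round(p, 4))
```

Each gene gets its own statistic and p-value in this way, and the adjusted p-values are then computed across all genes.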

If your design includes more than one experimental factor, or an experimental factor with more than two levels, you will end up with 36,955 negative binomial models, each with more than one coefficient. If you are using the Wald test, then there is no overall p-value. Instead, the p-value that is reported is determined by the value you give for contrast or name in your call to results (or for coef in lfcShrink). You can find valid values for the name parameter by running resultsNames on your DESeqDataSet object.

If you specifically want an overall p-value (say you have three different treatments, and you want to know whether treatment, overall, has any effect), then you need to use the Likelihood Ratio Test rather than the Wald test. See http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#likelihood-ratio-test
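For intuition only, here is a Python sketch of the idea behind a likelihood ratio test for the three-treatment case: a full model with the treatment terms is compared to a reduced model with only an intercept, so the full model has two extra coefficients. The log-likelihood values and the function name `lrt_pvalue_df2` are invented, and the closed-form p-value used here only holds for 2 degrees of freedom.

```python
import math


def lrt_pvalue_df2(loglik_full, loglik_reduced):
    """LRT statistic and p-value when the full model has two extra coefficients.

    For a chi-square distribution with 2 degrees of freedom, the upper tail
    probability has the closed form exp(-x / 2).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    return stat, math.exp(-stat / 2.0)


# Hypothetical log-likelihoods for one gene under the full and reduced models.
stat, p = lrt_pvalue_df2(-100.0, -104.0)
print(stat, round(p, 4))
```

A small p-value here says that the treatment terms, taken together, improve the fit for that gene; it does not say which treatment differs from which.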

However, I am wondering whether all genes are used in the volcano plot.

I don't know EnhancedVolcano inside out, but I believe it does plot all genes. It looks like it has too few points because so many points are in the same place.


Dear Both Seniors,

The explanations from both of you are quite clear to me, and thank you for your help. I really appreciate it. I will ask again if I need further help as I progress through my analysis. Have a good day!

Regards, synat
