Number of accessible regions included by DESeq2 to generate the PCA plot with the nf-core ATACseq pipeline ?
7 months ago

Hello everyone,

I ran an analysis of some ATAC-seq data across different conditions, using the nf-core ATACseq pipeline v 1.2.1:

https://nf-co.re/atacseq/2.1.2/docs/output

In the output I have a PCA plot generated by DESeq2 from normalized read counts. The exact sentence in the documentation is:

"The script included in the pipeline uses DESeq2 to normalise read counts across all of the provided samples in order to create a PCA plot and a clustered heatmap showing pairwise Euclidean distances between the samples in the experiment"

My question is the following: What is the number of counts per accessible region across replicates used for this PCA? All reads, or only the 500 most variable accessible regions? I think it is the latter, but I can't find this stated explicitly and now I am wondering.

Thank you in advance for your response !

Alexandre

nf-core atac-seq PCA DESeq2 • 990 views

I think it uses all the reads, because you can use the following functions for the PCA.

### Transform counts for data visualization
rld <- rlog(dds, blind=TRUE)

### Plot PCA 
plotPCA(rld, intgroup="sampletype")
7 months ago
jespinosa ▴ 10

Hi, the one rendered in the MultiQC report corresponds to all genes. If you go to the corresponding path in your results folder, you will also find a PDF with both the top 500 accessible regions and all genes, see here


Hello,

Thank you for your reply! Actually, I made a mistake: I misread the version available on my cluster. It is not the latest (2.1.2) but 1.2.1. The file I'm referring to is the same one you indicated, but it does not specify whether all or only the 500 most variable accessible regions are used: narrow_peak_concensus_peak_plot. Do you know what is used in this version?

Best, Alexandre

7 months ago

Hello,

If you follow this logic, then it would be the top 500 most variable accessible regions, no? (Maybe my question was not clear, but I'm talking about the number of accessible regions, not reads.)

From the documentation of DESeq2 (v1.12.3) :

plotPCA(object, intgroup = "condition", ntop = 500, returnData = FALSE)

ntop = number of top genes to use for principal components, selected by highest row variance
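
To make the effect of ntop concrete, here is a minimal base-R sketch of the selection it describes: rank rows by variance, keep the top ntop, and run the PCA on those rows only. The simulated matrix and the helper name pca_top_regions are hypothetical and for illustration only; this is not the pipeline's or DESeq2's actual code.

```r
set.seed(1)

# Simulated matrix of transformed counts (an assumption for illustration):
# 2000 accessible regions (rows) x 6 samples (columns).
n_regions <- 2000
n_samples <- 6
mat <- matrix(
  rnorm(n_regions * n_samples),
  nrow = n_regions,
  dimnames = list(
    paste0("region", seq_len(n_regions)),
    paste0("sample", seq_len(n_samples))
  )
)

# Hypothetical helper: keep the `ntop` rows with the highest variance,
# then run a PCA with samples as observations.
pca_top_regions <- function(mat, ntop = 500) {
  rv <- apply(mat, 1, var)
  keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca <- prcomp(t(mat[keep, , drop = FALSE]))
  list(n_used = length(keep), pca = pca)
}

res_default <- pca_top_regions(mat)                   # default: 500 regions
res_all <- pca_top_regions(mat, ntop = nrow(mat))     # every region

res_default$n_used  # 500
res_all$n_used      # 2000
```

With the default, only 500 of the 2000 simulated regions contribute to the principal components; passing the number of rows as ntop uses all of them.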

Best, Alexandre


Correct, by default.

I just changed the devel branch so that it prints this argument to the console when plotPCA() is called.
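
For anyone who wants the PCA on every region rather than the default 500, a minimal sketch is to pass the number of rows as ntop. It assumes DESeq2 is installed and uses the package's simulated example dataset in place of real ATAC-seq counts; the object names are placeholders.

```r
library(DESeq2)

# Simulated example data shipped with DESeq2: 1000 features x 6 samples,
# with a two-level factor called `condition` in the sample annotation.
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Transform counts for visualization, as earlier in the thread.
rld <- rlog(dds, blind = TRUE)

# Default behaviour: only the 500 rows with the highest variance are used.
pca_top500 <- plotPCA(rld, intgroup = "condition", returnData = TRUE)

# Use every row instead by setting ntop to the number of rows.
pca_all <- plotPCA(rld, intgroup = "condition", ntop = nrow(rld),
                   returnData = TRUE)

head(pca_all)
```

With returnData = TRUE the function returns the sample coordinates as a data frame instead of drawing the plot, which makes it easy to compare the two settings.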

