Question

Number of accessible regions used by DESeq2 to generate the PCA plot in the nf-core ATACseq pipeline?

7 months ago

Alexandre Eeckhoutte • 0

Hello everyone,

I launched an analysis of some ATACseq data across different conditions. I ran the nf-core ATACseq pipeline v 1.2.1:

https://nf-co.re/atacseq/2.1.2/docs/output

In the output I have a PCA generated by DESeq2 from normalized read counts. The exact sentence is:

"The script included in the pipeline uses DESeq2 to normalise read counts across all of the provided samples in order to create a PCA plot and a clustered heatmap showing pairwise Euclidean distances between the samples in the experiment"

My question is the following: what is the number of counts per accessible region across replicates used for this PCA? All reads, or only the 500 most variable accessible regions? I think it is the latter, but I can't find this stated explicitly, and now I am wondering.

Thank you in advance for your response !

Alexandre

nf-core atac-seq PCA DESeq2 • 990 views

updated 7 months ago by Michael Love ★ 2.6k • written 7 months ago by Alexandre Eeckhoutte • 0

I think it uses all the reads, because you can use the following functions for the PCA.

### Transform counts for data visualization
rld <- rlog(dds, blind=TRUE)

### Plot PCA 
plotPCA(rld, intgroup="sampletype")

7 months ago by bk11 ★ 2.4k

score 1 · Answer 1 · 2023-09-27

7 months ago

jespinosa ▴ 10

Hi, the one rendered in the MultiQC report corresponds to all genes, and if you go to the corresponding path in your results folder you will also find a PDF with both the top 500 accessible regions and all genes, see here

Hello,

Thank you for your reply! Actually, I made a mistake: I misread the version available on my cluster. It is not the latest, 2.1.2, but 1.2.1. The file I'm referring to is the same one you indicated, but it gives no detail about whether all or only the 500 most variable accessible regions are used: narrow_peak_concensus_peak_plot. Do you know what is used in this version?

Best, Alexandre

7 months ago by Alexandre Eeckhoutte • 0

score 0 · Answer 2 · 2023-09-27

7 months ago

Alexandre Eeckhoutte • 0

Hello,

If you follow this logic, then it would be the top 500 most variable accessible regions, no? (Maybe my question is not clear, but I'm talking about the number of accessible regions, not reads.)

From the documentation of DESeq2 (v1.12.3):

plotPCA(object, intgroup = "condition", ntop = 500, returnData = FALSE)

ntop = number of top genes to use for principal components, selected by highest row variance
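
To make the effect of that argument concrete, here is a minimal base-R sketch of a "top N rows by variance" selection followed by a PCA. It is only my reading of what the documented ntop description implies, not the DESeq2 source; the matrix is simulated, and the helper name pca_top_variable is hypothetical.

```r
# Base-R sketch of a "top-N by row variance" PCA; DESeq2 is not required.
set.seed(1)

# Hypothetical transformed matrix: 2000 regions (rows) x 6 samples (columns).
mat <- matrix(rnorm(2000 * 6), nrow = 2000,
              dimnames = list(paste0("region", 1:2000), paste0("sample", 1:6)))

# Hypothetical helper: keep the ntop most variable rows, then run a PCA.
pca_top_variable <- function(mat, ntop = 500) {
  # Variance of each region across samples.
  rv <- apply(mat, 1, var)
  # Indices of the ntop most variable regions (capped at the number of rows).
  keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  # Samples are the observations, so the selected subset is transposed.
  pca <- prcomp(t(mat[keep, , drop = FALSE]))
  list(n_regions = length(keep), pca = pca)
}

res_default <- pca_top_variable(mat)                   # uses 500 regions
res_all     <- pca_top_variable(mat, ntop = nrow(mat)) # uses all 2000 regions
```

With the default, only 500 of the 2000 simulated regions enter the PCA; passing the number of rows as ntop uses every region.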

Best, Alexandre

Correct, by default.

I just changed the devel branch so that it prints this argument to the console when plotPCA() is called.
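
For anyone who wants to compare the two settings on their own object, a short sketch follows. It assumes DESeq2 is installed and uses a simulated dataset from makeExampleDESeqDataSet() as a stand-in for the pipeline's consensus-peak counts; the "condition" column comes from that simulated data, not from the nf-core output.

```r
library(DESeq2)

# Simulated counts (stand-in for the real consensus-peak count matrix).
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)

# Same transformation as in the comment above.
rld <- rlog(dds, blind = TRUE)

# Default behaviour: PCA on the 500 rows with the highest variance.
pca_500 <- plotPCA(rld, intgroup = "condition", returnData = TRUE)

# PCA on every row: set ntop to the number of rows in the object.
pca_all <- plotPCA(rld, intgroup = "condition", ntop = nrow(rld),
                   returnData = TRUE)
```

With returnData = TRUE the function returns the sample coordinates as a data frame instead of a plot, which makes it easy to check how much the two choices differ.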

7 months ago by Michael Love ★ 2.6k