Question: Normalization and differential analysis in ATAC-seq data
1
gravatar for anais1396
2.8 years ago by
anais139630
Brussels
anais139630 wrote:

Hello everyone!

I would like to know if someone had experiences with normalization and differential expression on ATAC-seq data. After using MACS2 for the peak calling, how can we use Dseq2 or EdgeR on these datas? Someone try this? What is the protocol you use to do that?

Thank you in advance!

Anaïs

ADD COMMENTlink modified 10 months ago by ATpoint44k • written 2.8 years ago by anais139630
1

I believe you could use diffBind strategy: merge the peaks of different samples, count the number of reads in it and perform a differential analysis on those counts. I don't however have experience on ATAC-seq myself, so I cannot guarantee that's the best approach.

ADD REPLYlink written 2.8 years ago by Martombo2.7k
3
gravatar for i.sudbery
2.8 years ago by
i.sudbery10k
Sheffield, UK
i.sudbery10k wrote:

When we do this we first we filter our samples so that we are only using samples with at least 35M read ends and at least 10% FRiP (fraction reads in peaks). We then downsample each sample so that all samples have the same number of reads, before merging all samples and calling peaks. This becomes our feature set. We then count the number of read ends in each feature for our non-down sampled samples and use DESeq2 standard normalisation and differential calling to find differential peaks. We use a more stringent differential expression FDR value than you would nrormally use. I think this is conceptually similar to the diffBind approach. Eyeballing the results, it seems to work pretty well, but we don't yet have any molecular validation of the results.

ADD COMMENTlink written 2.8 years ago by i.sudbery10k
0
gravatar for ATpoint
10 months ago by
ATpoint44k
ATpoint44k wrote:

When I have several experimental groups I prefer to first call peaks with Genrich for every group, then merge the peaks with bedtools merge to get a consensus list from which we create a count matrix with all samples. I like featureCounts for this. The author of the csaw package Aarol Lun who suggests to us small bins for differential analysis would probably argue that this is not the best approach (see csaw vignette and paper for details why), but for me from a user perspective it is important to have actual peak lists for each group that are constant during a project, so that is why I use this approach. From there on I prefer the standard edgeR workflow, so normalization factor calculation, filterByExpr and then model fitting and testing. See the vignettes of edgeR and especially csaw for code examples. csaw also contains some helpful details on certain parameters one should use for ATAC/ChIP-seq compared to RNA-seq for which the tool has originally been developed. Finally, correct for multiple testing and set a FDR cutoff, e.g. 0.05 or 0.01. Again, see the manuals for details.

I agree with Ian Sudbery's comment on filtering low-quality samples. Be sure to always inspect samples on a genome browser. ATAC-seq should produce clear peaks with little noise. Peak numbers should be in the ten-thousands range and FRiPs, depending on how you calculate it should be > 10%. In some cases lower FRiPs might still be acceptable. Also be sure to make PCA plots to check for outlier samples and check normalization efficiency with MA-plots, again csaw manual has details and code for that. I would not say that 35M reads is a must, that is already fairly deeply-sequenced. I actually aim for 25M per sample but have done diff. analysis with samples that had far fewer reads. Higher counts are statistically more powerful, still replication is more important than depth.

ADD COMMENTlink modified 10 months ago • written 10 months ago by ATpoint44k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1497 users visited in the last hour
_