I recently ran ATAC-seq on frozen mouse liver tissue samples (although I do not think that can be an excuse), and the data look really bad: they do not pass QC according to the ENCODE standards. Can these data still be used, i.e. in a scientific article to support conclusions or for exploratory analysis?
Here are some typical QC results (I have 4 biological replicates per group, and the results are similar across them):
- 18195969 * 2 reads after filtering mitochondrial reads and deduplication
- Fraction of reads in NFR: 0.50796
- NFR / mono-nuc reads: 1.146738 (failed in QC)
- Fraction of Reads in universal DHS regions: 0.36985
- Fraction of Reads in blacklist regions: 0.0017758
- Fraction of Reads in promoter regions: 0.02336
- Fraction of Reads in enhancer regions: 0.34436
- NRF = Distinct/Total: 0.427141
- PBC1 = OneRead/Distinct: 0.381561
- PBC2 = OneRead/TwoRead: 1.430665
- Peak region size (min/ 25%/ 50%/ 75%/ max): 150/ 169/ 224/ 292/ 1777
- TSS enrichment: 3.36877
- FRiP for macs2 raw peaks: 0.1
- FRiP for overlap peaks: 0.0278
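For readers unfamiliar with the three library-complexity ratios in the list above, here is a minimal Python sketch of how they follow from the definitions given there (Distinct/Total, OneRead/Distinct, OneRead/TwoRead). The function name `library_complexity` and the input format (one hashable position key per mapped read) are assumptions made for illustration; this is not the ENCODE pipeline code.

```python
from collections import Counter


def library_complexity(positions):
    """Compute NRF, PBC1 and PBC2 from per-read mapping positions.

    `positions` is an iterable with one hashable key per mapped read,
    e.g. (chrom, start, strand). This input format is an assumption
    made for this sketch.
    """
    counts = Counter(positions)          # reads observed at each position
    total = sum(counts.values())         # Total: all reads
    distinct = len(counts)               # Distinct: unique positions
    one_read = sum(1 for c in counts.values() if c == 1)  # positions seen once
    two_read = sum(1 for c in counts.values() if c == 2)  # positions seen twice
    return {
        "NRF": distinct / total,
        "PBC1": one_read / distinct,
        "PBC2": one_read / two_read if two_read else float("inf"),
    }


# Toy example: positions a and d seen once, b and c twice, e three times.
toy = ["a", "b", "b", "c", "c", "d", "e", "e", "e"]
metrics = library_complexity(toy)
print(metrics)
```

Note that these ratios have to be computed on reads before deduplication; after deduplication every position has exactly one read and all three become uninformative.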
Will analyzing these data lead to faulty conclusions, or will the low quality only hurt the sensitivity of the assay (e.g. some marginally perturbed regions being masked by noise)?
It would be great if you could suggest how to troubleshoot the experiment and improve the quality, in case I can (or have to) repeat the sequencing.
Thank you!
Do you have the Bioanalyzer traces at hand? Can you show a browser track from, e.g., the GAPDH locus? How many peaks do you get per sample?
Not sure how ENCODE calculates FRiP, but is it simply the reads overlapping peaks divided by total reads? If so, yeah, I've seen better FRiPs, but also worse. Might still be usable. It is highly cell-type-dependent as well. As you seem to have replicates, you can use a replicate-aware peak caller such as Genrich to eliminate spurious calls. Plus, if you want differential analysis and use a proper framework such as edgeR, then the replicate information is intrinsically used. You might miss some true positives of course, especially peaks with lower counts, but I would definitely explore the data first before throwing it in the bin. What is the analysis goal?
Frozen specimens can absolutely be an excuse: (improperly) frozen samples can have notably compromised chromatin integrity. We do a lot of ATAC-seq in our lab, usually with excellent data quality. We once tried frozen leukemias from the N2 that were several years old, and it was a complete failure. I made these samples at a time when I already had lots of experience with the assay, so I am confident it was not a handling problem, plus fresh samples processed at the same time were good.
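To make the "reads overlapping peaks divided by total reads" idea concrete, here is a small Python sketch. It assumes reads and peaks are given as `(chrom, start, end)` tuples in half-open coordinates and counts a read as "in peak" if it overlaps a peak by at least one base; the function names are made up for this example, and ENCODE's actual calculation may differ in details.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(reads, peaks):
    """Fraction of reads overlapping at least one peak.

    reads, peaks: lists of (chrom, start, end), half-open coordinates
    (an assumption of this sketch).
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    # Per chromosome: sorted, non-overlapping peak starts and ends.
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])

    in_peaks = 0
    for chrom, start, end in reads:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Last merged peak that starts at or before the read's last base.
        i = bisect_right(starts, end - 1) - 1
        if i >= 0 and ends[i] > start:
            in_peaks += 1
    return in_peaks / len(reads) if reads else 0.0


peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
reads = [("chr1", 90, 110), ("chr1", 300, 350), ("chr1", 250, 260),
         ("chr2", 0, 50), ("chr3", 1, 10)]
print(frip(reads, peaks))
```

For real data you would normally get the same number from the BAM and peak files with standard interval tools rather than from Python lists; the point here is only the definition.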
I got around 150000 peaks for each sample. Sorry for the late reply: I am new to ATAC-seq analysis, and it took me some time to figure out how to get the browser track.
The first 4 rows are from the same group of samples, and the other 4 rows are from the treated group.
I found the following definition from ENCODE for the FRiP:
All the samples have a similar trace (I ran them on a TapeStation): ![trace][2] The ladder is inserted computationally, so it is for reference only.
Indeed, I found a similar pattern in this [scientific data publication][3]: ![data from Liu et al][4]
and here is mine: ![my data][5]
But what scares me is the M-plot of the data, where there are 2 lines of fragments enriched at around 150 bp: ![mplot][6]
I wonder whether this is because I did not do proper QC on the reads, and how to get rid of these artifacts (at least to me they do not look normal).
Thanks!
Please refer to the new question.