Is it worth analyzing low-quality ATAC-seq data?
3.1 years ago
cwwong13 ▴ 40

I recently ran ATAC-seq on frozen mouse liver tissue samples (although I do not think that can be an excuse), and the data seem really bad and do not pass QC according to the ENCODE standards. I wonder whether these data can still be used, that is, used in a scientific article to support conclusions or for exploratory analysis?

Here are some typical QC results (I have 4 biological replicates for each group, and the results are similar across them):

  1. 18195969 * 2 reads remaining after filtering out mitochondrial reads and deduplication
  2. Fraction of reads in NFR: 0.50796
  3. NFR / mono-nuc reads: 1.146738 (failed in QC)
  4. Fraction of Reads in universal DHS regions: 0.36985
  5. Fraction of Reads in blacklist regions: 0.0017758
  6. Fraction of Reads in promoter regions: 0.02336
  7. Fraction of Reads in enhancer regions: 0.34436
  8. NRF = Distinct/Total: 0.427141
  9. PBC1 = OneRead/Distinct: 0.381561
  10. PBC2 = OneRead/TwoRead: 1.430665
  11. Peak region size (min/ 25%/ 50%/ 75%/ max): 150/ 169/ 224/ 292/ 1777
  12. TSS enrichment: 3.36877
  13. FRiP for macs2 raw peaks: 0.1
  14. FRiP for overlap peaks: 0.0278
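For readers unfamiliar with the library-complexity metrics in items 8 to 10, here is a minimal Python sketch that follows the formulas as written in the list (NRF = Distinct/Total, PBC1 = OneRead/Distinct, PBC2 = OneRead/TwoRead). The function name `library_complexity` and the toy positions are hypothetical and for illustration only; this is not the ENCODE pipeline code, and real pipelines work on BAM files rather than Python lists.

```python
from collections import Counter

def library_complexity(positions):
    """Return (NRF, PBC1, PBC2) for a list of read positions.

    Each position is any hashable key identifying where a read maps,
    e.g. (chrom, start) or (chrom, start, end, strand).
    """
    total = len(positions)
    per_location = Counter(positions)   # number of reads seen at each distinct location
    distinct = len(per_location)
    one_read = sum(1 for n in per_location.values() if n == 1)
    two_read = sum(1 for n in per_location.values() if n == 2)

    nrf = distinct / total              # NRF = Distinct / Total
    pbc1 = one_read / distinct          # PBC1 = OneRead / Distinct
    # PBC2 = OneRead / TwoRead; undefined if no location has exactly two reads
    pbc2 = one_read / two_read if two_read else float("inf")
    return nrf, pbc1, pbc2

# Toy data (made up): 7 reads over 4 distinct locations
positions = [
    ("chr1", 100), ("chr1", 100),
    ("chr1", 250),
    ("chr2", 40), ("chr2", 40), ("chr2", 40),
    ("chr3", 7),
]
nrf, pbc1, pbc2 = library_complexity(positions)
print(f"NRF={nrf:.3f} PBC1={pbc1:.3f} PBC2={pbc2:.3f}")
```

Low values of all three, as in the numbers above, mean that many reads pile up on the same locations, i.e. the library has low complexity.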

Will analysis of these data lead to faulty conclusions? Or will it only hurt the sensitivity of the assay (e.g. some of the marginally perturbed regions will be masked by noise)?

It would also be nice if you could give some suggestions on troubleshooting the experiment and improving the quality, in case I can (or have to) repeat the sequencing.

Thank you!

ATAC-seq QC TSS • 2.0k views

Do you have the Bioanalyzer traces at hand? Can you show a browser track from, e.g., the GAPDH locus? How many peaks do you get per sample?

FRiP for macs2 raw peaks: 0.1

Not sure how ENCODE calculates this, but is it simply the number of reads overlapping peaks divided by total reads? If so, I have seen better FRiPs, but also worse, so the data might still be usable. FRiP is also highly cell-type-dependent. Since you have replicates, you can use a replicate-aware peak caller such as Genrich to eliminate spurious calls. And if you want to do differential analysis with a proper framework such as edgeR, the replicate information is used intrinsically. You might miss some true positives, of course, especially peaks with lower counts, but I would definitely explore the data first before throwing it in the bin. What is the analysis goal?
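To illustrate the general idea of using replicates to drop spurious calls, here is a deliberately simple Python sketch that keeps only peaks supported by a minimum number of replicates. This is not how Genrich or edgeR work internally; it is a plain overlap filter, and the names `overlaps` and `supported_peaks`, the `min_replicates` parameter, and the toy intervals are all hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def supported_peaks(replicate_peaks, min_replicates=2):
    """Keep peaks of the first replicate that are seen in enough replicates.

    replicate_peaks: list of peak lists, one per replicate.
    A peak counts as supported by a replicate if it overlaps any peak there.
    """
    reference = replicate_peaks[0]
    kept = []
    for peak in reference:
        support = 1  # the reference replicate itself
        for other in replicate_peaks[1:]:
            if any(overlaps(peak, q) for q in other):
                support += 1
        if support >= min_replicates:
            kept.append(peak)
    return kept

# Toy peak sets (made up) for three replicates
rep1 = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
rep2 = [("chr1", 150, 350), ("chr2", 500, 700)]
rep3 = [("chr1", 90, 280), ("chr2", 200, 400)]

print(supported_peaks([rep1, rep2, rep3], min_replicates=2))
print(supported_peaks([rep1, rep2, rep3], min_replicates=3))
```

A stricter `min_replicates` trades sensitivity for specificity, which matches the caveat above that some true low-count peaks may be lost.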

Frozen specimens can absolutely be an excuse: (improperly) frozen samples can have notably compromised chromatin integrity. We do a lot of ATAC-seq in our lab, usually with excellent data quality. We once tried frozen leukemias from the N2 that were several years old, and it was a complete failure. I had made these samples at a time when I already had lots of experience with the assay, so I am confident it was not a handling problem; besides, fresh samples processed at the same time were good.


I got around 150,000 peaks for each sample. Sorry for the late reply; I am new to ATAC-seq analysis, and it took me some time to figure out how to get the browser track. Here is the Gapdh locus.

The first 4 rows are one group of samples, and the other 4 rows are the treated group.

I found the following definition from ENCODE for the FRiP:

Fraction of reads in peaks (FRiP) – Fraction of all mapped reads that fall into the called peak regions, i.e. usable reads in significantly enriched peaks divided by all usable reads.
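As a concrete reading of that definition, here is a small Python sketch that counts reads overlapping any peak and divides by all reads. It assumes reads and peaks are given as `(chrom, start, end)` half-open intervals; the function name `frip` and the toy data are hypothetical, and the loop is a naive all-against-all comparison meant only to show the arithmetic, not something to run on a real BAM file.

```python
def frip(reads, peaks):
    """Fraction of reads that overlap at least one peak.

    reads, peaks: lists of (chrom, start, end) half-open intervals.
    """
    in_peaks = 0
    for chrom, start, end in reads:
        # A read counts once, no matter how many peaks it overlaps
        if any(chrom == pc and start < pe and ps < end for pc, ps, pe in peaks):
            in_peaks += 1
    return in_peaks / len(reads)

# Toy data (made up): 2 of 4 reads overlap a peak
peaks = [("chr1", 100, 200), ("chr2", 0, 50)]
reads = [
    ("chr1", 90, 140),   # overlaps chr1:100-200
    ("chr1", 300, 350),  # no overlap
    ("chr2", 10, 60),    # overlaps chr2:0-50
    ("chr3", 5, 55),     # no peaks on chr3
]
print(frip(reads, peaks))
```

Note that the value depends on which peak set is used, which is why the raw MACS2 peaks and the overlap peaks give different FRiPs in the QC list.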

All the samples have a similar trace (I ran them on a TapeStation): ![trace][2] The ladder is inserted computationally, so it is for reference only.

Indeed, I found a similar pattern in this [scientific data publication][3]: ![data from Liu et al][4]

and here is mine: ![my data][5]

But what scared me is the M-plot of the data, which shows 2 lines of fragments enriched at around 150 bp: ![mplot][6]

I wonder whether this is because I did not do proper QC on the reads, and how I can get rid of these artifacts (at least they do not look normal to me).
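Since the fragment-size pattern is central here, this is a minimal Python sketch of how a "fraction of reads in NFR" and an "NFR / mono-nuc" ratio could be derived from fragment lengths. The size cutoffs (`nfr_max=150`, `mono_max=300`) are assumptions chosen for illustration and may differ from the ones the ENCODE pipeline uses; the function name `nfr_to_mono_ratio` and the toy lengths are hypothetical as well.

```python
def nfr_to_mono_ratio(fragment_lengths, nfr_max=150, mono_max=300):
    """Return (fraction of fragments in NFR, NFR / mono-nucleosome ratio).

    Assumed bins (illustrative, not taken from ENCODE):
      NFR:              length < nfr_max
      mono-nucleosome:  nfr_max <= length < mono_max
    """
    nfr = sum(1 for n in fragment_lengths if n < nfr_max)
    mono = sum(1 for n in fragment_lengths if nfr_max <= n < mono_max)
    frac_nfr = nfr / len(fragment_lengths)
    ratio = nfr / mono if mono else float("inf")
    return frac_nfr, ratio

# Toy fragment lengths in bp (made up)
lengths = [60, 80, 95, 120, 160, 190, 210, 350]
frac_nfr, ratio = nfr_to_mono_ratio(lengths)
print(f"fraction in NFR={frac_nfr:.2f}, NFR/mono={ratio:.2f}")
```

With real data the lengths would come from the insert sizes of properly paired reads, and shifting the cutoffs by a few tens of bp can change the ratio noticeably when many fragments sit near 150 bp.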

Thanks!

Please refer to the new question.
