ATAC-seq FastQC showing 5' sequence content bias
4 months ago
Papyrus ★ 1.2k

Hi friends,

I'm handling some ATAC-seq data and used FastQC to take a look at the FASTQs. The data (paired-end) are high quality and show no other relevant "issues", but the "Per base sequence content" plot shows a noticeable base composition bias at the 5' end of both R1 and R2.
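For anyone who wants to check this outside FastQC, here is a minimal sketch of a per-base sequence content calculation. It assumes a standard 4-line FASTQ (plain or gzipped), reports the percentage of A/C/G/T at each read position, and ignores N calls. It is not FastQC's own implementation, and the function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional


def read_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # line 2 of every record holds the bases
                yield line.strip()


def per_base_content(seqs: Iterable[str],
                     max_len: Optional[int] = None) -> List[Dict[str, float]]:
    """Percentage of A/C/G/T at each position (N and other symbols are ignored)."""
    counts: List[Counter] = []
    for seq in seqs:
        for pos, base in enumerate(seq.upper()):
            if max_len is not None and pos >= max_len:
                break
            if pos == len(counts):
                counts.append(Counter())
            counts[pos][base] += 1

    result = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")
        result.append({b: (100.0 * c[b] / total if total else 0.0) for b in "ACGT"})
    return result
```

Usage would be something like `per_base_content(read_sequences("sample_R1.fastq.gz"), max_len=30)` (hypothetical file name). With the bias described above, the first positions would show uneven A/C/G/T percentages that level off further into the read.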

I detected Nextera adapter contamination, and after looking around I found that the Nextera kit apparently introduces some bias. My plot does look similar to the one described here.

I'm guessing that this is an enrichment bias (similar to the random-hexamer priming bias in RNA-seq), so not much can be done about it. But since I found little information, I was hoping someone with more ATAC-seq (or Nextera) experience could clarify whether this is a normal issue that needs no preprocessing, or point me to more resources.

Thanks!

4 months ago
ATpoint 52k

My best guess is that this is the sequence bias of the Tn5 transposase: Nextera (and hence ATAC-seq) uses a modified Tn5 to cut accessible chromatin and insert sequencing adapters. I always see this pattern in my data as well. I think this is fairly well described if you google for Tn5 bias, and there are even attempts to correct for it; see for example the paper for chromVAR (a Bioconductor package).
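One rough way to see an insertion-site preference like this is to measure how far the base distribution at each of the first positions of the reads departs from randomness. Below is a minimal sketch that computes the information content (in bits, maximum 2) per position. It assumes a uniform A/C/G/T background, which is a simplification because real genomes are not 50% GC. The function name and the default window `k` are hypothetical choices.

```python
import math
from typing import List, Sequence


def position_information(seqs: Sequence[str], k: int = 20) -> List[float]:
    """Information content (bits) at each of the first k read positions.

    Computed as 2 - Shannon entropy of the A/C/G/T distribution, i.e. relative
    to a uniform background (an assumption; genomic composition is not uniform).
    """
    info = []
    for pos in range(k):
        counts = {b: 0 for b in "ACGT"}
        for s in seqs:
            if pos < len(s):
                base = s[pos].upper()
                if base in counts:  # skip N and other symbols
                    counts[base] += 1
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values() if n)
        info.append(2.0 - entropy)
    return info
```

If the 5' pattern comes from the enzyme's sequence preference, one would expect elevated values over roughly the first positions of R1 and R2 and values near the background level afterwards. This only describes the bias; it does not correct it.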

OK, thanks for the advice! I will take a look at the package.