Question: Insert size of ~20bp with ATAC-seq?
9 months ago, a.rex wrote:

I recently downloaded some publicly available ATAC-seq data. I aligned it to the reference genome with BWA, removed duplicates (in this case 70% of the library is duplicates), and then used Picard tools to generate a fragment size distribution. However, I see a large peak at around 20 bp. The library was sequenced paired-end, with 75 bp forward and 75 bp reverse reads. Does a 20 bp insert length mean that the insert is just short? How can I check this? Presumably the reads contain a lot of adapter sequence?

[Image: fragment size distribution plot]

Tags: sequencing, atac-seq, alignment

Can you post the insert size distribution plot? It's common to observe a sharp peak below 100 bp, but you should also see a peak at 150-200 bp and then another at around 300 bp.
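One way to check for the pattern described here without a plot is to bin the template lengths (TLEN, column 9 of a SAM record) into rough size classes. The sketch below is illustrative only: the function name and the class boundaries (100, 250 and 400 bp) are assumptions chosen to bracket the peaks mentioned in this comment, not values taken from any published pipeline.

```python
from collections import Counter

def classify_fragment_sizes(template_lengths):
    """Count read pairs per rough size class (boundaries are assumed, not standard)."""
    counts = Counter()
    for tlen in template_lengths:
        # Keep only positive TLEN so each pair is counted once: the mate
        # carries the same value with a negative sign, and 0 means the
        # template length is unavailable.
        if tlen <= 0:
            continue
        if tlen < 100:
            counts["<100 bp"] += 1
        elif tlen < 250:
            counts["100-249 bp"] += 1
        elif tlen < 400:
            counts["250-399 bp"] += 1
        else:
            counts[">=400 bp"] += 1
    return counts
```

If almost everything falls into the first class, the library is dominated by very short fragments, which would match the single sharp peak described in the question.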

Written 9 months ago by geek_y

I have uploaded the image now.

Written 9 months ago by a.rex

Odd plot. I have never seen anything like that in ATAC-seq data, and I think I have seen quite a lot of it. Which dataset is it? I could quickly run it through my pipeline to see whether it is indeed an odd library or a technical issue to debug. Did you filter chrM before collecting insert sizes?

Modified 7 months ago • written 9 months ago by ATpoint

It is very odd. It is from an obscure species and was published a few days ago.

Modified 9 months ago • written 9 months ago by a.rex

I did not filter chrM as we do not have this information.

Written 9 months ago by a.rex

OK, I see. It could be that the sharp peak is some heavily digested non-nuclear DNA such as chrM (or any other organelle DNA, or parasite DNA that might be in the worm). Here is how the insert sizes look for chrM only in mouse:

[Image: insert size distribution for chrM only, mouse]

You can also see that it accumulates at short fragment sizes, because this nucleosome-free DNA is an attractive target for the transposome. Maybe you can make a kind of pseudo-chrM by taking the mitochondrial genome of a closely related, well-annotated species and including it in the reference to get rid of some of this contamination. Alternatively, take all the reads with an insert size below 50 bp and try to assemble them, then compare the resulting sequences against chrM or other organelle DNA to get an idea of what they are.
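The second suggestion (pulling out the reads with an insert size below 50 bp for assembly or sequence comparison) could be sketched roughly as follows. This is a minimal example under stated assumptions: it reads plain SAM text lines (for instance the output of `samtools view -h`), the function name and the `/1`, `/2` read-name suffixes are hypothetical, and only the 50 bp cutoff comes from the comment itself.

```python
def short_fragment_reads_to_fasta(sam_lines, max_size=50):
    """Yield FASTA records for mapped reads whose |TLEN| is below max_size."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag = fields[0], int(fields[1])
        tlen, seq = int(fields[8]), fields[9]
        if flag & 0x4 or seq == "*":
            continue  # unmapped read, or sequence not stored
        if 0 < abs(tlen) < max_size:
            # 0x80 marks the last segment of the template (second mate).
            mate = "2" if flag & 0x80 else "1"
            # SEQ is stored on the reference forward strand; that is fine
            # for assembly or a BLAST-style comparison.
            yield f">{name}/{mate}\n{seq}"
```

The resulting FASTA records can then be handed to an assembler or compared against the mitochondrial genome of a related species.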

Modified 9 months ago • written 9 months ago by ATpoint

I realise now that perhaps a peak at 20 bp (insert size) corresponds to a fragment of ~95 bp?

Written 9 months ago by a.rex
9 months ago, igor (United States) wrote:

I think Picard's definition refers to the actual fragment size, including the reads. Check the illustration in this previous discussion: "Is PICARD CollectInsertSizeMetrics use soft-clipping information to compute the insert size ?"
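If the reported value is indeed the full fragment length, the arithmetic for the library in the question (75 bp reads, about 20 bp fragments) can be worked through as below. The two helper names are hypothetical, and the read-through figure assumes that no adapter trimming was done before alignment.

```python
def adapter_readthrough(read_length, fragment_length):
    """Bases per read that extend beyond the fragment end (0 if the fragment is longer)."""
    return max(0, read_length - fragment_length)

def inner_distance(read_length, fragment_length):
    """Unsequenced gap between the two mates; negative when the mates overlap."""
    return fragment_length - 2 * read_length

print(adapter_readthrough(75, 20))  # 55
print(inner_distance(75, 20))       # -130
```

Under that reading, each 75 bp read would run about 55 bases past the end of a 20 bp fragment, so those bases would be adapter rather than genomic sequence.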

It's not uncommon to end up with very short fragments in ATAC-seq. Some people discard them. For example, in the ATACseqQC paper, they removed read pairs with a mapped template shorter than 38 bp.
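A filter of this kind could look roughly like the sketch below. It is not the ATACseqQC implementation: the function name is hypothetical, the input is assumed to be plain SAM text lines, and only the 38 bp threshold comes from the text above. Records with TLEN 0 (template length unavailable) are dropped as well.

```python
def drop_short_templates(sam_lines, min_size=38):
    """Yield header lines and alignment records with |TLEN| >= min_size."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the SAM header untouched
            continue
        fields = line.split("\t")
        # TLEN is the ninth SAM column; its sign differs between mates,
        # so the absolute value is compared against the threshold.
        if len(fields) > 8 and abs(int(fields[8])) >= min_size:
            yield line
```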
