Recommendations on FAIRE-seq/ATAC-seq optimal coverage for a large genome
3.4 years ago
nanoide

Hi, we are planning to do chromatin accessibility profiling by FAIRE-seq or ATAC-seq in a plant model with a rather large genome (4-5 Gb). I'm currently looking for recommendations regarding these techniques. I have found recommended read numbers for sequencing in several places, but I'm finding it difficult to find recommendations regarding coverage. If I understand correctly, the same number of reads for an organism with a larger genome would mean lower coverage, am I right?

So my main question is: what is the recommended coverage for these techniques? Would 2X be acceptable? 5X? Also, any recommendation on read length, given that we will have paired-end reads? Or will whatever length the sequencing facility gives us be OK?
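
The arithmetic behind the coverage question: mean coverage is total sequenced bases divided by genome size, C = N × L / G, where for paired-end data both mates count toward N. A minimal sketch, assuming a 4.5 Gb genome (the midpoint of the 4-5 Gb range) and 2x75bp reads; both are illustrative choices, not a plan:

```python
import math

GENOME_SIZE = 4_500_000_000  # assumption: midpoint of the 4-5 Gb range
READ_LENGTH = 75             # assumption: 2x75bp paired-end reads


def mean_coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Mean genome coverage; both mates of each pair contribute bases."""
    return 2 * read_pairs * read_length / genome_size


def pairs_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Read pairs needed to reach a target mean coverage (rounded up)."""
    return math.ceil(target * genome_size / (2 * read_length))


for target in (2, 5):
    pairs = pairs_for_coverage(target, READ_LENGTH, GENOME_SIZE)
    print(f"{target}X needs {pairs:,} read pairs")
```

This prints 60,000,000 pairs for 2X and 150,000,000 pairs for 5X. At the same read length, 30 million pairs would give 1X on a 4.5 Gb genome but about 1.45X on the ~3.1 Gb human genome, which is the "same reads, less coverage" effect.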

Any opinion or pointers to guidelines would be much appreciated, thanks

ATAC-seq FAIRE-seq Coverage

I do not think coverage is a meaningful metric for assays that enrich for certain regions and therefore produce highly uneven coverage, both across the genome and within the peak regions themselves.
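
To put a number on how uneven this gets: depth inside peaks relative to the genome-wide mean is roughly the fraction of reads in peaks divided by the fraction of the genome the peaks cover. The sketch below is illustrative only; the 40% of reads in peaks and 1% of the genome in peaks are hypothetical values, not measurements.

```python
def relative_depth(frac_reads_in_peaks: float,
                   frac_genome_in_peaks: float) -> tuple[float, float]:
    """Depth inside and outside peaks, each relative to the genome-wide mean.

    Assumes reads are spread evenly within each compartment, which real
    data will not do; this only illustrates the order of magnitude.
    """
    inside = frac_reads_in_peaks / frac_genome_in_peaks
    outside = (1 - frac_reads_in_peaks) / (1 - frac_genome_in_peaks)
    return inside, outside


inside, outside = relative_depth(0.40, 0.01)  # hypothetical values
print(f"in peaks: {inside:.0f}x the mean, elsewhere: {outside:.2f}x the mean")
```

Under these made-up numbers a nominal 2X run would be about 80X inside peaks and about 1.2X elsewhere, so the genome-wide mean says little about either.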

For mouse and human samples I typically aim for about 30 million raw reads per sample, which gives decent results in peak calling (with the Genrich peak caller) and in differential accessibility analysis. It obviously depends on data quality, and I'm not sure how well ATAC-seq has been adapted to plants these days. In my own data I usually lose about 20% of reads during the filtering steps, and of the remaining reads about 30-50% actually contribute to peaks. It is difficult to give advice up front. Be sure to produce sufficient replicates once you have established a good protocol: at least three per group, preferably more. I'd strongly encourage ATAC-seq over FAIRE-seq; the data quality is just notably better.
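
Applying those loss rates to 30 million raw reads gives a rough idea of how many reads end up informative. This is back-of-the-envelope arithmetic only: the 20% filtering loss and 30-50% fraction of reads in peaks (FRiP) are the mouse/human figures quoted above and may differ for a plant library.

```python
def read_budget(raw_reads: int, filter_loss: float,
                frac_in_peaks: float) -> tuple[int, int]:
    """Return (reads kept after filtering, reads falling in peaks)."""
    kept = round(raw_reads * (1 - filter_loss))
    in_peaks = round(kept * frac_in_peaks)
    return kept, in_peaks


for frip in (0.3, 0.5):
    kept, in_peaks = read_budget(30_000_000, 0.2, frip)
    print(f"FRiP {frip:.0%}: {kept:,} kept, {in_peaks:,} in peaks")
```

So roughly 7 to 12 million of the 30 million raw reads end up in peaks under these assumptions.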

As for read length, I see little advantage beyond 2x75bp reads: a notable fraction of ATAC-seq fragments are short (<100bp), so longer reads will just run into adapter sequence that then requires trimming. 2x50 or 2x75 is typically sufficient, but if you get a good quote for longer reads, take it; you can always trim them back.
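
In paired-end sequencing a read runs into the adapter whenever the fragment is shorter than the read length, so the fraction of reads needing adapter trimming at a given read length is the fraction of fragments shorter than it. The fragment lengths below are a made-up toy list, not real ATAC-seq data; in practice you would take insert sizes from aligned pairs (the TLEN field of a SAM/BAM file).

```python
def fraction_with_adapter(fragment_lengths: list[int], read_length: int) -> float:
    """Fraction of fragments shorter than the read length, i.e. whose reads
    read through into the adapter and need 3' trimming. A fragment exactly
    as long as the read ends at the adapter boundary, hence the strict <."""
    short = sum(1 for f in fragment_lengths if f < read_length)
    return short / len(fragment_lengths)


# Toy insert sizes (hypothetical, for illustration only).
toy_fragments = [45, 60, 70, 90, 120, 150, 180, 200, 210, 350]
for rl in (50, 75, 100, 150):
    print(rl, fraction_with_adapter(toy_fragments, rl))
```

With this toy list the share of adapter-containing reads grows from 10% at 50bp to 50% at 150bp; the real curve depends entirely on your library's fragment size distribution.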

I would see whether you can get published data from similar experiments and check how peak numbers behave when you use the full dataset vs. subsamples (e.g. in steps of 10% of total reads), to see whether more reads give many more peaks. Some kind of initial saturation analysis may help, and there are probably ATAC-seq data for similar types of plants out there. In the end you can always resequence your libraries if it turns out that you did not sequence deeply enough. So it might make sense to make an educated guess at what you need, do it, analyse it, and then sequence again if you think more reads would be beneficial.
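
A minimal sketch of the subsampling idea on synthetic data. Fragments are simulated around randomly placed "open" sites plus uniform background, and a "peak" is crudely defined as a fixed-size bin reaching a count threshold. That bin rule is only a stand-in assumption, not what Genrich or other peak callers do, and all parameter values are hypothetical. Subsamples are nested (one shuffle, growing prefixes), so each step contains the previous one.

```python
import random
from collections import Counter


def simulate_fragments(n, genome_len, n_sites, site_frac, site_width, seed=1):
    """Synthetic fragment positions: a share `site_frac` cluster around
    randomly placed 'open' sites, the rest are uniform background."""
    rng = random.Random(seed)
    sites = rng.sample(range(site_width, genome_len - site_width), n_sites)
    positions = []
    for _ in range(n):
        if rng.random() < site_frac:
            centre = rng.choice(sites)
            positions.append(centre + rng.randint(-site_width, site_width))
        else:
            positions.append(rng.randrange(genome_len))
    return positions


def count_called_bins(positions, bin_size, min_count):
    """Crude stand-in for a peak caller: bins with >= min_count fragments."""
    counts = Counter(p // bin_size for p in positions)
    return sum(1 for c in counts.values() if c >= min_count)


def saturation_curve(positions, steps=10, bin_size=500, min_count=10, seed=2):
    """Nested subsamples (one shuffle, growing prefixes) in 1/steps increments."""
    rng = random.Random(seed)
    shuffled = list(positions)
    rng.shuffle(shuffled)
    curve = []
    for i in range(1, steps + 1):
        k = len(shuffled) * i // steps
        curve.append((i / steps, count_called_bins(shuffled[:k], bin_size, min_count)))
    return curve


frags = simulate_fragments(n=200_000, genome_len=20_000_000, n_sites=2_000,
                           site_frac=0.5, site_width=150)
for frac, n_bins in saturation_curve(frags):
    print(f"{frac:.0%} of reads: {n_bins} bins called")
```

If the count is still climbing steeply between 90% and 100%, deeper sequencing would likely find more; if it has flattened, extra reads mostly add depth to regions already called. With real data the subsamples would come from the BAM (e.g. `samtools view -s`) and be run through the actual peak caller.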


I appreciate the comprehensive answer. I found all of your comments very useful, and everything seems clearer now. Thanks for your time!


Sure, no problem, feel free to comment if you need clarification for anything.
