Question: motif search with ATACseq
0
9 weeks ago by
grant.hovhannisyan1.3k wrote:

I have ATACseq data for two yeast species. I called peaks with MACS2 and ran occupancy and affinity analyses of the peaks with DiffBind. Now I need to find motifs of TF binding sites in the peaks and compare those motifs between the two species.

There is a ton of softs and databases for doing the task, so I am a bit confused about how to start the analysis. Can anybody share their experience with motif search and motif comparison, specifically which tools are considered "best practice" in the field?

Thanks

motif search atacseq • 481 views
ADD COMMENTlink modified 8 weeks ago by afli140 • written 9 weeks ago by grant.hovhannisyan1.3k

What exactly are you confused about? You imply you've already done some research on how to do this, so it's difficult to figure out how we could help you. Have you tried any of the "tons of softs and databases" and found them lacking?

ADD REPLYlink written 9 weeks ago by Friederike2.3k
3
9 weeks ago by
Devon Ryan86k
Freiburg, Germany
Devon Ryan86k wrote:

There are basically two genres of tools that one can use: those that search for motifs in peaks and those that do footprinting. The former group is well represented by HOMER and the MEME Suite (MEME-ChIP in particular). The latter group is mostly represented by Wellington. I'm generally not a fan of footprinting with ATAC-seq data, since the coverage needed to do it properly is just absurdly high. Given that, one of HOMER/MEME/etc. would be my choice. I find HOMER annoying to use, so I prefer MEME, but that's a personal preference rather than a best practice.
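
For the peak-based route, the usual first step is to cut a fixed-width window around each peak summit and hand those sequences to the motif finder. A minimal sketch, assuming MACS2 narrowPeak input (column 10 is the summit offset from the peak start) and a flank of 50 bp on each side; the function name and the flank size are illustrative choices, not anything prescribed by MEME-ChIP or HOMER:

```python
def summit_windows(narrowpeak_lines, flank=50):
    """Yield (chrom, start, end, name) windows centred on each peak summit.

    Expects tab-separated narrowPeak lines; `flank` is an assumed default.
    """
    for line in narrowpeak_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, peak_start, peak_end, name = (
            fields[0], int(fields[1]), int(fields[2]), fields[3])
        summit_offset = int(fields[9])  # -1 means no summit was called
        if summit_offset < 0:
            summit = (peak_start + peak_end) // 2  # fall back to the midpoint
        else:
            summit = peak_start + summit_offset
        yield chrom, max(0, summit - flank), summit + flank, name


# Toy narrowPeak record (made-up values) with the summit 120 bp into the peak.
example = ["chrI\t1000\t1400\tpeak_1\t250\t.\t8.1\t30.2\t27.5\t120"]
windows = list(summit_windows(example, flank=50))
for chrom, start, end, name in windows:
    print(chrom, start, end, name, sep="\t")
```

The resulting BED-style intervals can then be turned into FASTA (for example with bedtools getfasta) before running the motif tool of your choice.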

ADD COMMENTlink written 9 weeks ago by Devon Ryan86k

Thanks for the input, Devon! Regarding footprinting, do you think it's fine to merge the ATACseq replicates to increase the depth (the replicates are very reproducible)? Regarding the motifs, say I have found the motifs in the peaks: what kind of comparison between the motifs of different species do you think can be done? Sequence comparison, copy number comparison, others? I am new to this field and want to understand conceptually what makes sense and what does not. Thank you

ADD REPLYlink written 9 weeks ago by grant.hovhannisyan1.3k

Sure, you can merge replicates. Regarding the comparisons that make sense, that depends on the biological question. Note that comparing across species is rife with issues.

ADD REPLYlink written 9 weeks ago by Devon Ryan86k
1

I apologize for interrupting another thread, but I would usually encourage the use of replicates.

While Devon is right that the number of reads needed per ATAC-Seq sample can sometimes be high, if you already have replicates with high-coverage samples, I think you should take advantage of that.

Also, with yeast, getting high coverage should be less of an issue than for an organism with a larger genome, such as human or mouse (I am assuming you are studying a species of yeast with limited introns, but I admittedly don't know how the largest yeast genome compares to a vertebrate genome).

ADD REPLYlink written 9 weeks ago by Charles Warden5.8k

Hi Charles, the genome size is ca. 12 Mb and the ATACseq replicates have around 20 million reads each (the coverage is much higher than it would be for the human genome, though I don't really get what coverage means in the ATACseq context). Regarding the replicates, I have used them, for example, in the occupancy analysis with DiffBind, and most of the peaks are shared between the replicates. So maybe I can try merging the replicates and focusing only on those common peaks.
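
One rough way to read "coverage" here is mean read depth: number of reads times read length, divided by genome size. A minimal sketch of that arithmetic, assuming 50 bp reads (the read length is not stated in this thread) and a human genome of roughly 3.1 Gb:

```python
def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average number of reads covering each base, ignoring mappability."""
    return n_reads * read_length_bp / genome_size_bp


READ_LENGTH = 50  # assumption: not given in the thread

yeast = mean_coverage(20_000_000, READ_LENGTH, 12_000_000)
human = mean_coverage(20_000_000, READ_LENGTH, 3_100_000_000)
print(f"yeast: {yeast:.1f}x, human: {human:.2f}x")
```

Under these assumptions the same 20 million reads give roughly 83x over a 12 Mb genome but only about 0.3x over a human-sized one. ATAC-seq reads pile up in open chromatin rather than spreading evenly, so the genome-wide mean is only a yardstick; what matters for footprinting is the depth inside the peaks.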

ADD REPLYlink written 9 weeks ago by grant.hovhannisyan1.3k

It's not uncommon to try removing duplicate reads before peak calling with ATAC-Seq. If you do this, the unique read coverage can be considerably lower than the original coverage (which is what I think Devon was talking about in the original comment).
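
To get a feel for how much duplicate removal shrinks the data, one can count how many alignments share the same position and strand. A toy sketch under that simplified definition of a duplicate; the function name and tuple layout are illustrative, and real tools such as Picard MarkDuplicates or samtools markdup also take the mate position into account for paired-end data:

```python
def duplicate_rate(alignments):
    """Return (unique_count, duplicate_fraction).

    `alignments` is an iterable of (chrom, five_prime_pos, strand) tuples;
    two alignments with an identical tuple are treated as duplicates.
    """
    total = 0
    seen = set()
    for key in alignments:
        total += 1
        seen.add(key)
    unique = len(seen)
    fraction = (total - unique) / total if total else 0.0
    return unique, fraction


# Made-up alignments: three of the five start at chrI:100 on the + strand.
reads = [
    ("chrI", 100, "+"),
    ("chrI", 100, "+"),
    ("chrI", 100, "-"),
    ("chrII", 500, "+"),
    ("chrI", 100, "+"),
]
unique, fraction = duplicate_rate(reads)
print(f"{unique} unique alignments, {fraction:.0%} duplicates")
```

With a small genome and deep sequencing, some identical fragments are expected by chance, so a high duplicate fraction in yeast is not automatically a sign of a poor library.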

However, if you are getting reasonable results with your current replicate-based strategy, I think that is OK (and arguably what matters most, as long as you have some way to biologically assess your results). Knowing about possible troubleshooting strategies (such as removing or keeping duplicate reads, or using read counts from programs like htseq-count/featureCounts as input for DESeq2/edgeR/limma-voom) is probably not a bad idea, and should make you more comfortable when responding to reviewers. You may also find that some strategies work better for your particular dataset than others; what works best for your data may not be 100% identical to what is most popular (if you are able to define that), but having some novelty in your analysis strategy should also likely add significance to your paper for higher-impact publications :)

ADD REPLYlink written 9 weeks ago by Charles Warden5.8k

"Note that comparing across species is rife with issues."

Yes, especially when they have 30% genome divergence :)

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by grant.hovhannisyan1.3k
3
8 weeks ago by
ATpoint11k
Germany
ATpoint11k wrote:

Depending on what your exact question is, you might consider chromVAR. It takes as input a set of peaks, e.g. the combined peak sets of your two species, and the aligned BAM files to infer differential motif accessibility. In the end, you'll get a list of motifs that are differentially accessible in either condition. chromVAR does that by computing a variability score for each motif. For this, it first matches a set of motifs, e.g. from JASPAR, to the peaks and then checks whether regions with a certain motif are more or less accessible in condition 1 vs. condition 2. Even though it may have been developed primarily for single-cell ATAC-seq, I have had good success with it so far on bulk ATAC-seq data, producing results that made biological sense and were supported by other experiments.
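
To make the idea concrete, here is a stripped-down sketch of the "raw deviation" behind this kind of analysis: for each sample, compare the reads observed in motif-containing peaks with what would be expected if every sample had the same accessibility profile. This is a simplification for illustration, not chromVAR's implementation; the package additionally corrects for technical bias using background peak sets and reports z-scores. The function name and toy counts are made up:

```python
def raw_deviations(counts, has_motif):
    """Per-sample (observed - expected) / expected for one motif.

    counts:    list of per-peak rows, each a list of read counts per sample
    has_motif: list of booleans, True where the peak contains the motif
    """
    n_samples = len(counts[0])
    sample_totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    grand_total = sum(sample_totals)
    # Share of all reads falling in each peak, pooled over every sample.
    peak_share = [sum(row) / grand_total for row in counts]

    deviations = []
    for j in range(n_samples):
        observed = sum(row[j] for row, m in zip(counts, has_motif) if m)
        expected = sum(share * sample_totals[j]
                       for share, m in zip(peak_share, has_motif) if m)
        # No reads expected in motif peaks: the deviation is undefined.
        deviations.append((observed - expected) / expected
                          if expected else float("nan"))
    return deviations


# Toy matrix: 4 peaks x 2 samples; only the first two peaks carry the motif.
counts = [
    [30, 10],
    [20, 10],
    [25, 40],
    [25, 40],
]
has_motif = [True, True, False, False]
devs = raw_deviations(counts, has_motif)
print([round(d, 3) for d in devs])
```

A positive value means the motif-containing peaks are more accessible in that sample than the pooled profile predicts, a negative value the opposite.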

ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by ATpoint11k

Thanks, the software looks promising.

ADD REPLYlink written 8 weeks ago by grant.hovhannisyan1.3k

Hi ATpoint, peaks from ATAC-seq or DNase-seq data may contain several footprint sites, each with a potential motif. If we use the sequence centered on the peak summit, will this hamper the discovery of each motif in a MEME-ChIP analysis? At least CentriMo may be affected. Or should we first characterise each footprint, then retrieve ~300 bp around each footprint center and run the MEME analysis on those?

Thank you!

Aifu.

ADD REPLYlink written 6 weeks ago by afli140

When analysing bulk ATAC-seq data (e.g. two different tissues), are there any parameters to pay attention to? Thank you.

ADD REPLYlink written 6 weeks ago by afli140

Sorry for the late reply. You can have a look at my basic script for pairwise comparisons on GitHub.

ADD REPLYlink written 16 days ago by ATpoint11k
1
9 weeks ago by
Charles Warden5.8k
Duarte, CA
Charles Warden5.8k wrote:

You might want to take a look at this post: A: How can I find motifs under individual ATAC-peaks?

However, i-cisTarget doesn't have yeast annotations (at least as far as I can tell).

For general peak enrichment, I think the species is also a limitation for Broad-Enrich or GREAT, but perhaps you can look at citations for papers (or the papers themselves) to get some other ideas (but maybe that is a little off target from your motif question).

ADD COMMENTlink written 9 weeks ago by Charles Warden5.8k

Thank you Charles, very useful! One of the species I analyze is a non-model organism, so I guess there will indeed be limitations. So I will need either to do de novo motif discovery or to search based on S. cerevisiae motifs.
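
Searching with known motifs boils down to sliding a position weight matrix along each peak sequence and keeping windows that score above a cutoff. A minimal sketch of that scan, assuming a uniform background of 0.25 per base (yeast genomes are AT-rich, so a real analysis would use measured frequencies) and a made-up count matrix, not a real S. cerevisiae motif; the threshold is likewise arbitrary:

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2 odds against the background."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        column_total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((counts[b][i] + pseudocount) / column_total / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (position, strand, score) for windows scoring >= threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        reverse_complement = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", reverse_complement)):
            score = sum(column[base] for column, base in zip(matrix, site))
            if score >= threshold:
                hits.append((pos, strand, round(score, 2)))
    return hits


# Hypothetical 6 bp motif with consensus TCGATG (toy counts only).
toy_counts = {
    "A": [0, 0, 0, 20, 0, 0],
    "C": [0, 20, 0, 0, 0, 0],
    "G": [0, 0, 20, 0, 0, 20],
    "T": [20, 0, 0, 0, 20, 0],
}
pwm = log_odds_matrix(toy_counts)
hits = scan("AACTCGATGTTCATCGAGG", pwm, threshold=8.0)
print(hits)
```

The toy sequence carries one forward-strand site and one reverse-strand site, and both are reported. For real data, dedicated scanners such as FIMO from the MEME Suite or HOMER also handle p-value calibration, which this sketch ignores.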

ADD REPLYlink written 9 weeks ago by grant.hovhannisyan1.3k
1
8 weeks ago by
afli140
China, Beijing, IGDB
afli140 wrote:

Hi, you can try HINT (http://www.regulatory-genomics.org/hint/tutorial/). It can do the comparison, and it works well for me.

ADD COMMENTlink written 8 weeks ago by afli140

Thanks for contributing!

ADD REPLYlink written 8 weeks ago by grant.hovhannisyan1.3k

Unfortunately HINT works only for some vertebrates.

ADD REPLYlink written 7 weeks ago by grant.hovhannisyan1.3k
2

You can download the JASPAR motifs and make some modifications; then you can run the analysis for other species. See this link: https://groups.google.com/forum/#!category-topic/rgtusers/general-discussion--rgt-core-classes/6ioEaNXEeeA

Hope this helps!

Aifu.

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by afli140

Thanks for pointing to this!

ADD REPLYlink written 7 weeks ago by grant.hovhannisyan1.3k

Hi Aifu, I am wondering which mapping software you used for your data?

ADD REPLYlink written 16 days ago by grant.hovhannisyan1.3k