Question

Active Enhancers and Gene Expression

Asked 6.4 years ago by carlosalfonsogonzalez6

Hello

I am trying to understand the impact of a specific active enhancer region on nearby genes. What I'm trying to obtain is which genes near the enhancer region are downregulated or upregulated in the control vs. experimental condition. Do you have any advice, or a library with which I could integrate both RNA-seq and ChIP-seq data and identify a possible association between TF binding and the expression of nearby genes?

Thanks!

Tags: ChIP-Seq, RNA-Seq

Written 6.4 years ago by carlosalfonsogonzalez6, last updated 6.4 years ago by jared.andrews07

Here are some steps to follow:

Assign each enhancer to one or more genes. This can be done in two ways: distance-based (take the nearest 2 genes upstream and downstream), or assign all genes within the same TAD region to the enhancer; check this paper (methods section).
Perform a correlation test between the normalized signal of your ChIP-seq and the normalized gene expression.
Be a little flexible with correlation cut-offs.
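
The two steps above could be sketched roughly as follows. This is a minimal Python illustration, not venu's actual pipeline: it assumes gene TSS positions and an enhancer midpoint on a single chromosome, treats "upstream/downstream" as left/right in genome coordinates (ignoring strand), and uses a Spearman correlation between two matched vectors (for example ChIP signal and expression across samples, or fold changes across enhancer-gene pairs). All function names and toy values are hypothetical.

```python
from bisect import bisect_left
from math import sqrt


def nearest_genes(enhancer_mid, tss, n_each_side=2):
    """Return up to n_each_side genes on each side of an enhancer midpoint.

    tss: dict of gene name -> TSS position (bp) on the enhancer's chromosome.
    "Sides" are left/right in genome coordinates; strand is ignored here.
    """
    ordered = sorted(tss.items(), key=lambda kv: kv[1])
    positions = [pos for _, pos in ordered]
    i = bisect_left(positions, enhancer_mid)  # first gene at or right of the enhancer
    left = ordered[max(0, i - n_each_side):i]
    right = ordered[i:i + n_each_side]
    return [name for name, _ in left + right]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example (hypothetical positions and values)
tss = {"geneA": 1_000, "geneB": 5_000, "geneC": 12_000, "geneD": 20_000, "geneE": 30_000}
print(nearest_genes(15_000, tss))  # ['geneB', 'geneC', 'geneD', 'geneE']
print(spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 25.0, 40.0]))
```

Note that with only two samples (one control, one experimental), a correlation across samples is always ±1 and therefore uninformative; correlating fold changes across many enhancer-gene pairs is the more practical variant in that case.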

I recently did this kind of analysis, so let me know if you need more details.

Reply by venu, 6.4 years ago

I would be very careful with this. When analysing different types of chromatin interaction data, just taking the nearest gene is only slightly better than randomly selecting genes (AUROC values in the 0.5-0.6 range); see e.g. Fig. 3b-e in this recent article.
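
To make the AUROC figure concrete: AUROC is the probability that a randomly chosen true enhancer-gene pair scores higher than a randomly chosen false one, so 0.5 is random guessing and 1.0 is perfect. A minimal sketch, assuming you have a score per candidate pair (hypothetically, negative distance for a nearest-gene rule) and a 0/1 label saying whether the pair is a validated interaction:

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count as 0.5).

    scores: higher = more likely a true enhancer-gene pair
    labels: 1 for validated pairs, 0 otherwise (hypothetical ground truth)
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores: negative distance in kb, so the nearest gene gets the highest score
print(auroc([-5, -10, -40, -80], [1, 0, 1, 0]))  # 0.75
```

This pairwise version is O(n·m); for genome-wide candidate sets, an existing implementation such as scikit-learn's roc_auc_score is more practical.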

Reply by Kristoffer Vitting-Seerup, 6.4 years ago

I have data from the same stage and the same lab, with the RNA and ChIP experiments coupled in time. What do you think? I'm building a core for machine learning.

Reply by carlosalfonsogonzalez6, 6.4 years ago

Thanks a lot! Can you recommend a package or library for this, preferably in R?

Reply by carlosalfonsogonzalez6, 6.4 years ago

To help you out, I need to know which cell type and organism you are working with.

Reply by Kristoffer Vitting-Seerup, 6.4 years ago

I'm working with Drosophila at the blastoderm stage: ChIP-seq for four TFs plus RNA-seq, from the same samples of the same organism.

Reply by carlosalfonsogonzalez6, 6.4 years ago

Answer 1 (score 3), posted 2017-11-17

There are many different methods/packages for doing things like this. A little searching will yield many a blog post, Biostars question, and publication. My usual workflow goes something like this:

1.) Identify differentially bound peaks.

Assuming you've already called your peaks (with MACS, HOMER, spp, etc.), this can be done with software like DiffBind (if you have replicates/many samples) or MAnorm (single samples). They'll derive a consensus peakset and compare it across your control vs. treatment conditions. They're also pretty easy to set up and use. This will yield a set of differentially bound peaks.
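
The "consensus peakset" idea can be illustrated with a simplified sketch: merge overlapping peaks across samples and keep merged regions called in at least a minimum number of samples. This is a toy under stated assumptions (peaks are half-open (chrom, start, end) tuples; the function name and threshold are hypothetical), not DiffBind's or MAnorm's actual algorithm; the real tools go further and compare read counts in these regions.

```python
def consensus_peaks(samples, min_samples=2):
    """Merge overlapping peaks across samples; keep regions supported by >= min_samples samples.

    samples: one list of (chrom, start, end) half-open peaks per sample.
    """
    tagged = sorted(
        (chrom, start, end, idx)
        for idx, peaks in enumerate(samples)
        for chrom, start, end in peaks
    )
    merged = []  # entries: [chrom, start, end, set of sample indices]
    for chrom, start, end, idx in tagged:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start < last[2]:
            last[2] = max(last[2], end)  # extend the current merged region
            last[3].add(idx)
        else:
            merged.append([chrom, start, end, {idx}])
    return [(c, s, e) for c, s, e, support in merged if len(support) >= min_samples]


# Toy peak calls (hypothetical coordinates)
control_1 = [("chr2L", 100, 200), ("chr2L", 500, 600)]
control_2 = [("chr2L", 150, 250)]
treated_1 = [("chr3R", 10, 50)]
print(consensus_peaks([control_1, control_2, treated_1]))  # [('chr2L', 100, 250)]
```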

2.) Identify differentially expressed genes.

It seems most people have lately moved away from alignment-dependent RNA quantification tools (cufflinks2, etc.) towards inference-based estimation methods (salmon, kallisto), followed by a typical differential gene expression package like DESeq2, edgeR, or limma. These also have the advantage of being much quicker. This will yield a list of differentially expressed genes, which you can then filter/rank by magnitude/p-value.
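
For the filter/rank step, a minimal sketch, assuming you already have per-gene log2 fold changes and raw p-values exported from one of those packages. It recomputes Benjamini-Hochberg adjusted p-values (the default adjustment in DESeq2's results()) and ranks significant genes by effect size; the cut-offs, names, and values are hypothetical, and in practice you would use the padj column the package reports.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de(genes, lfc_cutoff=1.0, alpha=0.05):
    """genes: list of (name, log2_fold_change, pvalue).

    Returns (name, log2FC, padj) for genes passing both cut-offs,
    ranked by absolute log2 fold change (largest first).
    """
    padj = bh_adjust([p for _, _, p in genes])
    hits = [
        (name, lfc, q)
        for (name, lfc, _), q in zip(genes, padj)
        if q < alpha and abs(lfc) >= lfc_cutoff
    ]
    return sorted(hits, key=lambda g: abs(g[1]), reverse=True)


# Toy values, not real results
genes = [("gene1", 2.5, 0.001), ("gene2", -1.8, 0.002), ("gene3", 0.2, 0.0001), ("gene4", 3.0, 0.5)]
print([name for name, _, _ in select_de(genes)])  # ['gene1', 'gene2']
```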

3.) Identify differentially bound peaks that correlate with differentially expressed genes.

This becomes a little trickier, as we don't know your experimental setup, which TF you're ChIPing, or what your control and experimental conditions are. Regardless, I usually take a simple approach first and just look at the closest differentially expressed genes to my peaks with bedtools closest or BEDOPS closest-features. This gives you a list with the closest gene to each peak, though it's important to remember that the target gene of a given regulatory element may be up to 1000 kb away.
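
The nearest-gene lookup itself can be illustrated in plain Python. A minimal sketch, assuming one TSS per gene, genes already restricted to your differentially expressed set and sorted by TSS per chromosome, and a hypothetical 1000 kb cap on distance; it is not a reimplementation of bedtools or BEDOPS and ignores their options for ties, strand, and overlaps.

```python
from bisect import bisect_left


def closest_gene(peak_mid, genes, max_distance=1_000_000):
    """genes: list of (name, tss) on one chromosome, sorted by tss.

    Returns (name, signed distance tss - peak_mid), or None if nothing is within max_distance.
    """
    positions = [tss for _, tss in genes]
    i = bisect_left(positions, peak_mid)
    candidates = [genes[j] for j in (i - 1, i) if 0 <= j < len(genes)]
    if not candidates:
        return None
    name, tss = min(candidates, key=lambda g: abs(g[1] - peak_mid))
    distance = tss - peak_mid
    return (name, distance) if abs(distance) <= max_distance else None


def annotate_peaks(peaks, genes_by_chrom, max_distance=1_000_000):
    """peaks: list of (chrom, start, end). Appends (gene, distance), or (None, None) if no gene qualifies."""
    annotated = []
    for chrom, start, end in peaks:
        mid = (start + end) // 2
        hit = closest_gene(mid, genes_by_chrom.get(chrom, []), max_distance)
        annotated.append((chrom, start, end) + (hit if hit else (None, None)))
    return annotated


# Toy inputs (hypothetical genes and peaks)
de_genes = {"chr2R": [("geneA", 1_000), ("geneB", 50_000)]}
peaks = [("chr2R", 900, 1_100), ("chr2R", 40_000, 42_000), ("chrX", 1, 100)]
for row in annotate_peaks(peaks, de_genes):
    print(row)
```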

4.) Visualize groups and run pathway analyses.

Once I have these lists, I try to visualize them across all of my samples to pick out sites/genes that are robust and recurrent. I've grown fond of EaSeq for quickly visualizing signal at peaksets in a variety of ways: heatmaps, genome-wide signal profiles, and more. It's also good for looking at individual loci/genes if you're interested in specific examples.
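
EaSeq does this interactively, but the computation behind a signal heatmap and an average profile is simple enough to sketch. This is a minimal illustration under stated assumptions (a per-base coverage list for one chromosome, peak centers in the same coordinates, and a flank that is a multiple of the bin size; all names are hypothetical), not EaSeq's implementation.

```python
def binned_matrix(signal, centers, flank=1000, bin_size=100):
    """One row of mean signal per bin around each peak center (the rows of a heatmap).

    signal: per-base coverage for one chromosome; flank must be a multiple of bin_size.
    """
    n_bins = 2 * flank // bin_size
    rows = []
    for c in centers:
        start = c - flank
        if start < 0 or c + flank > len(signal):
            continue  # skip peaks whose window runs off the chromosome
        row = []
        for b in range(n_bins):
            window = signal[start + b * bin_size : start + (b + 1) * bin_size]
            row.append(sum(window) / bin_size)
        rows.append(row)
    return rows


def average_profile(rows):
    """Column means of the heatmap rows: the aggregate signal profile."""
    if not rows:
        return []
    return [sum(col) / len(rows) for col in zip(*rows)]


# Toy coverage: a 20 bp block of signal centred on position 110
signal = [0] * 100 + [10] * 20 + [0] * 100
print(average_profile(binned_matrix(signal, [110], flank=20, bin_size=10)))  # [0.0, 10.0, 10.0, 0.0]
```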

I also usually run my peaks through GREAT, which performs pathway and GO enrichment analyses on genes near your peaks. At a minimum, it usually helps you determine whether your results make sense biologically.
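
GREAT's own statistics combine a region-based binomial test with a gene-based hypergeometric test. As a simpler illustration of the gene-based part only, here is a one-sided hypergeometric test for whether a GO term is over-represented among genes near your peaks; the function name and counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment(total_genes, term_genes, selected_genes, overlap):
    """One-sided P(X >= overlap): chance of seeing at least `overlap` term genes
    when drawing `selected_genes` genes from `total_genes`, of which `term_genes` carry the term.
    """
    upper = min(term_genes, selected_genes)
    total = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, selected_genes - k)
        for k in range(overlap, upper + 1)
    )
    return total / comb(total_genes, selected_genes)


# Hypothetical counts: 1000 genes in the background, 50 annotated with the term,
# 40 genes near peaks, 8 of which carry the term
print(hypergeom_enrichment(1000, 50, 40, 8))
```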

This is a rather generic and vague guide, but hopefully it helps you get started.