Calculating fold change at enhancers and promoters
8.0 years ago
Sinji ★ 3.2k

I have a question about fold change and how it should be calculated for enhancer regions. I am aware that packages such as DESeq2 and edgeR will calculate the fold change for genes, but I am interested in calculating fold change at enhancers.

I am quite naive and new to RNA-seq data, and it's possible that something like this is not entirely feasible, but I'd rather ask the experts.

So my questions can be summed up as the following:

  1. Is it possible to calculate the fold change of 'expression' at enhancer regions? Is this something that can be feasibly done with RNA-seq data?

  2. Is the information in this thread still relevant, and if so, could I implement this idea of log2 coverage / read counts at enhancer regions to compare against the fold change at enhancer-associated promoters?

I am trying to use this information to answer whether there is any correlation between eRNA expression at enhancers and gene expression at promoters. One thought is that if I have high eRNA expression, I would see high gene expression in the 'associated' gene.

I appreciate all the help! I'm looking to learn, so any information regarding anything in this post is appropriate.

RNA-Seq fold change • 3.4k views
8.0 years ago
Bogdan ★ 1.4k

You can also use bedtools to compute the counts of eRNA reads on enhancers: http://bedtools.readthedocs.org/en/latest/. In particular, the intersectBed function (-c option) counts the reads overlapping each enhancer. You can do the same for the gene bodies. And yes, the analysis is simple, no need to make it more complicated ;)
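To make the counting step concrete, here is a minimal pure-Python sketch of what a per-enhancer overlap count does conceptually. It is not how bedtools is implemented; the function name `count_overlaps` and the toy intervals are made up for illustration, and intervals are assumed to be BED-style (0-based, half-open).

```python
from collections import defaultdict

def count_overlaps(enhancers, reads):
    """For each enhancer, count the reads that overlap it by at least 1 bp.

    Both inputs are lists of (chrom, start, end) tuples in BED-style
    0-based, half-open coordinates. Hypothetical helper for illustration.
    """
    # Group reads by chromosome so each enhancer only scans its own chromosome.
    reads_by_chrom = defaultdict(list)
    for chrom, start, end in reads:
        reads_by_chrom[chrom].append((start, end))

    counts = []
    for chrom, e_start, e_end in enhancers:
        # Half-open intervals overlap when each one starts before the other ends.
        n = sum(1 for r_start, r_end in reads_by_chrom[chrom]
                if r_start < e_end and r_end > e_start)
        counts.append(n)
    return counts

# Toy data (invented): three enhancers and five reads.
enhancers = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
reads = [("chr1", 90, 140), ("chr1", 150, 190), ("chr1", 200, 250),
         ("chr1", 590, 640), ("chr2", 300, 350)]

counts = count_overlaps(enhancers, reads)
print(counts)
```

The linear scan is fine for a toy example but far too slow for real alignments, which is one reason to use bedtools on actual data.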

8.0 years ago
Bogdan ★ 1.4k

Hey Sinji, depending on the depth of your sequencing data, you may see only small fold changes in eRNA expression. Are you using GRO-seq or RNA-seq? Typically, what people do is consider a "population" (set) of promoters and enhancers and compute the FC globally. You can represent the data as boxplots with appropriate randomization controls.

ADD COMMENT

I have access to both GRO-seq and RNA-seq datasets. My RNA-seq datasets are three replicates of WT cells and three replicates of KO cells, in which my TF (found both at promoter regions and at my enhancer regions of interest) has been knocked out.

Could you explain the concept of computing FC globally and randomization controls a bit more? I'm going through some google searches now, but any links or explanation would be appreciated.

ADD REPLY

some colleagues in http://www.ncbi.nlm.nih.gov/pubmed/23728302 did the analysis in the following way :

-- took a set of 1000 enhancers bound by estrogen receptor
-- computed for each enhancer the counts of eRNAs (based on GRO-seq reads)
-- let's say: enhancer 1 has 10 reads in -E2 and 20 reads in +E2 ... enhancer n has 15 reads in -E2 and 25 reads in +E2
-- then you can represent as boxplots the number of reads on ALL 1000 enhancers in -E2 and in +E2: i.e. one boxplot will have the values 10, 15, ... in -E2 and another boxplot will have the values 20, 25, ... in +E2
-- you can do a t-test between the -E2 values and the +E2 values
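The steps above can be sketched roughly as follows. This is only an illustration with invented counts for six enhancers, not the analysis from the paper. `five_number_summary` and `welch_t` are hypothetical helper names, and Welch's unequal-variance t statistic is an assumption here, since the exact test variant is not specified.

```python
import math
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), i.e. the values a boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def welch_t(a, b):
    """Welch's t statistic for mean(b) - mean(a), allowing unequal variances."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se

# Invented eRNA read counts per enhancer (same enhancers, two conditions).
minus_e2 = [10, 15, 12, 8, 20, 14]
plus_e2 = [20, 25, 18, 16, 30, 22]

summary_minus = five_number_summary(minus_e2)
summary_plus = five_number_summary(plus_e2)
t_stat = welch_t(minus_e2, plus_e2)

print("-E2 summary:", summary_minus)
print("+E2 summary:", summary_plus)
print("Welch t statistic:", t_stat)
```

For a p-value you would normally hand the two lists to a statistics package, for example `scipy.stats.ttest_ind` with `equal_var=False`. Because the same enhancers are measured in both conditions, a paired test is another option worth considering.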

ADD REPLY

This paper, combined with your other answer, has solved the majority of my problem. I just need to learn how to calculate log2FC and I should be good!

ADD REPLY
1
Entering edit mode
8.0 years ago
Bogdan ★ 1.4k

In your case, it will be one boxplot for WT and one boxplot for KO.

You do have 3 replicates, although I think the depth will be too low for edgeR to call many differentially expressed eRNAs.

Another thing you could do is a bi-plot (a scatter plot with one quantity per axis) of:

-- log2FC for the gene (WT/KO)
-- log2FC for the closest eRNA (WT/KO)

Then compute the correlation coefficient. That is probably all you need to show the correlation between eRNA and gene changes.
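As a rough sketch of the correlation step, assuming you already have one log2FC per gene and one for its closest eRNA, a Pearson coefficient can be computed like this. The values are invented and `pearson_r` is a hypothetical helper written out by hand so the example has no dependencies.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented log2FC (WT/KO) values; position i pairs a gene with its closest eRNA.
gene_log2fc = [1.2, -0.5, 0.8, 2.0, -1.1]
erna_log2fc = [0.9, -0.2, 0.4, 1.5, -0.7]

r = pearson_r(gene_log2fc, erna_log2fc)
print("Pearson r:", r)
```

With real data a rank-based coefficient such as Spearman's may be more robust to outliers; which one suits your data is a judgment call.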

ADD COMMENT

The second case you described (the bi-plot) was exactly what I was thinking of attempting. My problem is that I am unsure how to calculate log2FC. Do I count read coverage (using something like deepTools computeMatrix) for both eRNAs and their associated genes, take the log2 of each of these individually, and then plot them using seaborn or R, with the log2FC of the gene (WT/KO) on the x axis and the log2FC of the closest eRNA (WT/KO) on the y axis? This makes sense to me, but it sounds too simple.

This answers part of my question though, so thank you!
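For reference, one simple way to get a log2FC for a single region (a gene or an enhancer) from replicate counts is sketched below. It is a naive illustration, not what DESeq2 or edgeR do internally: the counts-per-million scaling, the pseudocount of 1.0 and the function names are all assumptions chosen for the example.

```python
import math

def cpm(count, library_size):
    """Scale a raw read count to counts per million mapped reads."""
    return count * 1e6 / library_size

def log2_fold_change(wt_counts, ko_counts, wt_lib_sizes, ko_lib_sizes,
                     pseudocount=1.0):
    """log2(WT / KO) for one region from per-replicate raw counts.

    Each replicate is normalized by its library size, replicates are
    averaged per condition, and a pseudocount (an arbitrary choice here)
    keeps the ratio defined when a condition has zero reads.
    """
    wt = [cpm(c, n) for c, n in zip(wt_counts, wt_lib_sizes)]
    ko = [cpm(c, n) for c, n in zip(ko_counts, ko_lib_sizes)]
    wt_mean = sum(wt) / len(wt)
    ko_mean = sum(ko) / len(ko)
    return math.log2((wt_mean + pseudocount) / (ko_mean + pseudocount))

# Invented counts for one enhancer: 3 WT and 3 KO replicates.
lib_sizes = [1_000_000, 1_000_000, 1_000_000]
lfc = log2_fold_change([40, 44, 36], [20, 18, 22], lib_sizes, lib_sizes)
print("log2FC (WT/KO):", lfc)
```

Note that the fold change is the log2 of the ratio between conditions, which is the same as the difference of the two log2 values, so logging each condition on its own is only the first half of the calculation.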
