Question

Genes with promoter and enhancer regions as GTF

5.9 years ago

hkarakurt ▴ 180

Hello, I am doing a ChIP-Seq analysis and I have sorted SAM files (I used MACS). I used featureCounts and a GTF file from the UCSC Table Browser. I chose "Genes and Gene Predictions" as the group, "UCSC Genes" as the track and "knownGene" as the table. Now I have a count matrix for my ChIP-Seq experiments and I used DESeq2. As far as I know from posts on the Bioconductor forums, this approach can be used for ChIP-Seq data.

But the problem is that my GTF file has CDS, mRNA, start codon and stop codon entries. As far as I know, promoter regions are mostly within about 1000 bp upstream of the gene start, and I am not sure whether my GTF file includes these regions. I also have no idea about enhancer regions.

How can I download a GTF file which also has promoter and enhancer regions? Please, I am in a hurry and need help.

Thank you.

ChIP-Seq featurecounts gtf promoter genes • 4.4k views

score 3 · Answer 1 · 2018-06-05

Okay, first things first

Please, I am in a hurry and need help

Slow down and think again about what you are doing.

I used MACS

Used MACS for what? (BTW, convert SAM to BAM and save some space. Just a suggestion.) I'm sure you wanted to find ChIP-enriched regions, which you can do with MACS.

I used featureCounts and a GTF file from the UCSC Table Browser

To find out what? I assume what you wanted to do here is count the reads that fall in the MACS-identified peak regions. You don't need a GTF for this. All you need is the peak regions in BED format and a tool that counts reads over them, e.g. multiBamSummary (BED-file mode) from deepTools or bedtools multicov.
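If you would rather keep using featureCounts, it also accepts a simple tab-delimited SAF annotation instead of a GTF. Below is a minimal sketch of turning peak lines in BED format into SAF rows. The function name `bed_to_saf` and the `peak_N` fallback names are made up for illustration, and using "." as the strand for unstranded peaks is an assumption based on common practice, so check it against your featureCounts version.

```python
def bed_to_saf(bed_lines):
    """Convert BED peak lines into SAF lines (GeneID, Chr, Start, End, Strand)."""
    saf = ["GeneID\tChr\tStart\tEnd\tStrand"]
    n = 0
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip empty lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        n += 1
        # Use the BED name column if present, otherwise a hypothetical peak_N id.
        name = fields[3] if len(fields) > 3 and fields[3] else f"peak_{n}"
        # Keep the strand only if the BED line carries a real one.
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "."
        # BED is 0-based, half-open; SAF is 1-based, inclusive: shift the start only.
        saf.append(f"{name}\t{chrom}\t{start + 1}\t{end}\t{strand}")
    return saf


if __name__ == "__main__":
    example = ["chr1\t99\t200\tpeakA\t0\t+", "chr2\t0\t50"]
    print("\n".join(bed_to_saf(example)))
```

Written to a file (say `peaks.saf`, a placeholder name), the result can be passed to featureCounts with `-F SAF -a peaks.saf`. To get one comparable count matrix, all samples need to be counted over the same set of regions, for example a merged peak set.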

I have a count matrix for my ChIP-Seq experiments and I used DESeq2

Count matrix for how many samples? And what did you find from DESeq2?

I suppose you want to find differentially active promoters from the ChIP-seq data(?). In that case, download a GENCODE annotation file, extract gene-level promoter regions (or transcript-level, depending on your requirement), use those regions to generate a count matrix, and apply DESeq2. You will get differentially active promoters for your conditions.
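As a rough illustration of the "get gene-level promoter regions" step, here is a minimal sketch that reads GTF lines and writes strand-aware promoter windows in BED format. It assumes the GTF has `gene` records with a `gene_id` attribute, as GENCODE files do. The function name `promoters_from_gtf` and the default window (1000 bp upstream, 0 bp downstream) are illustrative choices, not a standard, and windows are not clipped at chromosome ends.

```python
import re


def promoters_from_gtf(gtf_lines, upstream=1000, downstream=0):
    """Return BED6 lines with a promoter window around each gene's TSS.

    The window covers `upstream` bases before the TSS, the TSS base itself,
    and `downstream` bases after it, relative to the gene's strand.
    """
    bed = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only gene-level records are used (assumption: the GTF has them).
        if len(fields) < 9 or fields[2] != "gene":
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        gene_id = match.group(1) if match else "."
        if strand == "+":
            tss = start  # GTF coordinates are 1-based, inclusive
            first, last = tss - upstream, tss + downstream
        elif strand == "-":
            tss = end    # on the minus strand the gene starts at its end coordinate
            first, last = tss - downstream, tss + upstream
        else:
            continue     # skip records without a usable strand
        # Convert 1-based inclusive [first, last] to BED 0-based half-open,
        # clipping at the chromosome start.
        bed_start = max(0, first - 1)
        bed.append(f"{chrom}\t{bed_start}\t{last}\t{gene_id}\t0\t{strand}")
    return bed


if __name__ == "__main__":
    toy = ['chr1\tHAVANA\tgene\t5000\t6000\t.\t+\t.\tgene_id "G1";']
    print("\n".join(promoters_from_gtf(toy, upstream=1000, downstream=200)))
```

For transcript-level promoters, the same logic would apply to `transcript` records with `transcript_id` as the name. The window sizes are a judgment call and worth varying.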

How can I download a GTF file which also has promoter and enhancer regions

If the purpose of this GTF is to count reads falling in those regions, you don't need a GTF; simple BED files are enough.

score 2 · Answer 2 · 2018-06-05

First of all: this is not as trivial as you might hope. This issue warrants some thought and some trial and error on your part. Being in a hurry now doesn't mean you shouldn't revisit your choices after reading my response.

Enhancers are totally cell-type (and possibly condition) specific, and there isn't even a consensus on how to define enhancers in a uniform manner. Thus, I'm not aware of a GTF file from the usual genome data repositories that contains all enhancers ever defined.

You can browse the data at the UCSC Table Browser: just choose "group: Regulation" instead of "Genes and Gene Predictions". You could then, for example, choose the DNase track as a proxy for open chromatin (enhancers tend to be open, as are active promoters). You would have to pay attention to the cell type, though, as K562 cells might not be what you want.

Promoters are even less well defined: depending on the model organism and the preference of the PI, they may be any region from 200 to 10,000 bp upstream and/or downstream of the TSS. Here's a classic Biostars post on how you could do that yourself given a BED file.

Edit: Yes to everything Venu wrote!

score 0 · Answer 3 · 2018-06-05

Thank you for the answers. I will explain more clearly and give my reasons. I used MACS for peak calling. I also have BAM files; I just forgot to mention that. I used featureCounts to quantify my peaks, with a GTF file as the annotation. I have 7 samples for the H3K4me1 and 6 samples for the H3K27ac ChIP-Seq experiments.

I did this because I need differentially enriched regions and the names of these regions, to compare and integrate (or at least try to) with RNA-Seq data from the same experiments. On the NCBI GEO page of the data set, I saw that the researchers used featureCounts and edgeR for differential binding analysis. It is here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2889136

Also, https://support.bioconductor.org/p/109154/#109742 is my post on the Bioconductor forums, where people said DESeq2 can be used. I just want to compare the peaks between conditions.

I used the term "GTF" many times, but I should have said "annotation" instead.

Thank you again.