Question: How To Find Upstream/Downstream Bound Features With A Chip-Seq Analysis In Galaxy
5.0 years ago by
lasse.kruse0 wrote:

Hi!

I have a set of ChIP-seq data, and I would like to find out whether the transcription factor KLF1 binds both upstream (i.e. in the promoter) and downstream of the transcription start site (TSS). For this analysis I have used the RefSeq annotation and defined the upstream and downstream regions of interest as the 1000 nucleotides upstream and downstream of the TSS, respectively. I first used the Get Flanks tool and then the INNERJOIN tool, and found 103 regions that overlap with my peaks. But how do I find out which of these 103 are bound upstream and which downstream?

bioinformatics chip-seq • 3.7k views
4.9 years ago by
Alex Reynolds27k
Seattle, WA USA
Alex Reynolds27k wrote:

If you can wrap these command-line operations into Galaxy, then BEDOPS can help.

First, sort your input BED files. The sorted TSSs.bed file contains your TSSs and the sorted TFs.bed contains all the sites for transcription factors:

$ sort-bed TSSs.unsorted.bed > TSSs.bed
$ sort-bed TFs.unsorted.bed > TFs.bed

To find 1 kb upstream hits:

$ bedops --range -1000:0 --everything TSSs.bed \
    | bedmap --echo --echo-map --delim '\t' - TFs.bed \
    | grep -w "KLF1" - \
    | bedops --range 1000:0 --everything - \
    > TSSsContainingUpstreamKLF1hits.bed

To find 1 kb downstream hits:

$ bedops --range 0:1000 --everything TSSs.bed \
    | bedmap --echo --echo-map --delim '\t' - TFs.bed \
    | grep -w "KLF1" - \
    | bedops --range 0:-1000 --everything - \
    > TSSsContainingDownstreamKLF1hits.bed

The last column of each output BED file contains a semicolon-delimited list of the TF sites (including at least one KLF1 hit) that overlap the upstream or downstream window, respectively, of each qualifying TSS.
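
One caveat: --range pads in genomic coordinates and ignores strand, so for a minus-strand TSS the -1000:0 window actually lies downstream of the gene. A minimal strand-aware sketch with awk, assuming TSSs.bed is a six-column BED with the strand in column 6. The toy records and the *.toy.bed file names are made up for illustration:

```shell
# Toy TSS file (hypothetical coordinates).
# Columns: chrom, start, end, name, score, strand
printf 'chr1\t5000\t5001\tgeneA\t0\t+\nchr1\t9000\t9001\tgeneB\t0\t-\n' > TSSs.toy.bed

# Pad each TSS by w bases on its upstream or downstream side, by strand.
# Plus strand:  upstream = lower coordinates, downstream = higher.
# Minus strand: upstream = higher coordinates, downstream = lower.
awk -v OFS='\t' -v w=1000 '
{
    lo = ($2 - w < 0) ? 0 : $2 - w   # left-padded start, clamped at 0
    hi = $3 + w                      # right-padded end
    if ($6 == "-") {
        print $1, $2, hi, $4, $5, $6 > "upstream.toy.bed"
        print $1, lo, $3, $4, $5, $6 > "downstream.toy.bed"
    } else {
        print $1, lo, $3, $4, $5, $6 > "upstream.toy.bed"
        print $1, $2, hi, $4, $5, $6 > "downstream.toy.bed"
    }
}' TSSs.toy.bed

cat upstream.toy.bed downstream.toy.bed
```

The padded files would then need to go through sort-bed again before being used as the reference input to the bedmap step, since padding by strand can change the sort order.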

If you want to have all the hits in one output file:

$ bedops --range -1000:1000 --everything TSSs.bed \
    | bedmap --echo --echo-map --delim '\t' - TFs.bed \
    | grep -w "KLF1" - \
    | bedops --range 1000:-1000 --everything - \
    > TSSsContainingAllKLF1hits.bed
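
To get at the original question (which TSSs are bound upstream only, downstream only, or on both sides), the two per-side output files can be compared by TSS name. A rough sketch, assuming column 4 of each file holds a unique TSS or gene identifier. The toy files below are hypothetical stand-ins for TSSsContainingUpstreamKLF1hits.bed and TSSsContainingDownstreamKLF1hits.bed:

```shell
# Toy stand-ins for the two per-side output files (hypothetical records)
printf 'chr1\t5000\t5001\tgeneA\nchr1\t9000\t9001\tgeneB\n' > up.toy.bed
printf 'chr1\t9000\t9001\tgeneB\nchr2\t700\t701\tgeneC\n'   > down.toy.bed

# comm requires both inputs sorted in the same collation order
export LC_ALL=C
cut -f4 up.toy.bed   | sort -u > up.ids
cut -f4 down.toy.bed | sort -u > down.ids

comm -23 up.ids down.ids > upstream_only.ids     # IDs found only in the upstream file
comm -13 up.ids down.ids > downstream_only.ids   # IDs found only in the downstream file
comm -12 up.ids down.ids > both.ids              # IDs found in both files

# Report how many TSSs fall into each class
wc -l upstream_only.ids downstream_only.ids both.ids
```

If the name column is not unique per TSS, a key built from chromosome, start and name would be a safer choice than the name alone.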
5.0 years ago by
bede.portz470
United States
bede.portz470 wrote:

I don't use Galaxy for this type of analysis, but one workaround could be to separately map your KLF1 ChIP-seq reads to a window 1 kb upstream of the RefSeq TSS, and also to a window 1 kb downstream of the TSS. This will result in two files, each with a gene ID/TSS list, which you can join in Galaxy on the column containing the gene ID to find those TSSs bound both upstream and downstream.

Bear in mind that mammalian genes can have multiple TSSs per gene, and not all of these TSSs may actually be used in a given cell type, under given conditions, etc. Thus, you may identify a KLF1 peak as lying downstream of an annotated TSS when it is actually upstream of the TSS used in your cells, or vice versa.

If you haven't already considered this, it may be worthwhile to refine the RefSeq TSS list to include only those TSSs not within X base pairs of another RefSeq TSS, and/or to remove genes with multiple TSSs. I can say from experience that this filtering can reduce the RefSeq TSS list by many thousands of TSSs, and in doing so may alter your results with respect to which genes are bound by KLF1 both upstream and downstream of the TSS.
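
The distance-based TSS filtering suggested above could be sketched roughly as follows, assuming a BED file of single-base TSSs that is already sorted by chromosome and start. The threshold x=500 and the toy records are arbitrary choices for illustration, not recommended values:

```shell
# Toy TSS list, sorted by chromosome and start (hypothetical coordinates)
printf 'chr1\t1000\t1001\tgeneA\nchr1\t1200\t1201\tgeneB\nchr1\t9000\t9001\tgeneC\nchr2\t1100\t1101\tgeneD\n' > TSSs.sorted.toy.bed

# Keep a TSS only if no neighbouring TSS on the same chromosome
# lies within x bp of it. Because the input is sorted, checking the
# immediate previous and next records is sufficient.
awk -v x=500 '
{ chrom[NR] = $1; start[NR] = $2; line[NR] = $0 }
END {
    for (i = 1; i <= NR; i++) {
        near_prev = (i > 1  && chrom[i-1] == chrom[i] && start[i] - start[i-1] < x)
        near_next = (i < NR && chrom[i+1] == chrom[i] && start[i+1] - start[i] < x)
        if (!near_prev && !near_next) print line[i]
    }
}' TSSs.sorted.toy.bed > TSSs.isolated.toy.bed

cat TSSs.isolated.toy.bed
```

In this toy input, geneA and geneB sit 200 bp apart and are both dropped, while geneC and geneD are kept.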

4.9 years ago by
Ming Tang2.4k
Houston/MD Anderson Cancer Center
Ming Tang2.4k wrote:

Basically, you want to annotate the peaks. You can use Cistrome, which is built into Galaxy, for this: http://cistrome.org/ap/
