Question: TSS file for D. melanogaster
6.5 years ago by
United States
catherine170 wrote:

I have ChIP-seq data and want to exclude the regions near TSSs. Can anyone tell me how to get a TSS file? I went to UCSC but didn't find one.

Thanks a lot in advance for any advice.

tss drosophila • 3.0k views
modified 4.6 years ago by spencernystrom10 • written 6.5 years ago by catherine170
6.5 years ago by
University of Manchester, UK
Ian5.7k wrote:

Within UCSC you can get the data you want. 

First make sure you are viewing the right genome assembly, e.g. dm3.

Select 'Tools' (along the top of the screen) > 'Table Browser' to access the tables of data used by UCSC.

Choose: 'group' = 'Genes and Gene Predictions', 'track' (depending on your preference) = 'RefSeq Genes' or 'FlyBase Genes'.

If you select 'output format' = 'BED' and press 'get output', you will be given the option to 'Create one BED record per' > 'Upstream by N bases'.

The resulting output (sent to the screen if you did not give a file name on the previous screen) will contain the coordinates of the promoter regions for your analysis. Bear in mind that the coordinates are per transcript, so a gene with more than one transcript yields more than one record.
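If you export whole-transcript BED records rather than upstream regions (an assumption here, not what the steps above produce), the per-transcript duplication can be collapsed with a short awk filter. This is only a sketch: it assumes tab-separated BED6 input, and the function name tss_from_bed and the sample records are made up for illustration.

```shell
# tss_from_bed: read BED6 transcript records on stdin and print one
# 1-bp TSS record per unique (chrom, position, strand), sorted by position.
tss_from_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             # BED is 0-based, half-open: the TSS is the first base on "+"
             # and the last base on "-".
             if ($6 == "+") { s = $2;     e = $2 + 1 }
             else           { s = $3 - 1; e = $3 }
             key = $1 FS s FS $6
             # Keep only the first transcript seen for each TSS.
             if (!seen[key]++) print $1, s, e, $4, $5, $6
         }' | sort -k1,1 -k2,2n
}

# Hypothetical records: two isoforms sharing a TSS, plus a minus-strand gene.
printf 'chr2L\t100\t500\tgeneA\t0\t+\nchr2L\t100\t900\tgeneA\t0\t+\nchr2L\t2000\t3000\tgeneB\t0\t-\n' \
    | tss_from_bed
```

Keeping the first transcript per TSS is an arbitrary choice; if you need every transcript name, drop the seen[] check.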


Hope this helps.


Thanks a lot, I didn't look at the Table Browser in the first place! Your answer is very specific and helpful! Thank you!

Reply written 6.5 years ago by catherine170
4.6 years ago by
United States
spencernystrom10 wrote:

For anyone who might still find this: the proposed solution of using an SQL query at UCSC will not give you an accurate number of TSSs. UCSC's annotated TSS data has only about 6,100 TSSs, far fewer than the number of known TSSs. I haven't found a more complete solution, but I'll update when I do.
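One way to check a count like this against your own download is to tally distinct strand-aware start sites. A minimal sketch, assuming tab-separated BED6 transcript records on stdin; count_unique_tss and the sample records are hypothetical, and the number printed for them says nothing about dm3 itself.

```shell
# count_unique_tss: count distinct (chrom, TSS position, strand) triples
# in BED6 transcript records read from stdin.
count_unique_tss() {
    awk 'BEGIN { FS = "\t" }
         {
             # TSS is the start on "+" and the last base (end - 1) on "-".
             pos = ($6 == "+") ? $2 : $3 - 1
             if (!seen[$1 FS pos FS $6]++) n++
         }
         END { print n + 0 }'
}

# Hypothetical records: three transcripts, two of which share a TSS.
printf 'chr2L\t100\t500\tgeneA\t0\t+\nchr2L\t100\t900\tgeneA\t0\t+\nchr2L\t2000\t3000\tgeneB\t0\t-\n' \
    | count_unique_tss
```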

6.5 years ago by
Seattle, WA USA
Alex Reynolds31k wrote:

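You can run a MySQL query against the UCSC Genome Browser's public MySQL server to output a sorted six-column BED file of unique, strand-aware RefSeq TSS positions (txStart for + strand transcripts, txEnd for - strand transcripts):

# The host below is UCSC's public MySQL server.
# The if() expressions turn each transcript into a 1-bp TSS interval.
$ mysql -h genome-mysql.soe.ucsc.edu -u genome -D dm3 -N -A -e \
    'select chrom, if(strand="+", txStart, txEnd-1), if(strand="+", txStart+1, txEnd), name2, score, strand from refGene' \
    | sort-bed - \
    | awk '!elements[$0]++' \
    > refseq_tss.bed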

Once you have both the RefSeq TSSs and your ChIP-seq data in sorted BED format, you can use bedops with --range (set to your chosen window size in bases) and --not-element-of on these two datasets, with the ChIP-seq peaks as the first input, to exclude any ChIP-seq peaks that fall in a window around each TSS.

See the following docs for more information on these and other bedops operations. Also, the table schema for Drosophila RefSeq is available here, so you can see where those field names come from and what they map to.
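To make the exclusion logic concrete, here is a plain-awk stand-in for the filtering step. It is not bedops and not how bedops is implemented: it is a naive sketch that compares every peak against every TSS, so it is only suitable for small files. The function name exclude_near_tss, its argument order, and the sample records are all hypothetical. It also assumes a non-empty TSS file and treats any overlap with a padded TSS as a hit, whereas bedops lets you configure the overlap criterion.

```shell
# exclude_near_tss TSS_BED WINDOW < peaks.bed
# Print the peaks (BED on stdin) that do not overlap any TSS padded by
# WINDOW bases on each side. Assumes TSS_BED is non-empty.
exclude_near_tss() {
    awk -v win="$2" 'BEGIN { FS = OFS = "\t" }
        # First file: load the padded TSS intervals into memory.
        NR == FNR { n++; chr[n] = $1; lo[n] = $2 - win; hi[n] = $3 + win; next }
        # Second input (stdin): keep a peak only if it overlaps no padded TSS.
        {
            keep = 1
            for (i = 1; i <= n; i++)
                if ($1 == chr[i] && $2 < hi[i] && $3 > lo[i]) { keep = 0; break }
            if (keep) print
        }' "$1" -
}

# Hypothetical usage with a one-line TSS file and three peaks.
tss_file=$(mktemp)
printf 'chr2L\t1000\t1001\tgeneA\t0\t+\n' > "$tss_file"
printf 'chr2L\t900\t950\tpeak1\nchr2L\t5000\t5100\tpeak2\nchr3R\t1000\t1100\tpeak3\n' \
    | exclude_near_tss "$tss_file" 100
rm -f "$tss_file"
```

For genome-scale data, the sorted-input bedops approach described above is the better choice; the sketch only shows what "not within a window of any TSS" means in coordinates.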


Thank you and I got it!

Reply written 6.5 years ago by catherine170