Question: TSS file for D.melanogaster
6.5 years ago by
United States
I have ChIP-seq data, and I want to exclude the regions near TSSs. Can anyone tell me how to get a TSS file? I went to UCSC but didn't find one.

Thanks a lot in advance for any advice.

6.5 years ago by
University of Manchester, UK
Within UCSC you can get the data you want. 

First make sure you are viewing the right genome assembly, e.g. dm3.

Select 'Tools' (along the top of the screen) > 'Table Browser' to access the tables of data used by UCSC.

Choose: 'group' = 'Genes and Gene Predictions', 'track' (depending on your preference) = 'RefSeq Genes' or 'FlyBase Genes'.

If you select 'output format' = 'BED', then when you press 'get output' you will be given the option 'Create one BED record per' > 'Upstream by N bases'.

The resulting output (printed to the screen if you did not give a file name on the previous screen) will contain the coordinates of the promoter regions for your analysis. Bear in mind that the coordinates are per transcript, so a gene with more than one transcript yields more than one record.
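
Because several transcripts of one gene often share a start site, the exported BED can contain the same promoter interval more than once. A minimal sketch for collapsing those, assuming a tab-separated six-column BED on stdin (the function name dedup_promoters is made up for illustration, it is not a UCSC or BEDOPS tool):

```shell
# Hypothetical helper: keep the first record for each distinct
# (chrom, start, end, strand) combination.
# Assumes tab-separated six-column BED on stdin.
dedup_promoters() {
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n |
        awk -F'\t' '!seen[$1 FS $2 FS $3 FS $6]++'
}
```

Something like `dedup_promoters < upstream.bed > upstream.unique.bed` would then do it, where both file names are placeholders. Only the first transcript name seen for each interval is kept.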


Hope this helps.

Thanks a lot, I didn't look at the Table Browser in the first place! Your answer is very specific and helpful! Thank you!

4.6 years ago by
United States
For anyone who might still find this: the proposed solution of using an SQL query at UCSC will not give you an accurate number of TSSs. UCSC's annotated TSS data has only about 6100 TSSs, which is far fewer than the number of known TSSs. I haven't found a more complete solution, but I'll update when I do.

6.5 years ago by
Alex Reynolds31k
Seattle, WA USA
You can run a MySQL query against the UCSC Genome Browser database to output a sorted six-column BED file containing unique RefSeq records:

$ mysql -h -u genome -D dm3 -N -A -e 'select chrom, txStart, txEnd, name2, score, strand from refGene' \
    | sort-bed - \
    | awk '!elements[$0]++' - \
    > refseq_tss.bed
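
Note that txStart and txEnd span the whole transcript, so these records are not yet single TSS positions. In BED's 0-based, half-open coordinates, the TSS of a + strand record sits at txStart and that of a - strand record at txEnd - 1. A rough sketch of the conversion, assuming the six-column layout selected in the query (tss_from_bed is a hypothetical helper name, not part of BEDOPS):

```shell
# Hypothetical helper: turn six-column BED transcript records (stdin)
# into 1-bp TSS records, strand-aware, then drop exact duplicates.
tss_from_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         # + strand: the transcript starts at the left edge
         $6 == "+" { print $1, $2, $2 + 1, $4, $5, $6 }
         # - strand: the transcript starts at the right edge
         $6 == "-" { print $1, $3 - 1, $3, $4, $5, $6 }' |
        LC_ALL=C sort -k1,1 -k2,2n -k3,3n |
        awk '!seen[$0]++'
}
```

For example, `tss_from_bed < refseq_tss.bed > refseq_tss_1bp.bed` (the output name is arbitrary). Plain sort is used here only to keep the sketch dependency-free; it is probably safest to run the result through sort-bed again before handing it to bedops.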

Once you have both the RefSeq TSSs and your ChIP-seq data in sorted BED format, you can use bedops --range --not-element-of on these two datasets to exclude any ChIP-seq peaks that fall in a window around each TSS.
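
If BEDOPS is not at hand, the same kind of exclusion can be approximated in plain awk. This is only a sketch of the idea, not what bedops does internally: it assumes 1-bp TSS records, a symmetric window given in bases, and a non-empty TSS file, and it compares every peak against every TSS, so it will be slow on large inputs. The name exclude_near_tss and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: print peaks that do not overlap a +/- WINDOW
# interval around any TSS.
# Usage: exclude_near_tss WINDOW tss.bed peaks.bed
exclude_near_tss() {
    awk -v w="$1" 'BEGIN { FS = OFS = "\t" }
        # First file (TSSs): remember each padded interval.
        # Relies on the TSS file being non-empty.
        NR == FNR {
            n++
            chrom[n] = $1
            lo[n] = ($2 - w < 0) ? 0 : $2 - w
            hi[n] = $3 + w
            next
        }
        # Second file (peaks): keep a peak only if no padded TSS
        # on the same chromosome overlaps it by at least one base.
        {
            keep = 1
            for (i = 1; i <= n; i++)
                if (chrom[i] == $1 && $2 < hi[i] && $3 > lo[i]) { keep = 0; break }
            if (keep) print
        }' "$2" "$3"
}
```

Usage would look like `exclude_near_tss 500 tss_1bp.bed peaks.bed > peaks.no_tss.bed` to drop peaks within 500 bases of a TSS.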

See the BEDOPS documentation for more information on these and other bedops operations. The table schema for Drosophila RefSeq is also available from UCSC, so you can see where those field names come from and what they map to.

Thank you and I got it!

