Question

Extracting Features That Appear In TSS Regions

0


6.8 years ago

armonazizi • 0

Hi,

I'm working with ATAC-Seq data and I need to extract the features in my bed files that are located in transcription start site regions of the mm10 genome. Can anyone recommend a way to do this?

I was thinking of generating a BED file of only the TSS regions from the mm10 genome and intersecting it with the sample file. However, I'm not sure how to generate a BED file that only contains TSSs.

Any help would be appreciated.

Thanks

ChIP-Seq ATAC-Seq • 3.4k views

updated 6.8 years ago by ccagg ▴ 60 • written 6.8 years ago by armonazizi • 0

score 1 · Answer 1 · 2017-06-22

1


6.8 years ago

novice ★ 1.1k

Hi armonazizi,

In my experience, TSSs are not explicitly annotated. In fact, they are usually impossible to identify, but I don't know about mm10. Take a look at the annotation file and decide which feature you want to select. Let's say the features you want have 'transcript' in the 3rd column (in .gff format). You can extract them into a sorted BED file like so (GFF start coordinates are 1-based, so 1 is subtracted to get a 0-based BED start):

    $ grep -v '^#' mm10.gff | awk 'BEGIN{FS=OFS="\t"} $3=="transcript" {print $1, $4-1, $5}' | sort -k1,1 -k2,2n -k3,3n > transcripts.bed

From there, I would recommend using BEDtools intersect for your purpose.
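If you take the annotated transcript start as a stand-in for the TSS, the two steps could be sketched roughly as below. This is only an illustration under assumptions that are not from the thread: the helper name `gff_to_tss_bed`, the file names `mm10.gff`, `peaks.bed`, `tss.bed` and `peaks_in_tss.bed`, the 'transcript' feature type, and the 1000 bp flank are all placeholders. The strand in column 7 decides which end of the transcript is treated as the start.

```shell
# Hypothetical helper: print one BED6 window (TSS +/- flank) per transcript.
# $1 = GFF/GTF file, $2 = flank in bp (assumed default: 1000)
gff_to_tss_bed() {
    awk -v flank="${2:-1000}" 'BEGIN { FS = OFS = "\t" }
        /^#/ { next }                       # skip header/comment lines
        $3 == "transcript" {
            # GFF is 1-based inclusive: the transcript start is column 4
            # on the + strand and column 5 on the - strand.
            tss = ($7 == "-") ? $5 : $4
            s = tss - 1 - flank             # 0-based BED start
            if (s < 0) s = 0                # clamp at the chromosome start
            e = tss + flank                 # BED end is exclusive
            print $1, s, e, "tss_" NR, ".", $7
        }' "$1" | sort -k1,1 -k2,2n -k3,3n
}

# Example run (placeholder file names); only executes if everything is present.
if command -v bedtools >/dev/null 2>&1 && [ -f mm10.gff ] && [ -f peaks.bed ]; then
    gff_to_tss_bed mm10.gff 1000 > tss.bed
    # -u reports each peak once if it overlaps at least one TSS window
    bedtools intersect -u -a peaks.bed -b tss.bed > peaks_in_tss.bed
fi
```

The flank size is a judgment call; a narrower window keeps only peaks sitting right on the annotated start, while a wider one also picks up nearby promoter peaks.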

6.8 years ago by novice ★ 1.1k

0


Thanks, I'll give this a try.

6.8 years ago by armonazizi • 0

score 1 · Answer 2 · 2017-06-22

I typically used HOMER's annotatePeaks.pl to find my TSSs when I was working with ATAC-seq data:

http://homer.ucsd.edu/homer/ngs/annotation.html

This gives you a good estimate of the distance from each peak to the nearest TSS. The output is a tab-delimited table (it opens in Excel) that can easily be converted into a BED file.
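As a rough sketch of that last conversion step, the function below keeps peaks within a chosen distance of a TSS and writes them as BED-like lines. It assumes the annotatePeaks.pl output has a header row with columns named "Chr", "Start", "End" and "Distance to TSS", and looks them up by name instead of by position; the function name `homer_to_tss_bed`, the file name in the usage line, and the 1000 bp cutoff are placeholders. Coordinates are copied as-is, so check whether the start column in your output needs shifting to 0-based before using the result with BED tools.

```shell
# Hypothetical helper: filter annotatePeaks.pl output by distance to TSS.
# $1 = annotated peak table (tab-delimited), $2 = max |distance| in bp (assumed default: 1000)
homer_to_tss_bed() {
    awk -v maxd="${2:-1000}" 'BEGIN { FS = OFS = "\t" }
        NR == 1 {
            # Map header names to column indices.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("Chr" in col) || !("Start" in col) || !("End" in col) || !("Distance to TSS" in col)) {
                print "expected columns not found in header" > "/dev/stderr"
                exit 1
            }
            next
        }
        {
            d = $(col["Distance to TSS"])
            if (d !~ /^-?[0-9]+$/) next     # skip peaks without a numeric distance
            d = d + 0
            if (d < 0) d = -d               # absolute distance
            if (d <= maxd + 0)
                print $(col["Chr"]), $(col["Start"]), $(col["End"]), $1
        }' "$1"
}

# Usage (placeholder file names):
#   homer_to_tss_bed annotated_peaks.txt 1000 > peaks_near_tss.bed
```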

Hope this helps! ATAC-seq was a tricky point in my bioinformatics career!