Extracting Features That Appear In TSS Regions
6.8 years ago
armonazizi • 0

Hi,

I'm working with ATAC-Seq data and I need to extract the features in my BED files that are located in transcription start site (TSS) regions of the mm10 genome. Can anyone recommend a way to do this?

I was thinking of generating a BED file of only TSS regions from the mm10 genome and intersecting it with the sample file. However, I'm not sure how to generate a BED file that contains only TSSs.

Any help would be appreciated.

Thanks

ChIP-Seq ATAC-Seq • 3.4k views
6.8 years ago
novice ★ 1.1k

Hi armonazizi,

In my experience, TSSs are not explicitly annotated as their own feature. In fact, the exact TSS is often hard to identify, but I don't know about mm10 specifically. Take a look at the annotation file and decide which feature you want to select. Let's say the features you want have 'transcript' in the 3rd column (GFF format). You can extract them into a sorted BED file like so (GFF starts are 1-based and BED starts are 0-based, hence the $4-1):

$ grep -v '^#' mm10.gff | awk 'BEGIN{FS=OFS="\t"} $3=="transcript" {print $1, $4-1, $5}' | sort -k1,1 -k2,2n -k3,3n > transcripts.bed

From there, I would recommend using BEDtools intersect for your purpose.
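On the command line this would be something like `bedtools intersect -u -a peaks.bed -b transcripts.bed`, where `-u` reports each peak once if it overlaps anything in the second file (`peaks.bed` is a placeholder name for your sample file). To show what that overlap test does, here is a small pure-Python stand-in. It is not BEDtools and is far slower on real data; the function name `peaks_overlapping_tss` is invented for the illustration.

```python
from collections import defaultdict


def peaks_overlapping_tss(peaks, tss_windows):
    """Return the peaks that overlap at least one TSS window.

    Both inputs are iterables of (chrom, start, end) in BED coordinates
    (0-based, half-open). Naive scan per chromosome, for illustration only.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in tss_windows:
        by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in peaks:
        windows = by_chrom.get(chrom, [])
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < w_end and w_start < end for w_start, w_end in windows):
            hits.append((chrom, start, end))
    return hits


if __name__ == "__main__":
    tss = [("chr1", 500, 1501)]
    peaks = [("chr1", 900, 1100), ("chr1", 1501, 1600), ("chr2", 0, 10)]
    for peak in peaks_overlapping_tss(peaks, tss):
        print(*peak, sep="\t")
```

Because BED intervals are half-open, a peak that starts exactly where a window ends does not count as overlapping.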


Thanks, I'll give this a try.

6.8 years ago
ccagg ▴ 60

I typically used HOMER's annotatePeaks.pl to find TSSs when I was working with ATAC-seq data:

http://homer.ucsd.edu/homer/ngs/annotation.html

This gives you a good estimate of the distance from each peak to the nearest TSS, and the output is a tab-delimited table (it opens in Excel) that can easily be converted into a BED file.
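As a rough sketch of that conversion, the Python below keeps only peaks within a chosen distance of a TSS and writes them out as BED-style tuples. It assumes the table has columns named `Chr`, `Start`, `End` and `Distance to TSS`, and that the start coordinates are 1-based; check both against your own output before relying on it. The function name `homer_to_bed`, the `max_dist` cutoff and the file name `annotated_peaks.txt` are placeholders for this example.

```python
import csv


def homer_to_bed(rows, max_dist=1000):
    """Return (chrom, start, end) BED tuples for peaks near a TSS.

    `rows` is an iterable of dicts, e.g. from csv.DictReader.
    Assumed column names: 'Chr', 'Start', 'End', 'Distance to TSS'.
    """
    bed = []
    for row in rows:
        dist = (row.get("Distance to TSS") or "").strip()
        if dist in ("", "NA"):
            continue  # no TSS distance reported for this peak
        if abs(int(float(dist))) <= max_dist:
            # Assumption: starts are 1-based, so shift to 0-based for BED.
            bed.append((row["Chr"], max(0, int(row["Start"]) - 1), int(row["End"])))
    return bed


def read_homer_table(path):
    """Read a tab-delimited annotation table into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    import os

    path = "annotated_peaks.txt"  # placeholder file name
    if os.path.exists(path):
        for chrom, start, end in homer_to_bed(read_homer_table(path)):
            print(chrom, start, end, sep="\t")
```

The 1000 bp default is arbitrary; pick whatever distance matches your definition of a TSS region.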

Hope this helps! ATAC-seq was a tricky point in my bioinformatics career!
