How to extract coordinates of first exons from .gtf file.
6.4 years ago
wanziyi89 ▴ 60

Hi All,

Suppose I have a .gtf file with the exons of all genes in a given genome. I would like to extract the coordinates of the first exon of each gene from the .gtf file. How should I get started?

regards,

Ziyi

RNA-Seq gtf exon genome • 4.5k views
6.4 years ago

As your question is about getting started, I would suggest looking at code snippets or libraries that parse GTF files to get an idea of how to handle one.

For example:

http://www-huber.embl.de/users/anders/HTSeq/doc/tour.html#tour

https://github.com/ctokheim/PrimerSeq/blob/master/gtf.py

But a quick and dirty way would be:

curl https://raw.githubusercontent.com/roryk/DEXSeq/master/inst/python_scripts/dexseq_prepare_annotation.py | python - genes.gtf out.tmp
grep "exonic_part_number \"001\"" out.tmp | less -S


This gives the first exonic part of each gene, assuming a standard GTF file format.
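If you would rather not depend on the DEXSeq script, here is a minimal sketch of doing the same thing directly in plain Python. It is only a sketch under a few assumptions: the file is tab-separated with nine columns, exon rows have "exon" in the third column, and every exon row carries a gene_id "..." attribute. The function name first_exons is made up for illustration. "First" is taken here to mean the 5'-most exon across all transcripts of a gene, so the lowest start on the + strand and the highest end on the - strand. As far as I can tell, the DEXSeq exonic parts are numbered by genomic position, so for minus-strand genes the two approaches may disagree.

```python
import re

# Matches a GTF attribute such as: gene_id "ENSG00000123456";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def first_exons(gtf_lines):
    """Return {gene_id: (chrom, start, end, strand)} for the 5'-most exon of each gene.

    gtf_lines is any iterable of GTF lines, for example an open file handle.
    Coordinates are kept 1-based and inclusive, as in the GTF itself.
    """
    best = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only well-formed exon records
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            continue  # assumption: exons without a gene_id are ignored
        gene_id = match.group(1)
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        current = best.get(gene_id)
        if current is None:
            best[gene_id] = (chrom, start, end, strand)
        elif strand == "-":
            # Minus strand: transcription starts at the highest coordinate.
            if end > current[2]:
                best[gene_id] = (chrom, start, end, strand)
        elif start < current[1]:
            # Plus (or unstranded) record: the lowest start comes first.
            best[gene_id] = (chrom, start, end, strand)
    return best


def print_first_exons(path):
    """Print one tab-separated line per gene for the GTF file at path."""
    with open(path) as handle:
        for gene_id, exon in sorted(first_exons(handle).items()):
            print(gene_id, *exon, sep="\t")
```

Calling print_first_exons("genes.gtf") would print gene ID, chromosome, start, end and strand for each gene. When several exons of a gene tie for the 5'-most position, the one seen first in the file is kept. If you need the first exon per transcript instead, key the dictionary on transcript_id rather than gene_id.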


Thank you! HTSeq seems promising, and I think I found some leads in the TSS plot example.


I updated my answer. Please accept it if it works for you.