How to extract coordinates of first exons from .gtf file.
6.4 years ago
wanziyi89 ▴ 60

Hi all,

Suppose I have a .gtf file containing the exons of all genes in a given genome. I would like to extract the coordinates of the first exon of each gene from that file. How should I get started?

Regards,

Ziyi

RNA-Seq gtf exon genome • 4.5k views
6.4 years ago

Since your question is about getting started, I would suggest looking at code snippets or libraries that parse GTF files to get an idea of how a GTF file is handled.

For example:

http://www-huber.embl.de/users/anders/HTSeq/doc/tour.html#tour

https://github.com/ctokheim/PrimerSeq/blob/master/gtf.py

But a quick and dirty way would be:

curl https://raw.githubusercontent.com/roryk/DEXSeq/master/inst/python_scripts/dexseq_prepare_annotation.py | python - genes.gtf out.tmp
grep "exonic_part_number \"001\"" out.tmp | less -S

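This lists the first exonic part of each gene, assuming a standard GTF file. Note that, as far as I know, the script numbers exonic parts by ascending genomic coordinate, so for a gene on the minus strand part "001" is the 3'-most part rather than the first exon in the direction of transcription.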
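If you would rather parse the GTF directly, here is a minimal sketch in plain Python. It is not taken from HTSeq or DEXSeq; the function name first_exons is made up for illustration. It assumes a standard nine-column, tab-separated GTF whose exon lines carry a gene_id attribute, and it treats the "first" exon as the 5'-most one: lowest start on the plus strand, highest end on the minus strand.

```python
import re

# Assumption: attributes look like  gene_id "XYZ";  as in standard GTF files.
GENE_ID = re.compile(r'gene_id "([^"]+)"')

def first_exons(gtf_lines):
    """Return {gene_id: (chrom, start, end, strand)} for the 5'-most exon of each gene."""
    best = {}
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only exon records are of interest (column 3 is the feature type).
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])  # 1-based, inclusive
        match = GENE_ID.search(fields[8])
        if match is None:
            continue
        gene = match.group(1)
        current = best.get(gene)
        if current is None:
            best[gene] = (chrom, start, end, strand)
        elif strand == "-":
            # Minus strand: transcription starts at the highest coordinate.
            if end > current[2]:
                best[gene] = (chrom, start, end, strand)
        elif start < current[1]:
            # Plus (or unstranded) strand: the lowest start comes first.
            best[gene] = (chrom, start, end, strand)
    return best
```

To run it on a file, something like `with open("genes.gtf") as fh: result = first_exons(fh)` should work. This picks one exon per gene across all of its transcripts; if you need the first exon of every transcript instead, the same loop could be keyed on transcript_id.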

Thank you! HTSeq seems promising, and I think I found some leads in its TSS plot example.

I updated my answer. Please accept it if it works for you.
