Question: Extract gene coordinates from GTF
int11ap1 wrote (5.8 years ago):


I have a GTF file that contains only exon features. Is there a way to extract the gene coordinates, or should I write a script?

- INPUT: GTF file.

- OUTPUT: the gene coordinates, whatever the format is.



Please make it clearer by showing your input file and the desired output.

(ancient_learner, 5.8 years ago)

Done, I cannot be more clear.

(int11ap1, 5.8 years ago)
Devon Ryan (Freiburg, Germany) wrote (5.8 years ago):

The question becomes exactly what you want in terms of coordinates for a gene. I'm guessing that you just want the 5'-most and 3'-most positions along with the strand and chromosome, but perhaps you have something else in mind.

Presuming you do want what I mentioned, you could easily do this in R with GenomicFeatures.

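library(GenomicFeatures)
txdb <- makeTxDbFromGFF("some_file.gtf", format="gtf")  # formerly makeTranscriptDbFromGFF()
genes <- genes(txdb)  # one range per gene, spanning all of its transcripts
# columns: seqnames, start, end, width, strand, gene_id
write.table(as.data.frame(genes)[,-4], file="Just_genes.txt", col.names=FALSE, row.names=FALSE, quote=FALSE, sep="\t")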

The -4 just removes the width column.


Years later, a small update to maybe save someone two minutes: in recent versions of the GenomicFeatures package, the function makeTranscriptDbFromGFF has been renamed makeTxDbFromGFF.

(caggtaagtat, 17 months ago)

I'm getting this error:

Error in write.table([, -4], file = "Just_genes.txt",  :
  unused argument (colnames = F)

(krushnach80, 2.9 years ago)

The argument is col.names, not colnames. Try again with:

write.table(as.data.frame(genes)[,-4], file="Just_genes.txt", col.names=F, sep="\t")

(thomas musielak, 2.9 years ago)

Hi Devon, how would you get the gene description along with the gene coordinates, using a script similar to the one you presented? Thanks.

(dklinkebiel, 9 months ago)
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (5.8 years ago):

Using awk and sqlite:

curl -sL ""   |\
awk -F '\t' '
BEGIN {printf("create temp table T(chrom,start,end,gene); begin transaction;\n");}
$3=="exon" {n=split($9,a,/[ ;]+/); for(i=1;i+1<n;i++) if(a[i]=="gene_id") printf("insert into T(chrom,start,end,gene) values (\"%s\",%s,%s,%s);\n",$1,$4,$5,a[i+1]);}
END {printf("commit; select chrom,gene,min(start),max(end) from T group by chrom,gene;\n");}' |\
sqlite3 tmp.db
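The sqlite step above is only used for the GROUP BY. A minimal sketch (not part of the original answer) keeps the per-gene min(start) and max(end) in awk arrays instead, so no database is needed. It assumes a tab-separated GTF whose exon lines carry a gene_id "..." attribute; the function name gtf_gene_spans is hypothetical. It writes BED6 (0-based start) and, unlike the query above, also keeps the strand.

```shell
# Sketch: collapse the exon records of a GTF into one span per gene.
# Assumes tab-separated GTF input with a gene_id "..." attribute on exon lines.
# Output is BED6: chrom, start (0-based), end, gene_id, score ".", strand.
gtf_gene_spans() {
  awk -F '\t' -v OFS='\t' '
    $3 == "exon" && match($9, /gene_id "[^"]*"/) {
      # Drop the leading gene_id plus quote (9 chars) and the closing quote.
      id = substr($9, RSTART + 9, RLENGTH - 10)
      key = $1 SUBSEP id SUBSEP $7
      s = $4 + 0; e = $5 + 0
      if (!(key in lo) || s < lo[key]) lo[key] = s
      if (!(key in hi) || e > hi[key]) hi[key] = e
    }
    END {
      for (key in lo) {
        split(key, f, SUBSEP)
        # GTF starts are 1-based; BED starts are 0-based.
        print f[1], lo[key] - 1, hi[key], f[2], ".", f[3]
      }
    }' "$@" | sort -k1,1 -k2,2n
}
```

For example, gtf_gene_spans foo.gtf > foo_genes.bed. Because the key includes chromosome and strand, a gene_id found on two chromosomes is reported once per chromosome, much like the group by chrom,gene in the sqlite version.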

Alex Reynolds (Seattle, WA USA) wrote (5.1 years ago):

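Using gtf2bed (from BEDOPS), which converts each GTF record into one BED line with a 0-based start. For an exon-only file, each output line is therefore an exon rather than a whole gene: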

$ gtf2bed < foo.gtf | cut -f1-3 > foo_coords.bed3

If you want strand information (columns 1-6: chrom, start, end, ID, score, strand):

$ gtf2bed < foo.gtf | cut -f1-6 > foo_coords.bed6
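Since gtf2bed does not merge exons, a minimal sketch (not part of the original answer) can collapse its BED6 output into one span per gene. It assumes column 4 holds the gene_id and column 6 the strand; the function name bed6_gene_spans is hypothetical. The coordinates are already 0-based, so no shift is applied.

```shell
# Sketch: merge per-exon BED6 lines into one span per (chrom, ID, strand).
# Assumes tab-separated BED6 with the gene ID in column 4, strand in column 6.
bed6_gene_spans() {
  awk -F '\t' -v OFS='\t' '
    {
      key = $1 SUBSEP $4 SUBSEP $6
      s = $2 + 0; e = $3 + 0
      if (!(key in lo) || s < lo[key]) lo[key] = s
      if (!(key in hi) || e > hi[key]) hi[key] = e
    }
    END {
      for (key in lo) {
        split(key, f, SUBSEP)
        print f[1], lo[key], hi[key], f[2], ".", f[3]
      }
    }' "$@" | sort -k1,1 -k2,2n
}
```

For example, gtf2bed < foo.gtf | cut -f1-6 | bed6_gene_spans > foo_genes.bed6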