Question: Extract gene coordinates from GTF
1
gravatar for int11ap1
5.5 years ago by
int11ap1380
Barcelona
int11ap1380 wrote:

Hello,

I have a GTF file with only exon features. There's a way to extract the gene coordinates? Or should I write a script?

- INPUT: GTF file.

- OUTPUT: the gene coordinates, whatever the format is.

Thanks. 

coordinates gtf • 7.2k views
ADD COMMENTlink modified 4.8 years ago by Alex Reynolds29k • written 5.5 years ago by int11ap1380

Please make it more clear by showing your Input file and desired output

ADD REPLYlink written 5.5 years ago by ancient_learner620

Done, I cannot be more clear.

ADD REPLYlink written 5.5 years ago by int11ap1380
5
gravatar for Devon Ryan
5.5 years ago by
Devon Ryan93k
Freiburg, Germany
Devon Ryan93k wrote:

The question becomes exactly what you want in terms of coordinates for a gene. I'm guessing that you just want the 5' most and 3' most position along with the strand an chromosome, but perhaps you have something else in mind.
 

Presuming you do want what I mentioned, you could easily do this in R with GenomicFeatures.

library(GenomicFeatures)
txdb <- makeTranscriptDbFromGFF("some_file.gtf", format="gtf")
genes <- genes(txdb)
write.table(as.data.frame(genes)[,-4], file="Just_genes.txt", colnames=F, sep="\t")

The -4 just removes the width column.

ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by Devon Ryan93k
1

Years later, I would like to make a little update, to maybe save someone 2 minutes: Since some of the last updates the function makeTranscriptDbFromGFF of the package GenomicFeatures is now called makeTxDbFromGFF.

ADD REPLYlink written 13 months ago by caggtaagtat900

I'm getting this error

Error in write.table(as.data.frame(genes)[, -4], file = "Just_genes.txt",  :

unused argument (colnames = F)

ADD REPLYlink written 2.6 years ago by krushnach80630

Try again with

write.table(as.data.frame(genes)[,-4], file="Just_genes.txt", col.names=F, sep="\t")
ADD REPLYlink written 2.6 years ago by thomas musielak0

Hi Devon, How would you get the gene description along with the gene coordinates using a similar script as you presented. Thanks

Dave

ADD REPLYlink written 5 months ago by dklinkebiel0
5
gravatar for Pierre Lindenbaum
5.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:

using awk and sqlite:

curl -sL "https://rseqflow.googlecode.com/files/mouse_refseq_anno.gtf"   |\
awk -F '    ' 'BEGIN {printf("create temp table T(chrom,start,end,gene); begin transaction;\n");} $3=="exon" {n=split($9,a,/[ ;]+/);for(i=1;i+1< n;i++) if(a[i]=="gene_id") printf("insert into T(chrom,start,end,gene) values (\"%s\",%s,%s,%s);\n",$1,$4,$5,a[i+1]);} END {printf("commit; select chrom,gene,min(start),max(end) from T group by chrom,gene;\n");}' |\
sqlite3 tmp.db
(...)
chrY|Rbm31y|12688110|17402718
chrY|Rbmy1a1|2830680|3783271
chrY|Sly|55213720|75222053
ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by Pierre Lindenbaum124k
4
gravatar for Alex Reynolds
4.8 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

Using gtf2bed:

$ gtf2bed < foo.gtf | cut -f1-3 > foo_coords.bed3

If you want strand information:

$ gtf2bed < foo.gtf | cut -f1-6 > foo_coords.bed6
ADD COMMENTlink modified 7 weeks ago by RamRS25k • written 4.8 years ago by Alex Reynolds29k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 720 users visited in the last hour