Question: Extract list of gene coordinates from gff file
4 weeks ago, the_cowa wrote:

I have a list of genes and I need the coordinates of those genes from a GFF file.

I tried:

grep -wFf gene_list sample.gff

but it takes far too long to finish (the GFF file is 20 GB). Is there another way to extract the coordinates?

Tags: awk, grep, gff, python, gene

Try to make your regex as specific as possible. For example, grep GSBRNA2T00155995001 sample.gtf will be slightly slower than grep 'gene_id "GSBRNA2T00155995001' sample.gtf. How much improvement you gain from this depends on the structure of your GTF file.

written 4 weeks ago by flappix, modified 29 days ago
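
As a rough illustration of flappix's suggestion, the sketch below compares a bare ID with an ID tied to its gene_id attribute. The two GTF records are made up for the demonstration, and -F (fixed-string matching) is an addition here, not part of the original comment; it is safe because the pattern contains no regex metacharacters. The sketch only shows the difference in what matches. Any speed difference on a 20 GB file would have to be measured, for example with time.

```shell
# Toy GTF with two invented records; only the gene ID comes from the comment above.
tmp=$(mktemp -d)
printf 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "GSBRNA2T00155995001"; note "x"\n' > "$tmp/sample.gtf"
printf 'chr1\tsrc\tgene\t300\t400\t.\t+\t.\tgene_id "GSBRNA2T00000000002"; note "see GSBRNA2T00155995001"\n' >> "$tmp/sample.gtf"

# Bare ID: counts every line that mentions the ID anywhere.
loose=$(grep -c -F 'GSBRNA2T00155995001' "$tmp/sample.gtf")

# ID anchored to its attribute key: counts only lines where it is the gene_id.
strict=$(grep -c -F 'gene_id "GSBRNA2T00155995001"' "$tmp/sample.gtf")

echo "loose=$loose strict=$strict"
rm -r "$tmp"
```

On the toy file the bare ID also hits the second record, where the ID only appears inside a note, while the anchored pattern hits the first record alone.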

If @Pierre's answer in the thread "Bed file grepping from the list" worked for you, have you tried using it here? BTW, programs written in Python etc. are not likely to be faster than a system utility like grep for extracting data.

written 4 weeks ago by genomax, modified 4 weeks ago

I tried join, but that is also too slow.

written 4 weeks ago by the_cowa

Break your GFF file into several pieces and then run the search on each piece.

written 4 weeks ago by genomax
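
One possible reading of genomax's suggestion, sketched on a toy file: split the GFF by line count, run the original grep -wFf on each piece in the background, then concatenate the hits. The file contents, the piece_ prefix, the 2-line piece size, and the final cut to columns 1, 4 and 5 (seqid, start, end in GFF) are assumptions for the demonstration. For a 20 GB file the piece size would need to be far larger, and the number of simultaneous grep jobs should be kept near the number of available cores.

```shell
# Toy inputs (invented); a real run would use the existing sample.gff and gene_list.
tmp=$(mktemp -d)
printf 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=geneA\n' > "$tmp/sample.gff"
printf 'chr1\tsrc\tgene\t300\t400\t.\t+\t.\tID=geneB\n' >> "$tmp/sample.gff"
printf 'chr2\tsrc\tgene\t500\t600\t.\t-\t.\tID=geneC\n' >> "$tmp/sample.gff"
printf 'chr2\tsrc\tgene\t700\t800\t.\t-\t.\tID=geneD\n' >> "$tmp/sample.gff"
printf 'geneA\ngeneD\n' > "$tmp/gene_list"

# Split into pieces of 2 lines each: piece_aa, piece_ab, ...
split -l 2 "$tmp/sample.gff" "$tmp/piece_"

# Search every piece in the background, one hits file per piece.
for piece in "$tmp"/piece_*; do
    grep -wFf "$tmp/gene_list" "$piece" > "$piece.hits" &
done
wait  # block until all background greps have finished

# Concatenate hits in piece order and keep seqid, start, end.
hits=$(cat "$tmp"/piece_*.hits | cut -f 1,4,5)
echo "$hits"
rm -r "$tmp"
```

Because split cuts on line boundaries, no GFF record is broken across pieces, so the combined hits are the same lines a single grep over the whole file would return.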