Question: Extracting features from .gff3 file based on position
7 months ago, adamerum wrote:

Hi everyone,

Can anyone point me in the direction of how I can extract all features within a given genomic range from a .gff3 file? For example, if I have a list of genomic regions I am interested in, each with a scaffold ID and start and end positions, how can I extract the gene IDs from the GFF?

I am extremely new to bioinformatics so please excuse me if this is super straightforward or has been answered elsewhere...

I have found posts where you can use Bioconductor packages to extract features based on ID (i.e. searching for a specific gene) but can't see how to extract all genes in a given range.

The number of tools out there is overwhelming, and I am pretty sure this is the kind of straightforward thing someone could easily answer!

Thanks in advance.

Tags: gff3, gene
modified 7 months ago by _r_am • written 7 months ago by adamerum

You can filter your gff to keep only the region of interest as indicated here.

Then 2 possibilities using AGAT:

  1. You could use something like

    And then extract what you need from the tsv:

    awk -F'\t' '$3 == "gene" {print $10}' your_file.tsv

    where $10 is the 10th tab-separated column (this assumes the ID is in the 10th column; check the first line of the file to see which column holds the ID)

  2. Use --gff input.gff -t gene --attribute ID
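
If you only need the gene IDs, the same idea can be sketched with plain awk directly on the GFF3, without going through a TSV first. This is only an illustration, not an AGAT command: the file name and sample records are hypothetical, and it assumes the ID is stored as an `ID=` attribute in column 9, as the GFF3 format specifies.

```shell
# Hypothetical sample annotation; replace with your own (filtered) GFF3 file.
tmpdir=$(mktemp -d)
gff="$tmpdir/example.gff3"
printf '##gff-version 3\n' > "$gff"
printf 'scaffold_1\tsrc\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=abc\n' >> "$gff"
printf 'scaffold_1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=mrna1;Parent=gene1\n' >> "$gff"
printf 'scaffold_1\tsrc\tgene\t900\t1200\t.\t-\t.\tName=xyz;ID=gene2\n' >> "$gff"

# Keep only "gene" features, split column 9 on ";" and print the ID value.
gene_ids=$(awk -F'\t' '
  /^#/ { next }                        # skip header and comment lines
  $3 == "gene" {
    n = split($9, attrs, ";")
    for (i = 1; i <= n; i++)
      if (attrs[i] ~ /^ID=/) print substr(attrs[i], 4)
  }' "$gff")
printf '%s\n' "$gene_ids"
```

Splitting on ";" rather than assuming the ID comes first means the attribute order in column 9 does not matter.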
modified 7 months ago by _r_am • written 7 months ago by Juke34

Thank you both for your replies - great to be made aware of bedtools and AGAT which both look very helpful!

written 7 months ago by adamerum

You should be able to use bedtools; it works on GFF files. The "list of genomic regions" should be a BED file (note that BED start coordinates are 0-based, whereas GFF3 coordinates are 1-based).
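
To make the coordinate point concrete, here is a rough sketch that converts a 1-based region list to BED and then keeps the gene features overlapping any region. The file names and sample records are made up for illustration. The overlap step is written in plain awk so it runs without bedtools installed; with bedtools available, the commented `bedtools intersect` line should do the same job.

```shell
workdir=$(mktemp -d)

# Hypothetical region list: scaffold, start, end (1-based, inclusive).
printf 'scaffold_1\t100\t200\n' > "$workdir/regions.txt"

# BED is 0-based and half-open, so subtract 1 from the start only.
awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 - 1, $3 }' \
  "$workdir/regions.txt" > "$workdir/regions.bed"

# Hypothetical annotation with four genes.
gff="$workdir/annotation.gff3"
printf 'scaffold_1\tsrc\tgene\t150\t400\t.\t+\t.\tID=geneA\n' >  "$gff"
printf 'scaffold_1\tsrc\tgene\t201\t300\t.\t+\t.\tID=geneB\n' >> "$gff"
printf 'scaffold_2\tsrc\tgene\t100\t200\t.\t+\t.\tID=geneC\n' >> "$gff"
printf 'scaffold_1\tsrc\tgene\t50\t100\t.\t-\t.\tID=geneD\n'  >> "$gff"

# With bedtools installed, this reports each GFF feature that overlaps a region:
#   bedtools intersect -a annotation.gff3 -b regions.bed -u

# awk equivalent: read the BED regions first, then test each gene against them.
hits=$(awk -F'\t' '
  NR == FNR { n++; chr[n] = $1; bs[n] = $2 + 0; be[n] = $3 + 0; next }
  /^#/ { next }
  $3 == "gene" {
    for (i = 1; i <= n; i++)
      # GFF [start,end] is 1-based inclusive; BED [bs,be) is 0-based half-open.
      if ($1 == chr[i] && $4 + 0 <= be[i] && $5 + 0 > bs[i]) { print; break }
  }' "$workdir/regions.bed" "$gff")
printf '%s\n' "$hits"
```

In this sample, geneD ends exactly at position 100, the first base of the region, so it is reported; geneB starts at 201, one base past the region, so it is not.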

written 7 months ago by _r_am