Question: GFF file has multiple features for one gene region; how do I collapse them into one?
22 months ago by
YOUSEUFS10 wrote:

Hello, Noob here

My GFF3 file (converted to BED) contains multiple lines that describe the same gene region but with different feature IDs (below):

NC_002978.6     3027    3115    gene2   .       +       RefSeq  gene    .       ID=gene2;Dbxref=GeneID:29555340;Name=WD_RS00025;gbkey=Gene;gene_biotype=tRNA;locus_tag=WD_RS00025;old_locus_tag=tRNA-Leu-1

NC_002978.6     3027    3115    id1     .       +       tRNAscan-SE     exon    .       ID=id1;Parent=rna0;Dbxref=GeneID:29555340;anticodon=(pos:3062..3064);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:1.23;pr

NC_002978.6     3027    3115    rna0    .       +       tRNAscan-SE     tRNA    .       ID=rna0;Parent=gene2;Dbxref=GeneID:29555340;anticodon=(pos:3062..3064);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:1.23;

How would I collapse these to give me a single gene region associated with a single feature?

Context: This would then be fed into "bedtools closest" so I can match transcriptional start sites to their closest annotated gene

P.S. Apologies in advance for any incorrect formatting.

rna-seq gff bedtools • 833 views

Hi Noob,

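Your file somewhat resembles a BED file, but the layout is confusing: the first six columns look like BED, and the remaining columns carry the GFF source, feature type, phase and attributes. Anyway, start with this to keep only the gene features (column 8 holds the feature type):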

awk '$8 == "gene"' your_file.bed > your_file.genes.bed

Now you should have only gene features. They may still overlap one another, but each gene will appear only once.
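
As a rough sketch of how that filter behaves on the three lines from the question, the snippet below rebuilds them in a scratch directory and applies the same test. The attribute strings are shortened for illustration, the file is assumed to be tab-separated, and `tss.sorted.bed` in the final comment is a hypothetical file name for the start-site intervals.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Illustrative copy of the three rows from the question (attributes shortened).
# Columns 1-6 are BED; columns 7-10 are GFF source, type, phase and attributes.
printf 'NC_002978.6\t3027\t3115\tgene2\t.\t+\tRefSeq\tgene\t.\tID=gene2\n' > your_file.bed
printf 'NC_002978.6\t3027\t3115\tid1\t.\t+\ttRNAscan-SE\texon\t.\tID=id1;Parent=rna0\n' >> your_file.bed
printf 'NC_002978.6\t3027\t3115\trna0\t.\t+\ttRNAscan-SE\ttRNA\t.\tID=rna0;Parent=gene2\n' >> your_file.bed

# Keep only rows whose 8th tab-separated column is "gene", then sort by
# chromosome and start, since bedtools closest expects sorted input.
awk -F'\t' '$8 == "gene"' your_file.bed | sort -k1,1 -k2,2n > your_file.genes.bed

cat your_file.genes.bed

# With bedtools installed and a sorted start-site file (hypothetical name),
# the next step could look like this; -d appends the distance as a column:
# bedtools closest -a tss.sorted.bed -b your_file.genes.bed -d
```

Splitting explicitly on tabs with `-F'\t'` is a cautious choice here: the attribute column contains spaces, so tab splitting keeps it as a single field if later columns are ever needed.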

written 22 months ago by goodez480

Thank you very much!

written 21 months ago by YOUSEUFS10
22 months ago by
Carambakaracho2.2k wrote:

Filter for the gene features, either on column 3 of your GFF file or on the corresponding column of your BED file. Note, however, that a tRNA feature is probably the exception rather than the rule. You can even do this filtering in Excel.
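
A minimal sketch of the GFF route, assuming a tab-separated GFF3 file: keep rows whose column 3 is `gene` and write them out as six-column BED. The file name `annotation.gff3` and the sample lines are hypothetical. The GFF start is assumed to be 3028, because GFF coordinates are 1-based and BED starts are 0-based, so the conversion subtracts 1 from the start.

```shell
workdir=$(mktemp -d)
gff="$workdir/annotation.gff3"   # hypothetical file name

# Illustrative GFF3 lines modelled on the question (attributes shortened).
printf '##gff-version 3\n' > "$gff"
printf 'NC_002978.6\tRefSeq\tgene\t3028\t3115\t.\t+\t.\tID=gene2;Name=WD_RS00025\n' >> "$gff"
printf 'NC_002978.6\ttRNAscan-SE\ttRNA\t3028\t3115\t.\t+\t.\tID=rna0;Parent=gene2\n' >> "$gff"
printf 'NC_002978.6\ttRNAscan-SE\texon\t3028\t3115\t.\t+\t.\tID=id1;Parent=rna0\n' >> "$gff"

awk -F'\t' -v OFS='\t' '
    /^#/ { next }                      # skip header and comment lines
    $3 == "gene" {
        id = "."                       # fallback when no ID attribute exists
        n = split($9, attrs, ";")
        for (i = 1; i <= n; i++)
            if (attrs[i] ~ /^ID=/) id = substr(attrs[i], 4)
        # BED6: chrom, 0-based start, end, name, score, strand
        print $1, $4 - 1, $5, id, ".", $7
    }
' "$gff" > "$workdir/genes.bed"

cat "$workdir/genes.bed"
```

Filtering before conversion means each gene ends up as exactly one BED line, so the child tRNA and exon records never reach the downstream step.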
