Question: GFF file has multiple features for one gene region; how do I collapse them into one?
6 months ago by
YOUSEUFS10 wrote:

Hello, Noob here

My GFF3 file (converted into BED) contains multiple lines that describe the same gene region but with different feature IDs (below):

NC_002978.6     3027    3115    gene2   .       +       RefSeq  gene    .       ID=gene2;Dbxref=GeneID:29555340;Name=WD_RS00025;gbkey=Gene;gene_biotype=tRNA;locus_tag=WD_RS00025;old_locus_tag=tRNA-Leu-1

NC_002978.6     3027    3115    id1     .       +       tRNAscan-SE     exon    .       ID=id1;Parent=rna0;Dbxref=GeneID:29555340;anticodon=(pos:3062..3064);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:1.23;pr

NC_002978.6     3027    3115    rna0    .       +       tRNAscan-SE     tRNA    .       ID=rna0;Parent=gene2;Dbxref=GeneID:29555340;anticodon=(pos:3062..3064);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:1.23;

How would I collapse these to give me a single gene region associated with a single feature?

Context: The result would then be fed into "bedtools closest" so I can match transcription start sites to their closest annotated gene.

P.S. Apologies in advance for any incorrect formatting.

Tags: rna-seq, gff, bedtools

Hi Noob,

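Your file resembles a BED file in its first six columns, but the remaining GFF fields (source, feature type, phase, attributes) are appended after them, which makes it confusing. The feature type ends up in column 8, so start with this to keep only gene features: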

awk '$8 == "gene"' your_file.bed > your_file.genes.bed

Now you should have only gene features. Different genes may still overlap each other, but each gene will appear only once.
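As a rough illustration of the filter above, here is a minimal sketch run on a tiny sample that mirrors the three lines from the question. The file names (sample.bed, sample.genes.bed, sample.genes.sorted.bed, tss.sorted.bed) are hypothetical, the attribute column is shortened, and the sample is assumed to be tab-separated, which is why -F'\t' is added to the awk call. The sort step is there because bedtools closest expects position-sorted input.

```shell
# Build a tiny tab-separated sample mirroring the three lines in the question
# (attribute column shortened for readability; file names are hypothetical).
printf 'NC_002978.6\t3027\t3115\tgene2\t.\t+\tRefSeq\tgene\t.\tID=gene2\n' > sample.bed
printf 'NC_002978.6\t3027\t3115\tid1\t.\t+\ttRNAscan-SE\texon\t.\tID=id1;Parent=rna0\n' >> sample.bed
printf 'NC_002978.6\t3027\t3115\trna0\t.\t+\ttRNAscan-SE\ttRNA\t.\tID=rna0;Parent=gene2\n' >> sample.bed

# Keep only rows whose 8th column (feature type) is exactly "gene";
# -F'\t' splits strictly on tabs so spaces inside attributes cannot shift columns.
awk -F'\t' '$8 == "gene"' sample.bed > sample.genes.bed

# Sort by chromosome, then numerically by start position.
sort -k1,1 -k2,2n sample.genes.bed > sample.genes.sorted.bed

cat sample.genes.sorted.bed

# Possible next step (not run here; tss.sorted.bed is a hypothetical sorted TSS file):
#   bedtools closest -a tss.sorted.bed -b sample.genes.sorted.bed
```

Of the three input rows that share the same coordinates, only the gene2 row should remain, so each gene region is represented by a single feature before it goes into bedtools closest.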

Reply written 6 months ago by goodez460

Thank you very much!

Reply written 6 months ago by YOUSEUFS10
6 months ago by
Carambakaracho970 wrote:

Filter for the gene features, either on column 3 of your GFF or on the feature-type column of your BED file. However, a tRNA feature is probably the exception rather than the rule. You can even do this in Excel.
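Filtering on column 3 of the GFF directly could look something like the sketch below, which also writes a plain six-column BED so the feature type no longer needs to be carried along. The sample file and the names sample.gff3 and genes.bed6 are hypothetical. The GFF start of 3028 is an assumption, based on the usual convention that GFF coordinates are 1-based while BED starts are 0-based, so the BED start is the GFF start minus one.

```shell
# Hypothetical GFF3 sample (tab-separated, 1-based inclusive coordinates).
printf 'NC_002978.6\tRefSeq\tgene\t3028\t3115\t.\t+\t.\tID=gene2;Name=WD_RS00025\n' > sample.gff3
printf 'NC_002978.6\ttRNAscan-SE\ttRNA\t3028\t3115\t.\t+\t.\tID=rna0;Parent=gene2\n' >> sample.gff3
printf 'NC_002978.6\ttRNAscan-SE\texon\t3028\t3115\t.\t+\t.\tID=id1;Parent=rna0\n' >> sample.gff3

# Keep gene rows and emit BED6: chrom, start-1, end, ID, score, strand.
awk -F'\t' -v OFS='\t' '
  /^#/ { next }                          # skip GFF header and comment lines
  $3 == "gene" {
    id = "."                             # fallback name if no ID attribute is found
    n = split($9, attrs, ";")            # attributes are semicolon-separated key=value pairs
    for (i = 1; i <= n; i++)
      if (attrs[i] ~ /^ID=/) id = substr(attrs[i], 4)
    print $1, $4 - 1, $5, id, $6, $7     # convert the 1-based GFF start to a 0-based BED start
  }' sample.gff3 > genes.bed6

cat genes.bed6
```

Working from the GFF keeps the original column layout, so there is no need to guess which BED column the converter placed the feature type in.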
