Question: remove UTR annotations from gff
0
gravatar for rob234king
14 months ago by
rob234king600
UK/Harpenden/Rothamsted Research
rob234king600 wrote:

I have UTR annotations which I need to remove for a set of genes, which I could do just by deleting but then it would not update the start and stop positions of the gene and mRNA features, and correct exon features. Is there a tool to remove UTR features from a gff file and make the necessary corrections to the remaining mRNA and gene, and exon features? I can't seem to find something but there must be??

gff • 483 views
ADD COMMENTlink modified 14 months ago by Ace70 • written 14 months ago by rob234king600
2
gravatar for SMK
14 months ago by
SMK1.9k
SMK1.9k wrote:

Hi rob234king,

I think you can try gff3_file_UTR_trimmer.pl from PASApipeline:

(edited: changed the example to the gff3 in PASA directory; still thinking the meaning of a shift of phase, not a copy-paste problem)

$ cat test.gff3
gi|68711    TIGR    gene    12923   14228   .   +   .   ID=68711.t00017;Name=my_protein_name
gi|68711    TIGR    mRNA    12923   14228   .   +   .   ID=model.68711.m00017;Parent=68711.t00017
gi|68711    TIGR    five_prime_utr  12923   13029   .   +   .   ID=utr5p_of_68711.m00017;Parent=68711.m00017
gi|68711    TIGR    exon    12923   13060   .   +   .   ID=68711.e00069;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13030   13060   .   +   0   ID=13030_13060cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13411   13550   .   +   .   ID=68711.e00070;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13411   13550   .   +   1   ID=13411_13550cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13677   13802   .   +   .   ID=68711.e00071;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13677   13802   .   +   0   ID=13677_13802cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13876   14228   .   +   .   ID=68711.e00072;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13876   14016   .   +   0   ID=13876_14016cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    three_prime_utr 14017   14228   .   +   .   ID=utr3p_of_68711.m00017;Parent=68711.m00017

$ perl ~/src/PASApipeline-v2.3.3/misc_utilities/gff3_file_UTR_trimmer.pl test.gff3
gi|68711    TIGR    gene    13030   14016   .   +   .   ID=68711.t00017.1;Name=my_protein_name
gi|68711    TIGR    mRNA    13030   14016   .   +   .   ID=model.68711.m00017;Parent=68711.t00017.1;Name=my_protein_name
gi|68711    TIGR    exon    13030   13060   .   +   .   ID=model.68711.m00017.exon1;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13030   13060   .   +   0   ID=cds.model.68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13411   13550   .   +   .   ID=model.68711.m00017.exon2;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13411   13550   .   +   2   ID=cds.model.68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13677   13802   .   +   .   ID=model.68711.m00017.exon3;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13677   13802   .   +   0   ID=cds.model.68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13876   14016   .   +   .   ID=model.68711.m00017.exon4;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13876   14016   .   +   0   ID=cds.model.68711.m00017;Parent=model.68711.m00017
ADD COMMENTlink modified 14 months ago • written 14 months ago by SMK1.9k

great thanks, I'll give it a go

ADD REPLYlink written 14 months ago by rob234king600

It worked perfectly, once I had the right perl version running for it.

ADD REPLYlink written 14 months ago by rob234king600

Weird that the phase of the second CDS have changed from 1 to 2 during the process. A bug or copy past problem?

ADD REPLYlink written 14 months ago by Juke344.5k
1
gravatar for Ace
14 months ago by
Ace70
Ace70 wrote:

It seems simplest to just use grep for this.

cat $gff | grep -v "five_prime_UTR" | grep -v "three_prime_UTR" > $new_gff

You could probably even just use

cat $gff | grep -v "_UTR"

and testing is as easy as

cat $gff | grep -v "_UTR" | cut -f 3 | grep -v "#" | sort -u

cat $gff | grep "_UTR" | cut -f 3 | grep -v "#" | sort -u

wherein the first diagnostic should contain know UTR categories and the last should be only UTR categories

ADD COMMENTlink modified 14 months ago • written 14 months ago by Ace70

But as rob234king said:

I could do just by deleting but then it would not update the start and stop positions of the gene and mRNA features, and correct exon features.

ADD REPLYlink written 14 months ago by SMK1.9k

Oh good catch, I must have glossed over that initially. I've written if statements into for loops to do this previously. Logic follows like this:

for gene in list

if strand=+

if UTR_start < gene start

new_Gene_start=UTR_Start

Repeat for end, set - strand conditions, etc etc

but it's a bit bulky and I've never tried to incorporate exons this way. Better to not reinvent the wheel if there's a functional perl script.

ADD REPLYlink modified 14 months ago • written 14 months ago by Ace70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1380 users visited in the last hour