Question: remove UTR annotations from gff
0
gravatar for rob234king
3 months ago by
rob234king580
UK/Harpenden/Rothamsted Research
rob234king580 wrote:

I have UTR annotations which I need to remove for a set of genes, which I could do just by deleting but then it would not update the start and stop positions of the gene and mRNA features, and correct exon features. Is there a tool to remove UTR features from a gff file and make the necessary corrections to the remaining mRNA and gene, and exon features? I can't seem to find something but there must be??

gff • 206 views
ADD COMMENTlink modified 3 months ago by Ace60 • written 3 months ago by rob234king580
2
gravatar for SMK
3 months ago by
SMK1.8k
SMK1.8k wrote:

Hi rob234king,

I think you can try gff3_file_UTR_trimmer.pl from PASApipeline:

(edited: changed the example to the gff3 in PASA directory; still thinking the meaning of a shift of phase, not a copy-paste problem)

$ cat test.gff3
gi|68711    TIGR    gene    12923   14228   .   +   .   ID=68711.t00017;Name=my_protein_name
gi|68711    TIGR    mRNA    12923   14228   .   +   .   ID=model.68711.m00017;Parent=68711.t00017
gi|68711    TIGR    five_prime_utr  12923   13029   .   +   .   ID=utr5p_of_68711.m00017;Parent=68711.m00017
gi|68711    TIGR    exon    12923   13060   .   +   .   ID=68711.e00069;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13030   13060   .   +   0   ID=13030_13060cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13411   13550   .   +   .   ID=68711.e00070;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13411   13550   .   +   1   ID=13411_13550cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13677   13802   .   +   .   ID=68711.e00071;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13677   13802   .   +   0   ID=13677_13802cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13876   14228   .   +   .   ID=68711.e00072;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13876   14016   .   +   0   ID=13876_14016cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    three_prime_utr 14017   14228   .   +   .   ID=utr3p_of_68711.m00017;Parent=68711.m00017

$ perl ~/src/PASApipeline-v2.3.3/misc_utilities/gff3_file_UTR_trimmer.pl test.gff3
gi|68711    TIGR    gene    13030   14016   .   +   .   ID=68711.t00017.1;Name=my_protein_name
gi|68711    TIGR    mRNA    13030   14016   .   +   .   ID=model.68711.m00017;Parent=68711.t00017.1;Name=my_protein_name
gi|68711    TIGR    exon    13030   13060   .   +   .   ID=model.68711.m00017.exon1;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13030   13060   .   +   0   ID=cds.model.68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13411   13550   .   +   .   ID=model.68711.m00017.exon2;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13411   13550   .   +   2   ID=cds.model.68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13677   13802   .   +   .   ID=model.68711.m00017.exon3;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13677   13802   .   +   0   ID=cds.model.68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13876   14016   .   +   .   ID=model.68711.m00017.exon4;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13876   14016   .   +   0   ID=cds.model.68711.m00017;Parent=model.68711.m00017
ADD COMMENTlink modified 3 months ago • written 3 months ago by SMK1.8k

great thanks, I'll give it a go

ADD REPLYlink written 3 months ago by rob234king580

It worked perfectly, once I had the right perl version running for it.

ADD REPLYlink written 3 months ago by rob234king580

Weird that the phase of the second CDS have changed from 1 to 2 during the process. A bug or copy past problem?

ADD REPLYlink written 3 months ago by Juke-342.6k
1
gravatar for Ace
3 months ago by
Ace60
Ace60 wrote:

It seems simplest to just use grep for this.

cat $gff | grep -v "five_prime_UTR" | grep -v "three_prime_UTR" > $new_gff

You could probably even just use

cat $gff | grep -v "_UTR"

and testing is as easy as

cat $gff | grep -v "_UTR" | cut -f 3 | grep -v "#" | sort -u

cat $gff | grep "_UTR" | cut -f 3 | grep -v "#" | sort -u

wherein the first diagnostic should contain know UTR categories and the last should be only UTR categories

ADD COMMENTlink modified 3 months ago • written 3 months ago by Ace60

But as rob234king said:

I could do just by deleting but then it would not update the start and stop positions of the gene and mRNA features, and correct exon features.

ADD REPLYlink written 3 months ago by SMK1.8k

Oh good catch, I must have glossed over that initially. I've written if statements into for loops to do this previously. Logic follows like this:

for gene in list

if strand=+

if UTR_start < gene start

new_Gene_start=UTR_Start

Repeat for end, set - strand conditions, etc etc

but it's a bit bulky and I've never tried to incorporate exons this way. Better to not reinvent the wheel if there's a functional perl script.

ADD REPLYlink modified 3 months ago • written 3 months ago by Ace60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 552 users visited in the last hour