Remove UTR annotations from a GFF file
I have UTR annotations that I need to remove for a set of genes. Simply deleting the UTR lines would not update the start and end positions of the gene and mRNA features, or correct the exon features. Is there a tool that removes UTR features from a GFF file and makes the necessary corrections to the remaining gene, mRNA and exon features? I can't find one, but surely something exists?
                    
                 
                 
                
                
                    
                    
    
        
        
gff • 3.6k views
written 6.4 years ago by rob234king • updated 6.4 years ago by Ace

Hi rob234king,
I think you can try gff3_file_UTR_trimmer.pl from PASApipeline:
(Edited: I changed the example to the GFF3 file shipped in the PASA directory. I am still working out what the phase shift in the second CDS means; it is not a copy-paste problem.)
$ cat test.gff3
gi|68711    TIGR    gene    12923   14228   .   +   .   ID=68711.t00017;Name=my_protein_name
gi|68711    TIGR    mRNA    12923   14228   .   +   .   ID=model.68711.m00017;Parent=68711.t00017
gi|68711    TIGR    five_prime_utr  12923   13029   .   +   .   ID=utr5p_of_68711.m00017;Parent=68711.m00017
gi|68711    TIGR    exon    12923   13060   .   +   .   ID=68711.e00069;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13030   13060   .   +   0   ID=13030_13060cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13411   13550   .   +   .   ID=68711.e00070;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13411   13550   .   +   1   ID=13411_13550cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13677   13802   .   +   .   ID=68711.e00071;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13677   13802   .   +   0   ID=13677_13802cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13876   14228   .   +   .   ID=68711.e00072;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13876   14016   .   +   0   ID=13876_14016cds_of_68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    three_prime_utr 14017   14228   .   +   .   ID=utr3p_of_68711.m00017;Parent=68711.m00017
$ perl ~/src/PASApipeline-v2.3.3/misc_utilities/gff3_file_UTR_trimmer.pl test.gff3
gi|68711    TIGR    gene    13030   14016   .   +   .   ID=68711.t00017.1;Name=my_protein_name
gi|68711    TIGR    mRNA    13030   14016   .   +   .   ID=model.68711.m00017;Parent=68711.t00017.1;Name=my_protein_name
gi|68711    TIGR    exon    13030   13060   .   +   .   ID=model.68711.m00017.exon1;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13030   13060   .   +   0   ID=cds.model.68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13411   13550   .   +   .   ID=model.68711.m00017.exon2;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13411   13550   .   +   2   ID=cds.model.68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13677   13802   .   +   .   ID=model.68711.m00017.exon3;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13677   13802   .   +   0   ID=cds.model.68711.m00017;Parent=model.68711.m00017
gi|68711    TIGR    exon    13876   14016   .   +   .   ID=model.68711.m00017.exon4;Parent=model.68711.m00017
gi|68711    TIGR    CDS 13876   14016   .   +   0   ID=cds.model.68711.m00017;Parent=model.68711.m00017
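
If running the PASA script is not an option, the same idea can be sketched in a few lines of Python: drop the UTR rows, clip each exon to the CDS span of its mRNA, and shrink the mRNA and gene to match. This is only a rough sketch, not the PASA implementation. It assumes tab-separated 9-column GFF3, a single `Parent=` per feature linking exon/CDS to an mRNA and mRNA to a gene, and the names `attr` and `trim_utrs` are made up for illustration. Unlike the PASA script, it leaves IDs and phases untouched and silently drops comment lines.

```python
import re

# Feature types treated as UTRs (compared case-insensitively).
UTR_TYPES = {"five_prime_utr", "three_prime_utr", "utr"}

def attr(attributes, key):
    """Return the value of `key` in a GFF3 attribute string, or None."""
    m = re.search(r"(?:^|;)" + re.escape(key) + r"=([^;]*)", attributes)
    return m.group(1) if m else None

def trim_utrs(lines):
    """Hypothetical helper: remove UTR rows and shrink gene/mRNA/exon to the CDS span."""
    # Keep only well-formed 9-column feature rows (comments and blanks are dropped).
    rows = [l.rstrip("\n").split("\t") for l in lines
            if l.strip() and not l.startswith("#")]
    rows = [r for r in rows if len(r) == 9 and r[2].lower() not in UTR_TYPES]

    # CDS span (min start, max end) for each mRNA ID.
    span = {}
    for r in rows:
        if r[2] == "CDS":
            parent = attr(r[8], "Parent")
            s, e = int(r[3]), int(r[4])
            lo, hi = span.get(parent, (s, e))
            span[parent] = (min(lo, s), max(hi, e))

    # Gene span = union of the CDS spans of its mRNAs.
    gene_span = {}
    for r in rows:
        mrna_id = attr(r[8], "ID")
        if r[2] == "mRNA" and mrna_id in span:
            gene_id = attr(r[8], "Parent")
            lo, hi = gene_span.get(gene_id, span[mrna_id])
            gene_span[gene_id] = (min(lo, span[mrna_id][0]),
                                  max(hi, span[mrna_id][1]))

    out = []
    for r in rows:
        feature_id, parent = attr(r[8], "ID"), attr(r[8], "Parent")
        if r[2] == "gene" and feature_id in gene_span:
            r[3], r[4] = map(str, gene_span[feature_id])
        elif r[2] == "mRNA" and feature_id in span:
            r[3], r[4] = map(str, span[feature_id])
        elif r[2] == "exon" and parent in span:
            lo, hi = span[parent]
            s, e = max(int(r[3]), lo), min(int(r[4]), hi)
            if s > e:
                continue  # exon lies entirely inside a UTR: drop it
            r[3], r[4] = str(s), str(e)
        out.append("\t".join(r))
    return out
```

Something like `print("\n".join(trim_utrs(open("test.gff3"))))` would then write the trimmed records, provided the file really is tab-separated.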
 
                    
                 
                 
                
                
                 
                
                
    
6.4 years ago by AK

It seems simplest to just use grep for this:
  grep -v "five_prime_UTR" "$gff" | grep -v "three_prime_UTR" > "$new_gff"

You could probably even just use
  grep -v "_UTR" "$gff"

and testing is as easy as
  grep -v "_UTR" "$gff" | cut -f 3 | grep -v "#" | sort -u

  grep "_UTR" "$gff" | cut -f 3 | grep -v "#" | sort -u

The first diagnostic should list no UTR categories, and the second should list only UTR categories.
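
A substring match on the whole line can also hit `_UTR` inside an ID or Name attribute, and being case-sensitive it misses lower-case types such as the `five_prime_utr` rows in the PASA example. A possible alternative is to compare only the feature-type column, ignoring case. The sketch below assumes tab-separated GFF and treats any type ending in "utr" as a UTR; `is_utr`, `drop_utr_rows` and `feature_types` are illustrative names. Like the grep version, it only removes rows and does not adjust gene, mRNA or exon coordinates.

```python
def is_utr(line):
    """True if column 3 (feature type) of a GFF row ends in 'utr', ignoring case."""
    cols = line.rstrip("\n").split("\t")
    return (not line.startswith("#")
            and len(cols) >= 3
            and cols[2].lower().endswith("utr"))

def drop_utr_rows(lines):
    """Keep every line (including comments) except UTR feature rows."""
    return [l for l in lines if not is_utr(l)]

def feature_types(lines):
    """Sorted unique feature types, the equivalent of `cut -f 3 | sort -u`."""
    return sorted({l.split("\t")[2] for l in lines
                   if not l.startswith("#") and l.count("\t") >= 2})
```

Calling `feature_types(drop_utr_rows(lines))` plays the role of the first diagnostic above, and `feature_types([l for l in lines if is_utr(l)])` the second.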
                    
                 
                 
                
                
                 
                
                
    
6.4 years ago by Ace
     
     
 
 
             
            
            
         
     
 
         
        
 
    
    
        
            
         
    
    
        
    
    
 
Great, thanks, I'll give it a go.
It worked perfectly, once I had the right Perl version running for it.
Weird that the phase of the second CDS has changed from 1 to 2 during the process. A bug or a copy-paste problem?
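
One way to check which value is right is to recompute the phases from the CDS lengths. Under the GFF3 definition, the phase is the number of bases to skip at the start of a CDS segment to reach the first base of the next complete codon, so each segment's phase follows from the previous segment's phase and length. The sketch below assumes the segments are given in transcript order (ascending on the plus strand, descending on the minus strand) and that the first segment has phase 0; `cds_phases` is an illustrative name. For the four CDS segments in the example it gives 0, 2, 0, 0, which agrees with the trimmer's output rather than with the input file.

```python
def cds_phases(segments, first_phase=0):
    """Return the GFF3 phase of each CDS segment.

    segments: list of (start, end) pairs, 1-based inclusive, in transcript order.
    """
    phases = []
    phase = first_phase
    for start, end in segments:
        phases.append(phase)
        length = end - start + 1
        # After skipping `phase` bases, (length - phase) % 3 bases are left over
        # at the end of this segment; the next segment must supply the rest of
        # that codon, which is its phase.
        phase = (3 - (length - phase) % 3) % 3
    return phases

# CDS segments from the example: the first is 31 bp (10 codons + 1 base),
# so the second segment starts with 2 bases that finish that codon.
print(cds_phases([(13030, 13060), (13411, 13550), (13677, 13802), (13876, 14016)]))
# → [0, 2, 0, 0]
```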