Question: Correcting start codons in a GFF3 annotation using the genome FASTA file
Written 3.1 years ago by rob234king570 (UK/Harpenden/Rothamsted Research):

I have an annotation produced with MAKER using RNA-seq evidence, but many gene models (~2000) start with TTG or CTG rather than ATG. Almost all of them are otherwise fine and just have the wrong start codon. In most cases the ATG start codon lies a few bases downstream, within the first exon.

What I want to do, using just the GFF3 and the genome FASTA file, is take the first CDS line of each gene model (accounting for whether it is on the + or - strand), search that window of the genome for the next ATG, and then correct the start position of that first CDS feature and the end of the 5' UTR to match.
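
One way the search described above could be scripted is sketched here in Python. It is only a sketch under stated assumptions: the chromosome sequence is already loaded as a plain string, coordinates are 1-based and inclusive as in GFF3, and by default the replacement ATG must be in frame with the annotated start so that the downstream reading frame is preserved (that in-frame requirement is an assumption, not something stated in the question). The names `revcomp`, `find_new_start` and `apply_correction` are hypothetical helpers, not part of any existing tool.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (A/C/G/T only; other letters pass through)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def find_new_start(chrom_seq, cds_start, cds_end, strand, in_frame=True):
    """Return the 1-based genomic position of the first base (A) of the first
    ATG found inside the first CDS, or None if the window contains no ATG.

    On the + strand this is the new CDS start coordinate.
    On the - strand this is the new CDS end coordinate, because the
    transcript begins at the right-hand end of the feature.
    """
    # GFF3 is 1-based inclusive, Python slices are 0-based half-open.
    window = chrom_seq[cds_start - 1:cds_end].upper()
    if strand == "-":
        window = revcomp(window)
    step = 3 if in_frame else 1  # assumption: keep the annotated frame by default
    for offset in range(0, len(window) - 2, step):
        if window[offset:offset + 3] == "ATG":
            if strand == "+":
                return cds_start + offset
            return cds_end - offset
    return None


def apply_correction(cds, utr, strand, new_pos):
    """Shrink the first CDS to begin at new_pos and extend the adjacent
    5' UTR to fill the gap. cds and utr are (start, end) tuples."""
    if strand == "+":
        return (new_pos, cds[1]), (utr[0], new_pos - 1)
    return (cds[0], new_pos), (new_pos + 1, utr[1])
```

If `find_new_start` returns `None`, or returns the original coordinate, the model would be left unchanged. The transcript and gene lines keep their coordinates, since only the CDS/UTR boundary moves.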

For example, using the test.gff below, I would take positions 447 and 554 as the first coding exon of the + strand annotation and then search the FASTA file for that window. Does anyone know of a scripting approach, or existing software, to correct start codons?

test.gff

    chrom_1_extraction	maker	three_prime_UTR	2254	2320	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:three_prime_utr;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
    chrom_1_extraction	maker	five_prime_UTR	295	446	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:five_prime_utr;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
    chrom_1_extraction	maker	transcript	295	2320	.	+	.	Name=maker-chrom_1-augustus-gene-0.156-mRNA-1;ID=maker-chrom_1-augustus-gene-0.156-mRNA-1;_AED=0.00;_eAED=0.00;_QI=152|0.8|0.83|1|0.8|0.66|6|67|516;Parent=maker-chrom_1-augustus-gene-0.156
    chrom_1_extraction	maker	gene	295	2320	.	+	.	Name=maker-chrom_1-augustus-gene-0.156;ID=maker-chrom_1-augustus-gene-0.156
    chrom_1_extraction	maker	CDS	447	554	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
    chrom_1_extraction	maker	CDS	616	1002	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
    chrom_1_extraction	maker	CDS	1050	1755	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
    chrom_1_extraction	maker	CDS	1803	1903	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
    chrom_1_extraction	maker	CDS	1955	2054	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
    chrom_1_extraction	maker	CDS	2105	2253	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
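
Picking out the first CDS of each transcript from a file like this might look roughly as follows. This is a hedged sketch, assuming the file is tab-delimited as GFF3 requires (the listing above may have lost its tabs when pasted), that each CDS has a single `Parent`, and that "first" means the lowest start on the + strand and the highest end on the - strand. `first_cds_per_transcript` is a hypothetical name.

```python
import re
from collections import defaultdict


def first_cds_per_transcript(gff_lines):
    """Map each transcript ID to its first CDS as (seqid, start, end, strand).

    "First" is in transcript order: leftmost CDS on the + strand,
    rightmost CDS on the - strand. The input order of lines does not matter.
    """
    cds_by_parent = defaultdict(list)
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        match = re.search(r"(?:^|;)Parent=([^;]+)", cols[8])
        if match is None:
            continue
        cds_by_parent[match.group(1)].append(
            (cols[0], int(cols[3]), int(cols[4]), cols[6])
        )

    first = {}
    for parent, feats in cds_by_parent.items():
        if feats[0][3] == "-":
            first[parent] = max(feats, key=lambda f: f[2])  # highest end
        else:
            first[parent] = min(feats, key=lambda f: f[1])  # lowest start
    return first
```

For the + strand transcript listed above this would select the CDS at 447 to 554, which is the window to search in the FASTA sequence.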

Fasta file

>chrom_1_extraction
CAACATTGATATCATCAGCAACCTAAGTAGCGGTGAACATGAGACGTACAGTGCGATACGTAGTTGACTGCTTAAACAAGATTGGCTTTTGTTGCAGGGAAGCCTTGCTTCATGATGCTTTTCTGTTAATAGATAATTCTAGAACAGTGTCTTCTAAAGCTCAGCTACCCTATGGCTATGACTTGTTGGATTATAGCCAATCACACAAGCCAAACTACCTAGTCTAGACTAGCGGAGAGGTTTTAGCGTACGTATCCTTGGCTTCCCCGCTATTGCCTTGTTTGCCTGTGTTATCTACCTCACATTTACGCCTGCATGTTACAACATCAGAACTACAGTCGCTTGGCATCTTGCACTTATGAAGCCAGTGAAATGCTGTACCACTTGTCGCCGCCGACACAGGAGATGCGTCACTCAGCCGGGAGCCTCTCAATGTAGCACTTGCCTTGAGTCAGGGCAGGAATGCCAATTCGACAATGACATTCGGTTCAAGCATAGTCATTCAAAAACTGAGAAGCAGTCAAGGAGAGAATGGGCTAAAGTTCCCTCTAAGAGTAAGTCCAAAGCCTGCTTGGCCTCTGGCCTTCAACCTTTCTGTGTATTTTCATGCTGAAGGCTGTAGTCTCTTTCACAGCACCACGCGGAATCGACGGTATGGTGTTGGAGGAATCAGGTAGTAACTCTACAAAGGACGAAGCTGCCAAGAACCTGTCGCAAACTGCGCAGATACCCGAAGAAATGATAGAAGTATCTGTTAATGATCTTGCACAATCTGGACCTCAAGTTGTGCCCTTGAACCCGGAACTCGACTACAGGAAATCAGCTTCCAACTTCATCGCCAACTCTTTAGTTGATGACCCTTCAGTCCGTGACTCTGATGAATATTTCGACCATGCTACAGCCCAAGACCTACCGGTTCAAGTTCATATTTCATCTCCATATGAGTTGACTGAACGAGAAGCCTTTCTTTTCATGATCTATATTTACAAATGTGCACCCTTGGTAAGTTACAGCTGTCAGATGTCGTCTCCACTAACATGACATTTAAGTCTGATGCATGTGACGATGCCCGTCATTTCGAACTCGAAGTTCCCCGATTGGCCCTTCGCCAACCCATGATAATGAACGGTCTACTCGCCCTCGCAAGCCGCTACGATTCTCGATGCATGGACACGTCCAACGACATTGAAAGCACATTTTACCACAATAAATGCATAAAGCTTCTTATAGAAGCTTTTGCTCAACCCCCTGAAACATGGGACTCAACGCTCCTTACAGCCGTTGTAATCGCGCGACTGTATGAGGAGAACGATAACGAGACTGATTCCTATTACCATCATCTCAGTGGAACGCAGAACCTTCTGAATCATGAGGCAGTCGCTAGGTTTGTGATACAGGGGGGATTAGCTGAAGCTGCAAGTTGGGTTCATCTTCGACAAGTAATCTACATCTACGTAGTGCGCAGGAGGCCTATCGAGATATGCCTTGAGAGCTTTGAGAGGTCAACTGTGTTTAGAAGATGTGACGATTCAGCATATGCGAACAGAGCGGTCTATAACTTCGCCAAGATTATGAGGCTATTTCTACAAGTTGAAAATTTGGACAGTGATCAAGACGAGTGGCAGGCAGCTGAGATGGAGGTAGACCGGTGGTATGACGCTAAGCCCGTATCTTTTCAACCTGTATTTCATATTTTGGCGGACCTCTCGGCAAACAGACCGTTCCCGACCCTTTACTTCATTGCATCAGTGCCCGGTAAGTGTGACTTTCAGCTGCTGTGCTCTCTCGCTAACATATCGGAGTCGTTGCAATGCAGTATTACTTCGCAGCCAAGGCCGTTTTATATTTGCATCATTGTAAGAACTTGCAGCAACTGAATAACCATGGAAGGCCAGACTTTGAAGTATTTTGAAGGACTACTCTCGTCAAAGATAACACACTAATTAATTGCCAGACCAAGATATCCTTCTATCTCTTCACTCTCATGGGTCTTGCTCTAT
CCAACTCCCATGTTCTAAACGCATTTTACCTACCTGCACATATGCTTTCATTCTGTACAGTCATCCCCCACCCCTAATGTGACAATGGCTGCTAACCCTTGTAGGTGGATATTGCATAAGAGACCCATGTGAACAGACCCATGCCATTTGTTACCTTGAGAAGGTTAACGAAGTGATTAAGTGGAAGACAAAGGAACTTATTGCAACGCTGAAGGAAAAATGGCATGATGGAGAGAAACATGATTCTCACTAATGGGCCCTCTCTGTTATATAAAAATAGTTCATCAATAAACTGCAAAGGTAGAATTATAAATGGCGCAGAATGGATATCCTGTAAGTGAAACTTTATGATGGAGTTTTGTAATTAATGAGACTTGTGGCCTTGAAGAAATGTCTTTTCTTTTTACTGTCGAATTTTAGTAATACTATAGCTAGGACCATCATTTTTATTCACTAAGAAAGATAACTCGCTAACACATAAGAAAAGGCCAATTATTTTAATTTATCCCTATC
annotation gff3 • 1.4k views

Hi, did you solve the problem? Would you mind sharing the solution, please? Thanks.

Written 2.5 years ago by User000260