virus genome annotation
21 days ago
G.S ▴ 10


I need help editing the NCBI GFF3 file so that I can annotate the proteins on my consensus sequence. Compared to the NCBI sequence, my sequence has stretches of Ns (12 Ns in total) at the beginning and end. Does anyone have an idea how I can modify the GFF3 file? Please see the attached picture. I then want to annotate the coding and non-coding regions. I have written the code below, but the values are not correct for my sequence. How can I edit these values to match my sequence?

coding = case_when(Position >= 45   & Position <= 98    ~ "Non-coding",
                   Position >= 596  & Position <= 627   ~ "Non-coding",
                   Position >= 1126 & Position <= 1140  ~ "Non-coding",
                   Position >= 2230 & Position <= 2346  ~ "Non-coding",
                   Position >= 3253 & Position <= 3261  ~ "Non-coding",
                   Position >= 4220 & Position <= 4303  ~ "Non-coding",
                   Position >= 4674 & Position <= 4687  ~ "Non-coding",
                   Position >= 5649 & Position <= 5661  ~ "Non-coding",
                   Position >= 7598 & Position <= 7606  ~ "Non-coding",
                   Position >= 99   & Position <= 504   ~ "Coding",
                   Position >= 507  & Position <= 988   ~ "Coding",
                   Position >= 991  & Position <= 2302  ~ "Coding",
                   Position >= 2305 & Position <= 3058  ~ "Coding",
                   Position >= 3061 & Position <= 4018  ~ "Coding",
                   Position >= 4021 & Position <= 4484  ~ "Coding",
                   Position >= 4487 & Position <= 5571  ~ "Coding",
                   Position >= 5574 & Position <= 7372  ~ "Coding",
                   Position >= 7375 & Position <= 8171  ~ "Coding",
                   Position >= 8180 & Position <= 8418  ~ "Coding",
                   Position >= 8421 & Position <= 14982 ~ "Coding")
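One way to avoid typing the ranges by hand might be to build the labels from a table of CDS intervals. The following is only a sketch in base R, not a tested pipeline: the start/end pairs are copied from the CDS lines of the KT992094.1 GFF3 in this post, `in_cds` is a hypothetical helper name, and every position outside a CDS is labelled "Non-coding" (untranslated and intergenic regions together), which is a simplification compared with the hand-written ranges above.

```r
# CDS start/end pairs copied from the KT992094.1 GFF3 quoted in this post.
cds <- data.frame(
  gene  = c("NS1", "NS2", "N", "P", "M", "SH", "G", "F", "M2-1", "M2-2", "L"),
  start = c(99, 628, 1141, 2347, 3262, 4304, 4689, 5662, 7607, 8160, 8499),
  end   = c(518, 1002, 2316, 3072, 4032, 4498, 5585, 7386, 8191, 8432, 14996)
)

# Hypothetical helper: TRUE for each position that lies inside at least one CDS.
in_cds <- function(pos, starts, ends) {
  vapply(pos, function(p) any(p >= starts & p <= ends), logical(1))
}

# 15223 is the reference length given on the ##sequence-region line.
positions <- 1:15223
coding <- ifelse(in_cds(positions, cds$start, cds$end), "Coding", "Non-coding")

# Count how many positions received each label.
table(coding)
```

Because the intervals live in a data frame, they can be adjusted in one place if the consensus coordinates turn out to differ from the reference.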

Here is the NCBI GFF3 annotation for the reference sequence:

##sequence-region KT992094.1 1 15223
KT992094.1  Genbank region  1   15223   .   +   .   ID=KT992094.1:1..15223;Dbxref=taxon:11250;gb-acronym=HRSV;gbkey=Src;genome=genomic;mol_type=viral cRNA;note=recombinant D46/D53 strain;strain=A2
KT992094.1  Genbank gene    45  576 .   +   .   ID=gene-NS1;Name=NS1;gbkey=Gene;gene=NS1;gene_biotype=protein_coding
KT992094.1  Genbank CDS 99  518 .   +   0   ID=cds-ALS35583.1;Parent=gene-NS1;Dbxref=NCBI_GP:ALS35583.1;Name=ALS35583.1;gbkey=CDS;gene=NS1;product=nonstructural protein 1;protein_id=ALS35583.1
KT992094.1  Genbank gene    596 1098    .   +   .   ID=gene-NS2;Name=NS2;gbkey=Gene;gene=NS2;gene_biotype=protein_coding
KT992094.1  Genbank CDS 628 1002    .   +   0   ID=cds-ALS35584.1;Parent=gene-NS2;Dbxref=NCBI_GP:ALS35584.1;Name=ALS35584.1;gbkey=CDS;gene=NS2;product=nonstructural protein 2;protein_id=ALS35584.1
KT992094.1  Genbank gene    1126    2328    .   +   .   ID=gene-N;Name=N;gbkey=Gene;gene=N;gene_biotype=protein_coding
KT992094.1  Genbank CDS 1141    2316    .   +   0   ID=cds-ALS35585.1;Parent=gene-N;Dbxref=NCBI_GP:ALS35585.1;Name=ALS35585.1;gbkey=CDS;gene=N;product=nucleoprotein;protein_id=ALS35585.1
KT992094.1  Genbank gene    2330    3243    .   +   .   ID=gene-P;Name=P;gbkey=Gene;gene=P;gene_biotype=protein_coding
KT992094.1  Genbank CDS 2347    3072    .   +   0   ID=cds-ALS35586.1;Parent=gene-P;Dbxref=NCBI_GP:ALS35586.1;Name=ALS35586.1;gbkey=CDS;gene=P;product=phosphoprotein;protein_id=ALS35586.1
KT992094.1  Genbank gene    3253    4210    .   +   .   ID=gene-M;Name=M;gbkey=Gene;gene=M;gene_biotype=protein_coding
KT992094.1  Genbank CDS 3262    4032    .   +   0   ID=cds-ALS35587.1;Parent=gene-M;Dbxref=NCBI_GP:ALS35587.1;Name=ALS35587.1;gbkey=CDS;gene=M;product=matrix protein;protein_id=ALS35587.1
KT992094.1  Genbank gene    4220    4629    .   +   .   ID=gene-SH;Name=SH;gbkey=Gene;gene=SH;gene_biotype=protein_coding
KT992094.1  Genbank CDS 4304    4498    .   +   0   ID=cds-ALS35582.1;Parent=gene-SH;Dbxref=NCBI_GP:ALS35582.1;Name=ALS35582.1;gbkey=CDS;gene=SH;product=small hydrophobic protein;protein_id=ALS35582.1
KT992094.1  Genbank gene    4674    5596    .   +   .   ID=gene-G;Name=G;gbkey=Gene;gene=G;gene_biotype=protein_coding
KT992094.1  Genbank CDS 4689    5585    .   +   0   ID=cds-ALS35588.1;Parent=gene-G;Dbxref=NCBI_GP:ALS35588.1;Name=ALS35588.1;gbkey=CDS;gene=G;product=attachment glycoprotein G;protein_id=ALS35588.1
KT992094.1  Genbank gene    5649    7551    .   +   .   ID=gene-F;Name=F;gbkey=Gene;gene=F;gene_biotype=protein_coding
KT992094.1  Genbank CDS 5662    7386    .   +   0   ID=cds-ALS35589.1;Parent=gene-F;Dbxref=NCBI_GP:ALS35589.1;Name=ALS35589.1;gbkey=CDS;gene=F;product=fusion protein;protein_id=ALS35589.1
KT992094.1  Genbank gene    7598    8558    .   +   .   ID=gene-M2;Name=M2;gbkey=Gene;gene=M2;gene_biotype=protein_coding
KT992094.1  Genbank CDS 7607    8191    .   +   0   ID=cds-ALS35591.1;Parent=gene-M2;Dbxref=NCBI_GP:ALS35591.1;Name=ALS35591.1;gbkey=CDS;gene=M2;product=m2-1;protein_id=ALS35591.1
KT992094.1  Genbank CDS 8160    8432    .   +   0   ID=cds-ALS35592.1;Parent=gene-M2;Dbxref=NCBI_GP:ALS35592.1;Name=ALS35592.1;gbkey=CDS;gene=M2;product=m2-2 protein;protein_id=ALS35592.1
KT992094.1  Genbank gene    8491    15068   .   +   .   ID=gene-L;Name=L;gbkey=Gene;gene=L;gene_biotype=protein_coding
KT992094.1  Genbank CDS 8499    14996   .   +   0   ID=cds-ALS35590.1;Parent=gene-L;Dbxref=NCBI_GP:ALS35590.1;Name=ALS35590.1;gbkey=CDS;gene=L;product=L polymerase protein;protein_id=ALS35590.1
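If the consensus coordinates really are shifted relative to the reference, the start and end columns of the GFF3 could be moved by a fixed offset. The sketch below is base R under stated assumptions: `shift_gff` is a hypothetical helper, the file is assumed to be tab-delimited as GFF3 requires, and the offset of 6 is a made-up value for illustration that would have to be measured from an alignment of the consensus against KT992094.1. If the Ns only mask reference positions (as in a consensus built by mapping reads to the reference), the offset may well be 0 and no edit is needed.

```r
# Hypothetical helper: add `offset` to columns 4 (start) and 5 (end) of every
# GFF3 feature line, clamping the result to 1..seq_length.
# Comment/directive lines (starting with "#") and empty lines are left unchanged.
shift_gff <- function(lines, offset, seq_length) {
  offset <- as.integer(offset)
  seq_length <- as.integer(seq_length)
  is_feature <- !grepl("^#", lines) & nzchar(lines)
  fields <- strsplit(lines[is_feature], "\t", fixed = TRUE)
  shifted <- vapply(fields, function(f) {
    f[4] <- as.character(max(1L, as.integer(f[4]) + offset))
    f[5] <- as.character(min(seq_length, as.integer(f[5]) + offset))
    paste(f, collapse = "\t")
  }, character(1))
  lines[is_feature] <- shifted
  lines
}

# Two example lines; the feature line is a shortened copy of the NS1 CDS record.
gff <- c(
  "##sequence-region KT992094.1 1 15223",
  paste("KT992094.1", "Genbank", "CDS", "99", "518", ".", "+", "0",
        "ID=cds-ALS35583.1;gene=NS1", sep = "\t")
)

offset <- 6L  # assumption for illustration only, not taken from the post
shifted <- shift_gff(gff, offset, 15223L)
writeLines(shifted)
```

Note that the sketch does not touch the `##sequence-region` directive or the sequence ID in column 1; both would also need to match the consensus FASTA header and length for most tools to accept the file.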

Thanks in advance,


annotation gff3 consensus NCBI coding_regions • 183 views
