Question

Best and reliable way to annotate genes? Here the genes belong to an assembled phage genome

6.5 years ago

DanielC ▴ 210

Dear Friends,

I am trying to annotate genes for a phage genome (Salmonella phage vB_SPuM_SP116; accession KP010413.1) obtained from a BLAST hit of an assembled contig. Could you please tell me the best way to do it?

I tried DNAmaster, but for some reason it doesn't give me any ORFs/CDSs (predicted by Glimmer/GeneMark, both integrated in DNAmaster) and only gives me tRNAs (predicted by Aragorn, also integrated in DNAmaster); maybe I am doing something wrong. When I run Glimmer locally, however, it does give me ORFs. I would really appreciate your input on what could be wrong here.

I am also trying to download the FASTA file of the phage "Salmonella phage vB_SPuM_SP116" from the https://phagesdb.org/phages/ website, but I cannot find this phage there. Can you please let me know if I am missing something here?

Thank you very much! DK

phages DNAmaster Gene annotation • 3.1k views

ADD COMMENT • link updated 6.5 years ago by natasha.sernova ★ 4.0k • written 6.5 years ago by DanielC ▴ 210

I didn't find Salmonella listed as a host there, either in the header or when going through the rows here: https://phagesdb.org/allphages/

Check the name of the phage. Who submitted the sequence?

I was involved in sequencing some phages, and they can be found in the NCBI Nucleotide database.

ADD REPLY • link updated 6.5 years ago by Ram 45k • written 6.5 years ago by natasha.sernova ★ 4.0k

Hi Natasha,

From NCBI I get this page for the phage: https://www.ncbi.nlm.nih.gov/nuccore/KP010413.1/

The submission information is as follows:

ORGANISM  Salmonella phage vB_SPuM_SP116
            Viruses; dsDNA viruses, no RNA stage; Caudovirales; Myoviridae;
            Ounavirinae.
REFERENCE   1  (bases 1 to 87510)
  AUTHORS   Bao,H.
  TITLE     A new lytic Salmonella pullorum phage and its enzyme lys52
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 87510)
  AUTHORS   Bao,H. and Shahin,K.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-OCT-2014) Institute of Food Safety, Jiangsu Academy
            of Agricultural Science, No. 50 Zhongling Street, Nanjing, Jiangsu
            210014, China

However, I am not sure which information from the page above I should use on the phagesdb website to download this phage. Could you please provide some guidance?

Thank you, DK

ADD REPLY • link 6.5 years ago by DanielC ▴ 210

Just an edit to my above query:

I went to https://phagesdb.org/data/ and to:

Full List of GenBank Accession Numbers

With phage names
Just accession numbers

and downloaded the list "With phage names" and looked for the accession of my phage of interest, "KP010413", but could not find it. Is this the right way to find out whether the phage is present on the phagesdb website?
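The lookup described above can be sketched in a few lines of Python. This is only an illustration: the layout of the downloaded list is an assumption (one phage per line, with the accession somewhere on that line), and the example rows are made up.

```python
from typing import Iterable, List


def find_accession(lines: Iterable[str], accession: str) -> List[str]:
    """Return every line that mentions the accession, ignoring case and any version suffix."""
    base = accession.split(".")[0].upper()  # "KP010413.1" -> "KP010413"
    return [line.rstrip("\n") for line in lines if base in line.upper()]


# Hypothetical rows; the real file downloaded from phagesdb may be laid out differently.
example_rows = [
    "ExamplePhageA\tAB000001\n",
    "ExamplePhageB\tAB000002\n",
]
hits = find_accession(example_rows, "KP010413.1")
print(hits if hits else "accession not found in this list")
```

With a real file, passing `open("downloaded_list.txt")` (a placeholder file name) as `lines` would work the same way. An empty result only shows that the accession is absent from that particular list.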

Thanks, DK

ADD REPLY • link 6.5 years ago by DanielC ▴ 210

You could just use a pipeline like Prokka for this. It is a general-purpose prokaryotic annotation pipeline, and Salmonella is well characterised, so you would likely get good annotations straight away.

Alternatively, you can tell Prokka to use a different translation table, or instruct it to run in its viral mode (though for bacteriophages the bacterial table is probably sufficient anyway).
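As a rough sketch of that suggestion, the snippet below only assembles a Prokka command line and prints it. `--kingdom`, `--gcode`, `--outdir` and `--prefix` are Prokka options; the file name, output directory and prefix are placeholders, and choosing translation table 11 together with the viral mode is an assumption, not a recommendation from this thread.

```python
import shlex
from typing import List


def prokka_command(fasta: str, outdir: str, prefix: str,
                   kingdom: str = "Viruses", gcode: int = 11) -> List[str]:
    """Build (but do not run) a Prokka command line as an argument list."""
    return [
        "prokka",
        "--kingdom", kingdom,   # annotation mode: Bacteria, Archaea, Viruses, ...
        "--gcode", str(gcode),  # translation table; 11 is the bacterial table
        "--outdir", outdir,
        "--prefix", prefix,
        fasta,
    ]


# Placeholder file and directory names.
cmd = prokka_command("KP010413.1.fasta", "prokka_out", "SP116")
print(shlex.join(cmd))
# To run it for real (needs Prokka installed and on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell-quoting problems if it is later passed to `subprocess.run`.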

ADD REPLY • link 6.5 years ago by Joe 22k

You found this link below, right?

https://www.ncbi.nlm.nih.gov/nuccore/KP010413.1/

On this page there is a GenBank link in the left-hand corner and a FASTA link below it. Click the FASTA link; I got the genome here:

https://www.ncbi.nlm.nih.gov/nuccore/KP010413.1?report=fasta

Why do you insist on that phage-database? NCBI has been a reliable source.
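If you would rather script the download than click through the page, something like the following should work. It is a minimal sketch that uses NCBI's E-utilities `efetch` endpoint; the output file name is a placeholder, and the actual download call is left commented out because it needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build the efetch URL that returns a nucleotide record as plain-text FASTA."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def download_fasta(accession: str, path: str) -> None:
    """Fetch the record and write it to `path` (requires network access)."""
    with urlopen(efetch_fasta_url(accession)) as response, open(path, "wb") as out:
        out.write(response.read())


print(efetch_fasta_url("KP010413.1"))
# download_fasta("KP010413.1", "KP010413.1.fasta")  # placeholder output file name
```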

ADD REPLY • link 6.5 years ago by natasha.sernova ★ 4.0k

Thanks Natasha! I already have the phage genome from NCBI. Since I am using DNAmaster, its tutorial says that one should only use phage genomes from the phagesdb website, as those are "finished/polished" sequences, which is important for gene prediction and annotation. Since I could not find the phage in phagesdb, I downloaded it from NCBI and performed the DNAmaster steps. However, after running "auto-annotate" I only see tRNAs, whereas when I run Glimmer locally I do find predicted ORFs. So I am trying to figure out what could be wrong with the DNAmaster step. Any suggestions? I would really appreciate it.

ADD REPLY • link 6.5 years ago by DanielC ▴ 210

See this link again. https://www.ncbi.nlm.nih.gov/nuccore/KP010413.1/

Scroll down the page: the authors have already predicted a lot of reading frames. What else do you need?
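To see what the submitters already annotated, you can pull the CDS features out of the GenBank flat file. The sketch below is a deliberately simple line scanner, not a full parser (Biopython's `SeqIO` would be the more robust choice); it assumes the standard flat-file layout and only reads the first line of each location. The toy record is invented.

```python
from typing import Iterable, List


def cds_locations(genbank_lines: Iterable[str]) -> List[str]:
    """Collect the location strings of CDS features from GenBank flat-file lines."""
    locations = []
    in_features = False
    for line in genbank_lines:
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN"):
            break  # the sequence block follows; no more features
        # Feature keys are indented by exactly five spaces; qualifier lines by more.
        if in_features and line.startswith("     ") and not line.startswith("      "):
            parts = line.split()
            if len(parts) >= 2 and parts[0] == "CDS":
                locations.append(parts[1])  # only the first line of a wrapped location
    return locations


# Invented toy record, only to show the expected layout.
toy_record = [
    "LOCUS       TOY                      60 bp    DNA",
    "FEATURES             Location/Qualifiers",
    "     source          1..60",
    "     CDS             3..17",
    '                     /product="hypothetical protein"',
    "     CDS             complement(20..58)",
    "ORIGIN",
]
print(len(cds_locations(toy_record)), "CDS features:", cds_locations(toy_record))
```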

ADD REPLY • link 6.5 years ago by natasha.sernova ★ 4.0k

Thanks Natasha! I totally agree! But gene prediction with tools like Glimmer or DNAmaster might predict some new genes; I think that could be one reason for researchers to use such tools. Do you agree? Moreover, DNAmaster is very widely used for prediction and annotation; however, the software is not stable and crashes often.

ADD REPLY • link 6.5 years ago by DanielC ▴ 210

DK: First align your assembled genome to the reference that @Natasha has linked below. Make sure there is good concordance between your sequence and the reference. At that point you could map the reference genome annotation from NCBI's version onto your assembly (assuming there are good stretches of homology; ideally there may just be some SNPs).
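A quick way to check concordance, once an aligner has produced a pairwise alignment, is to compute percent identity and list the differing columns. The sketch below assumes you already have the two aligned rows as equal-length strings with "-" for gaps; the alignment step itself is not shown, and the short sequences are invented.

```python
from typing import List, Tuple


def compare_aligned(ref: str, query: str) -> Tuple[float, List[Tuple[int, str, str]]]:
    """Return (percent identity, differences) for two rows of a pairwise alignment.

    Each difference is (reference position, reference base, query base); the position
    is the 1-based coordinate of the most recent reference base, so an insertion in
    the query is reported at the reference base just before it.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have the same length")
    matches = columns = ref_pos = 0
    diffs = []
    for r, q in zip(ref.upper(), query.upper()):
        if r == "-" and q == "-":
            continue  # an all-gap column carries no information
        columns += 1
        if r != "-":
            ref_pos += 1
        if r == q:
            matches += 1
        else:
            diffs.append((ref_pos, r, q))
    identity = 100.0 * matches / columns if columns else 0.0
    return identity, diffs


# Invented toy alignment: one substitution and one insertion in the query.
identity, diffs = compare_aligned("ATGCCGTA-A", "ATGACGTATA")
print(f"{identity:.1f}% identity, differences: {diffs}")
```

If the identity is very high and the differences are isolated substitutions, transferring the reference annotation is likely to be straightforward; long gaps would suggest annotating those regions de novo.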

ADD REPLY • link 6.5 years ago by GenoMax 152k

I went to Google and typed ‘bacteriophage genome annotation’.

There are a lot of links there, some of them look promising.

Like this one: https://phagesdb.org/media/workflow/protocols/pdfs/Guiding_Principles_of_Bacteriophage_Genome_Annotation_6.2013_PDF.pdf - your favorite db, isn't it?

or this one: http://grantome.com/grant/NSF/DBI-0850356

Try to find some other database; otherwise you will have to annotate it yourself, as @genomax has suggested.

There should be a lot of them: phages are the most abundant viruses on Earth. They are harmless, and their genomes are relatively small. I mean, there may be some other databases that contain this phage genome. Or you can write to the authors; they submitted it about 4 years ago. Good luck!

ADD REPLY • link 6.5 years ago by natasha.sernova ★ 4.0k

Thanks! I will work on these ideas and let you know. I think using DNAmaster for genomes that are not present on the phagesdb website is not a good idea, as one needs to polish the sequence before analysis. I am thinking of using Glimmer and GeneMark and then annotating the predicted genes using BLAST, and/or using the already annotated genes from NCBI, as @genomax said.
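For intuition about what the gene finders start from, here is a naive six-frame ORF scan. It is not what Glimmer or GeneMark do (they use statistical models to score candidates); it simply reports stretches from a start codon (ATG, GTG or TTG, as in translation table 11) to the next in-frame stop. The minimum length and the toy sequence are arbitrary choices for illustration.

```python
from typing import List, Tuple

STARTS = {"ATG", "GTG", "TTG"}  # start codons commonly used in bacteria and phages
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_len: int = 90) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for naive ORFs, stop codon included.

    Coordinates are 1-based, inclusive, and given on the forward strand.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i  # first start codon after the previous stop
                elif start is not None and codon in STOPS:
                    end = i + 3  # exclusive end, includes the stop codon
                    if end - start >= min_len:
                        if strand == "+":
                            orfs.append((start + 1, end, "+"))
                        else:
                            orfs.append((n - end + 1, n - start, "-"))
                    start = None
    return orfs


# Invented toy sequence with one short ORF; min_len is lowered so it is reported.
toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
print(find_orfs(toy, min_len=12))
```

Candidate ORFs from a scan like this, or the predictions from Glimmer/GeneMark, can then be translated and searched with BLAST to assign putative functions.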

ADD REPLY • link 6.5 years ago by DanielC ▴ 210

Look at genomax's comment at the end of this post: A: Shall I take vigna angularis or vigna radiata as a reference for vigna munga

It worked for bacteria; who knows, with a known template it may be helpful for your phage genome annotation as well.

ADD REPLY • link 6.5 years ago by natasha.sernova ★ 4.0k