Question

What is the procedure for the assembly of bacteriophage genomes?

0

2.4 years ago

marongiu.luigi ▴ 710

Hello,

Is the assembly of bacteriophage genomes different from that of human genomes? I understand that it is common to use NGS to sequence the genomes of newly isolated bacteriophages, but what is the correct procedure?

When isolating a new phage, there is no indication of its species of origin; perhaps electron microscopy can give some indication of the order, but that is all. Thus, assuming a homogeneous population of isolates sent for sequencing (with Illumina), would the best approach be de novo assembly? Step zero would of course be to remove poor-quality reads. Are there any other tips?

Also, are there public reads that can be downloaded to try assembling the genome of one bacteriophage isolate?

Thank you

genome methods phage assembly WGS • 1.3k views

ADD COMMENT • link updated 2.4 years ago by shenwei356 8.4k • written 2.4 years ago by marongiu.luigi ▴ 710

score 2 · Accepted Answer · 2021-12-12

2

2.4 years ago

shenwei356 8.4k

Phage assembly is simple. Here is my procedure:

SPAdes for assembly.
Bandage for checking the de Bruijn graph and extracting the circular contig.
BLAST for making sure it's a phage.
PhageTerm for determining the termini.
RAST/Bakta for annotation.
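
As a rough illustration of the "extracting the circular contig" step, here is a minimal Python sketch that checks whether a contig looks circular. It assumes that a circular contig written out by a de Bruijn graph assembler carries the same short sequence at both ends (roughly one k-mer long), which is common but not guaranteed. The function names and the `min_len`/`max_len` bounds are hypothetical, chosen only to bracket typical k-mer sizes; this is not part of SPAdes or Bandage.

```python
def terminal_overlap(seq: str, min_len: int = 21, max_len: int = 200) -> int:
    """Return the length of the longest prefix that also occurs as the suffix.

    Returns 0 when no overlap of at least `min_len` bases is found.
    `min_len` and `max_len` are illustrative assumptions, not assembler settings.
    """
    seq = seq.upper()
    upper = min(max_len, len(seq) // 2)
    # Try the longest candidate first so the full duplicated end is reported.
    for k in range(upper, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def trim_circular(seq: str, min_len: int = 21, max_len: int = 200) -> tuple[str, bool]:
    """Remove the duplicated end so a circular genome is represented once.

    Returns (sequence, looks_circular).
    """
    k = terminal_overlap(seq, min_len, max_len)
    if k == 0:
        return seq, False
    return seq[:-k], True
```

A contig that passes this check is only a candidate: the graph in Bandage and the BLAST hits remain the real evidence, and the true physical termini are a separate question for PhageTerm.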

ADD COMMENT • link 2.4 years ago by shenwei356 8.4k

0

Thank you! Only one thing: is there a repository of raw phage sequences so I can try the procedure?

ADD REPLY • link 2.4 years ago by marongiu.luigi ▴ 710

0

Try searching, or just work on real data. Our lab only submits the assembled genome of a phage isolate to GenBank.

ADD REPLY • link 2.4 years ago by shenwei356 8.4k

0

I got a FASTQ file from the SRA accession ERX3462534, removed low-quality reads with Trimmomatic, and ran SPAdes. I got 1258 contigs, 397 of them with a coverage above 1000 (the others are actually directly below 20). Do I need to remove the low-coverage contigs? I then ran Bandage and got this graph (based on all the contigs):

[image: Bandage graph of all the contigs]

I understand that Bandage allows editing the graph by manually moving the nodes, and that I should remove nodes that are not attached to the main 'molecule'. But how do I get a single FASTA sequence from it? I ask also because PhageTerm (from what I understand) works by aligning the reads against a given reference sequence.

Essentially, SPAdes gives me a series of contigs; how do I get a single sequence out of them? Thanks

ADD REPLY • link 2.4 years ago by marongiu.luigi ▴ 710

0

I have BLASTed the longest contigs to try to pinpoint the closest genome. I got these:

[image: BLAST hits for the longest contigs, with identities]

I'd say that sp. 61, HK629, and lambda, being the most frequent, should be the most likely candidates. All the other parameters are the same; the figure shows the identities. Should I then align the original FASTQ reads against each of these and choose the one with the fewest mismatches? Or is there a more straightforward way?

ADD REPLY • link 2.4 years ago by marongiu.luigi ▴ 710

1

I would rather search against RefSeq/GenBank with all contigs together as a single query, using Mash/sourmash to find the closest reference. These tools consider the distance between whole genomes rather than a match to part of the genome (a single contig). There's no need to align reads to a reference.
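
To make the idea of a genome distance computed over all contigs at once more concrete, here is a toy MinHash sketch in Python. It is not how Mash or sourmash are implemented: it hashes forward-strand k-mers only (the real tools use canonical k-mers and their own hash functions), and the function names, `k = 21` and `size = 1000` are assumptions picked for illustration. The final conversion follows the published Mash distance formula, D = -(1/k) ln(2j / (1 + j)).

```python
import hashlib
import math


def kmer_hashes(seq: str, k: int = 21) -> set[int]:
    """Hash every k-mer of seq (forward strand only, a simplification)."""
    seq = seq.upper()
    return {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }


def sketch_contigs(contigs, k: int = 21, size: int = 1000) -> list[int]:
    """Pool the k-mers of all contigs and keep the `size` smallest hashes."""
    hashes: set[int] = set()
    for contig in contigs:
        hashes |= kmer_hashes(contig, k)
    return sorted(hashes)[:size]


def jaccard_estimate(a: list[int], b: list[int], size: int = 1000) -> float:
    """Estimate the Jaccard index from the smallest hashes of the merged sketches."""
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j: float, k: int = 21) -> float:
    """Convert a Jaccard estimate to a Mash-style distance (1.0 if nothing is shared)."""
    if j <= 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)
```

Because the k-mers of every contig are pooled before sketching, a fragmented assembly is compared to each reference as one unit, which is the point of querying with all contigs rather than the longest one.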

It should be easy to extract a single circular contig. The graph you posted indicates that the phage particles were not pure enough, with some bacterial contamination introduced during the experiment.
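
Given the coverage split reported earlier in the thread (contigs above 1000 versus below 20), one way to set aside the likely contaminant contigs before opening the graph is to filter on the coverage value in the contig names. The sketch below assumes SPAdes-style names of the form `NODE_<id>_length_<len>_cov_<coverage>`, where the value is SPAdes' k-mer coverage rather than read depth. The `min_cov` and `min_len` defaults are hypothetical thresholds chosen to fall between the two coverage groups, not recommended values.

```python
import re

# Matches SPAdes-style contig names; any trailing suffix is ignored.
NODE_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def parse_spades_header(header: str) -> tuple[int, int, float]:
    """Return (node_id, length, coverage) parsed from a SPAdes-style contig name."""
    m = NODE_RE.match(header.strip())
    if m is None:
        raise ValueError(f"not a SPAdes-style contig name: {header!r}")
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_coverage(records, min_cov: float = 100.0, min_len: int = 1000):
    """Keep contigs whose header reports at least min_cov coverage and min_len bases."""
    kept = []
    for header, seq in records:
        _, length, cov = parse_spades_header(header)
        if cov >= min_cov and length >= min_len:
            kept.append((header, seq))
    return kept
```

For example, `filter_by_coverage(read_fasta(open("contigs.fasta")))` would return only the high-coverage contigs, where `contigs.fasta` stands for the SPAdes output file. Low coverage alone does not prove contamination, so the discarded contigs are still worth a quick BLAST.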

ADD REPLY • link 2.4 years ago by shenwei356 8.4k