How to make a hybrid assembly of a viral genome
5 months ago
Kumar ▴ 130

Hi,

I want to build a hybrid assembly of a viral genome. I have paired-end Illumina reads and MinION long reads from a viral sample. I tried Unicycler for the hybrid assembly, but it is designed for bacterial genomes. Could anyone please suggest a pipeline for viral genome assembly? Also, please recommend a program for quality checking and trimming MinION data. I am trying LongQC and MinIONQC, but I'm not sure whether these are appropriate.

Thank you!

Virus Illumina Hybrid Genome assembly MinION • 610 views
5 months ago
Buffo ★ 1.9k

Try SPAdes. It is a very popular assembler that also supports hybrid assemblies, and there are new releases for viruses:


I have a DNA virus. Which SPAdes mode should I use for a hybrid assembly? I checked the SPAdes manual but did not find a command for running SPAdes on Illumina and MinION data together.


The current version of SPAdes works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. You can also provide additional contigs that will be used as long reads.


Oxford Nanopore = MinION


I ran SPAdes with the command below. However, the output files (scaffolds.fasta and contigs.fasta) contain several nodes. My goal is to assemble the whole viral genome, so I expected a single sequence in a single FASTA file.

command: spades.py -k 21,33,55,77 --careful -1 file_R1_001.fastq.gz -2 file_R2_001.fastq.gz --nanopore merge.fastq.gz -o out_spades


What exactly do you mean by:

it is showing several nodes


By "several nodes" I mean that the output file contains multiple scaffolds or contigs, each under its own FASTA header (>). For a whole-genome assembly, I expect one complete sequence under a single header in one file. When I run Unicycler, it usually produces one complete sequence per file, removing the gaps and the small scaffolds or contigs. An assembled viral genome is one complete sequence; see NC_001802.1 for example, which is a single-sequence genome. SPAdes, however, writes the assembly as >NODE1, >NODE2, and so on (multiple scaffolds) in one file.
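To make "several nodes" concrete, it can help to list how many records the assembler wrote and how much of the total length the longest one holds. The following is only a minimal sketch using the Python standard library; the function names `read_fasta_lengths` and `summarize` are made up for illustration and are not part of SPAdes or Unicycler.

```python
import gzip


def read_fasta_lengths(path):
    """Return a list of (record_name, sequence_length) for a FASTA file.

    Works on plain or gzipped FASTA. The record name is the first
    whitespace-delimited token after '>'.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    records = []
    name, length = None, 0
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    records.append((name, length))
                parts = line[1:].split()
                name, length = (parts[0] if parts else ""), 0
            elif name is not None:
                length += len(line)
    if name is not None:
        records.append((name, length))
    return records


def summarize(records):
    """Report record count, total length, and the longest record."""
    if not records:
        return {"count": 0, "total": 0, "longest": None, "longest_fraction": 0.0}
    total = sum(length for _, length in records)
    longest = max(records, key=lambda record: record[1])
    return {
        "count": len(records),
        "total": total,
        "longest": longest,
        "longest_fraction": longest[1] / total if total else 0.0,
    }
```

Calling `summarize(read_fasta_lengths("out_spades/contigs.fasta"))` (the path assumes the `-o out_spades` output directory from the command above) would show whether one long node close to the expected genome length dominates the file, or whether the assembly is genuinely fragmented.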


That's the problem:

it should come in one complete fasta

Assembly performance depends on many variables, such as coverage, sequence quality, genome complexity, etc. So, if your result is fragmented, the problem is probably caused by one or more of those variables. You should start by analyzing your input data.
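As a first look at the input data, a rough coverage estimate is often enough to spot an obvious problem. The sketch below is only an illustration under stated assumptions: the FASTQ files use the common four-lines-per-record layout, `genome_size` is an expected genome length that you supply yourself, and the function names are hypothetical. It does not replace a dedicated QC tool.

```python
import gzip


def read_lengths(fastq_path):
    """Yield the length of each read in a plain or gzipped FASTQ file.

    Assumes four lines per record (no wrapped sequence lines).
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:  # second line of each record is the sequence
                yield len(line.strip())


def n50(lengths):
    """Smallest read length L such that reads >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def coverage_report(fastq_paths, genome_size):
    """Summarize read count, bases, mean length, N50, and estimated coverage.

    genome_size is an assumption provided by the caller (expected genome
    length in bases); estimated coverage is simply total bases / genome_size.
    """
    lengths = []
    for path in fastq_paths:
        lengths.extend(read_lengths(path))
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "bases": total,
        "mean_length": total / len(lengths) if lengths else 0.0,
        "n50": n50(lengths),
        "estimated_coverage": total / genome_size,
    }
```

Running `coverage_report` separately on the Illumina files and on the MinION file gives one estimate per technology. Very low long-read coverage, or a long-read N50 far below the genome length, would be one plausible reason for a fragmented result.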


Yes, but when I use Unicycler it gives one fragment. That is why I am concerned about the choice of assembler. I am not using Unicycler because its documentation says it is intended for bacterial assembly.


I strongly suggest you read about Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly. Genome assemblers do not perform equally across species, data sources, data quality, and so on.