Question: De novo assembly for virus genome with Velvet.
Peter (China) wrote, 4.6 years ago:

Dear experts,

I want to assemble a virus genome using Velvet. The genome is about 3 kb long. In the commands below, ref1.fa is the reference genome. However, the assembly does not succeed: I cannot recover ref1.fa from the simulated reads.

~/bin/bioinfomatics/wgsim-master/wgsim -N 500000 -1 100 -2 100 -h ~/projects/virus/analysis/ref1.fa r1.fq r2.fq
~/bin/bioinfomatics/velvet_1.2.10/contrib/shuffleSequences_fasta/shuffleSequences_fasta.pl r1.fq r2.fq output.fq

~/bin/bioinfomatics/velvet_1.2.10/contrib/VelvetOptimiser-2.2.4/VelvetOptimiser.pl \
    -s 27 -e 31 -f '-longPaired -fastq output.fq' -t 4 --optFuncKmer 'n50'

Dec  1 17:18:33
Will run velvet optimiser with the following paramters:
    Velveth parameter string:
        -shortPaired -fastq output.fq
    Velveth start hash values:    27
    Velveth end hash value:        31
    Velveth hash step value:    2
    Velvetg minimum coverage cutoff to use:    0

    Read tracking for final assembly off.
Dec  1 17:18:33

    Beginning velveth runs.
********************************************************
Assembly id: 1
Velveth timestamp: Dec  1 2014 17:18:57
Velveth version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_27 27 -shortPaired -fastq output.fq
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_27
Velvet hash value: 27
Roadmap file size: 110519999
**********************************************************
********************************************************
Assembly id: 2
Velveth timestamp: Dec  1 2014 17:18:59
Velveth version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_29 29 -shortPaired -fastq output.fq
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_29
Velvet hash value: 29
Roadmap file size: 107811920
**********************************************************
********************************************************
Assembly id: 3
Velveth timestamp: Dec  1 2014 17:19:00
Velveth version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_31 31 -shortPaired -fastq output.fq
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_31
Velvet hash value: 31
Roadmap file size: 103399704
**********************************************************
Dec  1 17:19:00

    Beginning vanilla velvetg runs.
********************************************************
Assembly id: 1
Assembly score: 53
Velveth timestamp: Dec  1 2014 17:18:57
Velvetg timestamp: Dec  1 2014 17:21:52
Velveth version: 1.2.10
Velvetg version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_27 27 -shortPaired -fastq output.fq
Velvetg parameter string: auto_data_27  -clean yes
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_27
Velvet hash value: 27
Roadmap file size: 110519999
Total number of contigs: 1062
n50: 53
length of longest contig: 95
Total bases in contigs: 58657
Number of contigs > 1k: 0
Total bases in contigs > 1k: 0
**********************************************************
********************************************************
Assembly id: 2
Assembly score: 57
Velveth timestamp: Dec  1 2014 17:18:59
Velvetg timestamp: Dec  1 2014 17:22:06
Velveth version: 1.2.10
Velvetg version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_29 29 -shortPaired -fastq output.fq
Velvetg parameter string: auto_data_29  -clean yes
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_29
Velvet hash value: 29
Roadmap file size: 107811920
Total number of contigs: 424
n50: 57
length of longest contig: 95
Total bases in contigs: 25334
Number of contigs > 1k: 0
Total bases in contigs > 1k: 0
**********************************************************
********************************************************
Assembly id: 3
Assembly score: 61
Velveth timestamp: Dec  1 2014 17:19:00
Velvetg timestamp: Dec  1 2014 17:22:07
Velveth version: 1.2.10
Velvetg version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_31 31 -shortPaired -fastq output.fq
Velvetg parameter string: auto_data_31  -clean yes
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_31
Velvet hash value: 31
Roadmap file size: 103399704
Total number of contigs: 1917
n50: 61
length of longest contig: 99
Total bases in contigs: 121773
Number of contigs > 1k: 0
Total bases in contigs > 1k: 0
**********************************************************
Dec  1 17:22:07 Best assembly by assembly score - assembly id: 3
Dec  1 17:22:07 Optimisation routine chosen for best assembly: shortPaired
Dec  1 17:22:07 Looking for the expected coverage
Dec  1 17:22:09        Expected coverage set to 0
********************************************************
Assembly id: 3
Assembly score: 61
Velveth timestamp: Dec  1 2014 17:19:00
Velvetg timestamp: Dec  1 2014 17:22:07
Velveth version: 1.2.10
Velvetg version: 1.2.10
Readfile(s): -shortPaired -fastq output.fq
Velveth parameter string: auto_data_31 31 -shortPaired -fastq output.fq
Velvetg parameter string: auto_data_31  -clean yes -exp_cov 0
Assembly directory: /Users/jhuang/projects/hbv/analysis/auto_data_31
Velvet hash value: 31
Roadmap file size: 103399704
Total number of contigs: 1917
n50: 61
length of longest contig: 99
Total bases in contigs: 121773
Number of contigs > 1k: 0
Total bases in contigs > 1k: 0
Paired Library insert stats:
**********************************************************
Dec  1 17:22:09 Setting the short insert length
Dec  1 17:22:09 Setting assembly short insert length(s) to auto
Dec  1 17:22:09 Beginning coverage cutoff optimisation
Minimum specified coverage cutoff is higher than the expected coverage. Please choose a minimum coverage cutoff smaller than 0 and re-run.


Here is my reference.

>2547-16_ASC_B
CTCCACCACTTTCCACCAAACTCTTCAAGATCCCAGAGTCAGGGCCCTGTACTTTCCTGCTGGTGGCTCCAGTTCAGGAACAGTGAGCCCTGCTCAGAATACTGTCTCTGCCATATCGTCAATCTTATCGAAGACTGGGGACCCTGTACCGAACATGGAGAACATCGCATCAGGACTCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAAAATCCTCACAATACCACAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACACCCGTGTGTCTTGGCCAAAATTCGCAGTCCCAAATCTCCAGTCACTCACCAACCTGTTGTCCTCCAATTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTGCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCATCAACAACCAGCACCGGACCATGCAAGACCTGCACAACTCCTGCTCAAGGAACCTCTATGTTTCCCTCATGTTGCTGTACAAAACCTACGGACGGAAACTGCACCTGTATTCCCATCCCATCATCTTGGGCTTTCGCAAAATACCTATGGGAGTGGGCCTCAGTCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTCTGGCTTTCAGTTATATGGATGATGTGGTTTTGGGGGCCAAGTCTGTACAACATCTTGAGTCCCTTTATGCCGCTGTTACCCATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTCACAAAACAAAAAGATGGGGATATTCCCTTAACTTCATGGGATATGTAATTGGGAGTTGGGGCACATTGCCACAGGAACATATTGTACAAAAAATCAAAATGTGTTTTAGGAAACTTCCTGTAAACAGGCCTATTGATTGGAAAGTATGTCAACGAATTGTGGGTCTTTTGGGGTTTGCCGCACCTTTCACGCAATGTGGATATCCTGCTTTAATGCCTTTATATGCATGCATACAAGCAAAACAGGCTTTTACTTTCTCGCCAACTTACAAGGCCTTTCTAAGTCAACAGTATTTGAACCTTTACCCCGTTGCTCGGCAACGGCCTGGTCTGTGCCAAGTGTTTGCTGACGCAACCCCCACTGGTTGGGGCTTGGCCATAGGCCATCAGCGCATGCGTGGAACCTTTGTGTCTCCTCTGCCGATCCATACTGCGGAACTCCTAGCCGCTTGTTTTGCTCGCAGCAGGTCTGGGGCAAAACTCATCGGGACTGACAATTCTGTCGTGCTCTCCCGCAAGTATACATCATTTCCATGGCTGCTAGGCTGTGCTGCCAACTGGATCCTGCGCGGGACGTCCTTTGTTTACGTCCCGTCGGCGCTGAATCCCGCGGACGACCCCTCCCGGGGCCGCTTGGGGCTCTACCGCCCGCTTCTCCGCCTATTGTACCGACCGACCACGGGGCGCACCTCTCTTTACGCGGACTCCCCGTCTGTGCCTTCTCATCTGCCGGCCCGTGTGCACTTCGCTTCACCTCTGCACGTCGCATGGAGACCACCGTGAACGCCCACAGGAACCTGCCCAAGGTCTTGCATAAGAGGACTCTTGGACTTTCAGCAATGTCAACGACCGACCTTGAGGCATACTTCAAAGACTGTGTGTTTAATGAGTGGGAGGAGTTGGGGGAGGAGGTGAGGTTAAAGGTCTTTGTACTAGGAGGCTGTAGGCATAAATTGGTGTGTTCACCAGCACCATGCAACTTTTTCACCTCTGCCTAATCATCTCATGTTCATGTCCTACTGTTCAAGCCTCCAAGCTGTGCCTTGGGTGGCTTTGGGGCATGGACATTGACCCGTATAAAGAATTTGGAGCTTCTGTGGAGTTACTCTCTTTTTTGCCTTCTGACTTCTTTCCTTCTATTCGAGATCTCCTCGACACCG
CCTCTGCTCTGTATCGGGAGGCCTTAGAGTCTCCGGAACATTGTTCACCTCACCATACGGCACTCAGGCAAGCTATTCTGTGTTGGGGTGAGTTGATGAATCTAGCAACCTGGGTGGGAAGTAATTTGGAAGATCCAGCATCCAGGGAATTAGTAGTCAGCTATGTCAACGTTAACATGGGCCTAAAAATCAGACAACTATTGTGGTTTCATATTTCCTGTCTTACTTTTGGGAGAGAAACTGTTCTTGAATATTTGGTGTCTTTTGGAGTGTGGATTCGCACTCCTCCTGCATATAGACCACCAAATGCCCCTATCTTATCAACACTTCCGGAAACTACTGTTGTTAGACGAAGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGCAGACGAAGGTCTCAATCGCCGCGTCGCAGAAGATCTCAATCTCGGGAATCTCAATGTTAGTATTCCTTGGACACATAAGGTGGGAAACTTTACGGGGCTTTATTCTTCTACGGTACCCTGCTTTAATCCTAAATGGCAAACTCCTTCTTTTCCCGACATTCATTTGCAGGAGGACATTGTTGATAGATGTAAGCAATTTGTGGGGCCCCTTACAGTAAATGAAAACAGGAGACTAAAATTAATTATGCCTGCTAGGTTTTATCCCAATGTTACTAAATATTTGCCCTTAGATAAAGGGATCAAACCGTATTATCCAGAGTATGTAGTTAATCATTACTTCCAGACGCGACATTATTTACACACTCTTTGGAAGGCGGGGATCTTATATAAAAGAGAGTCCACACGTAGCGCCTCATTTTGCGGGTCACCATATTCTTGGGAACAAGATCTACAGCATGGGAGGTTGGTCTTCCAAACCTCGAAAAGGCATGGGGACAAATCTTTCTGTCCCCAATCCCCTGGGATTCTTCCCCGATCATCAGTTGGACCCTGCATTCAAAGCCAACTCAGAAAATCCAGATTGGGACCTCAACCCGCACAAGGACACCTGGCCGGACGCCAACAAGGTGGGAGTGGGAGCATTCGGGCCAGGGTTCACCCCTCCCCATGGGGGACTGTTGGGGTGGAGCCCTCAGGCTCAGGGCCTACTCGCAACTGTGCCAGCAGCTCCTCCTCCTGCCTCCACCAATCGGCAGTCAGGAAGGCAGCCTACTCCCTTATCTCCACCTCTAAGGGACACTCATCCTCAGGCCATGCAGTGGAA

(Reply by Peter, 4.6 years ago)

Daniel (Cardiff University) wrote, 4.6 years ago:

Something appears to be going wrong before that error, as none of your contigs is longer than 99 bases, which is shorter than a single 100 bp read. A few points:

  • Your input data is 100 bp paired-end, right? I wouldn't describe that as -longPaired, as your command does. VelvetOptimiser appears to be running with -shortPaired anyway, so that may or may not be an issue.
  • What does the quality of the data look like? I don't know wgsim, but could the artificial error profiles be screwing it up?
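
One more thing that may be worth ruling out, given that no contig is longer than a read: the shuffle step in the question runs shuffleSequences_fasta.pl on FASTQ files. Whether that is the cause here is not established in this thread, and I believe (without having checked this Velvet version) that the same contrib directory also provides a shuffleSequences_fastq.pl. The sketch below only tests whether a file still has the 4-line FASTQ record layout; check_fastq is a hypothetical helper, not part of Velvet or wgsim.

```shell
#!/bin/sh
# check_fastq is a hypothetical helper for illustration only.
# It checks record layout, nothing else: line 1 of every 4-line record must
# start with "@", line 3 must start with "+", and the total line count must
# be a multiple of 4.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }
        END {
            if (NR % 4 != 0) bad = 1
            if (bad) print "malformed"; else print "ok"
        }
    ' "$1"
}

# File name taken from the question; skipped if the file is not present.
if [ -f output.fq ]; then
    check_fastq output.fq
fi
```

If this reports "malformed" for output.fq but "ok" for r1.fq and r2.fq, the interleaving step would be the first place to look.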

My input data is 100 bp paired-end.

I simulated the data with wgsim, and the read quality looks fine.

(Reply by Peter, 4.6 years ago)

rtliu (New Zealand) wrote, 4.6 years ago:

The coverage is too deep for Velvet to handle. Try reducing the coverage to 50x-100x, e.g. wgsim -N 1500.
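
For a sense of scale, here is a back-of-the-envelope coverage estimate. It assumes a genome of about 3000 bp (the figure given in this thread) and relies on wgsim's -N counting read pairs; the function name coverage is just for illustration.

```shell
#!/bin/sh
# Rough nucleotide coverage = read pairs * 2 reads per pair * read length / genome length.
# Shell integer arithmetic, so results are truncated.
coverage() {
    pairs=$1; read_len=$2; genome_len=$3
    echo $(( pairs * 2 * read_len / genome_len ))
}

echo "N=500000: $(coverage 500000 100 3000)x"   # the wgsim call in the question
echo "N=1500:   $(coverage 1500 100 3000)x"     # the suggested reduction
```

Under these assumptions the original simulation is at roughly 33,000x, while -N 1500 gives about 100x, the upper end of the suggested range.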

 

With your current simulated data, use velvet-estimate-exp_cov.pl to estimate a coverage cutoff (say 300), then add the parameters -exp_cov auto -cov_cutoff 300 to velvetg.
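
Keep in mind that Velvet expresses -exp_cov and -cov_cutoff in k-mer coverage rather than nucleotide coverage; the Velvet manual relates the two as Ck = C * (L - k + 1) / L. A small sketch of that conversion follows, using k = 31 from the best-scoring run in the log and a hypothetical helper name, kmer_cov. The nucleotide coverages passed in are rough assumptions based on a 3000 bp genome.

```shell
#!/bin/sh
# k-mer coverage Ck = C * (L - k + 1) / L, as given in the Velvet manual.
# C = nucleotide coverage, L = read length, k = hash length.
# Shell integer arithmetic, so results are truncated.
kmer_cov() {
    cov=$1; read_len=$2; k=$3
    echo $(( cov * (read_len - k + 1) / read_len ))
}

kmer_cov 100 100 31      # a 100x data set with 100 bp reads
kmer_cov 33333 100 31    # roughly the depth of the original simulation
```

With these numbers a cutoff of 300 lies far below the k-mer coverage of the deep data set (about 23,000) but above that of a 100x data set (about 70), so the value would need revisiting if the read count is reduced.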

Why isn't deeper coverage better here?

The reference genome is about 3000 bp. Which read layout would be better: 2 x 300 bp, 2 x 100 bp, or 1 x 300 bp?

(Reply by Peter, 4.6 years ago)

Try normalizing the reads before assembly (digital normalization): http://ged.msu.edu/papers/2012-diginorm/

(Reply by Yang Li, 4.4 years ago)