Question: Velvet - ins_length auto
Kenny10 (New York) wrote, 20 months ago:

Hi all,

I have obtained two Illumina MiSeq 2x75 paired-end read files, one forward and one reverse:

oenopla-reads1.fastq & oenopla-reads2.fastq

Then I performed a genome assembly using Velvet. I found that a k-mer size of 67 produced the best N50 and maximum contig length.

Since I do not know the insert length, I set -ins_length to auto. Here are my commands:

velveth Velvet_67 67 -shortPaired -fastq -separate oenopla-reads1.fastq oenopla-reads2.fastq
velvetg Velvet_67 -ins_length auto -exp_cov auto -cov_cutoff auto

I wanted to see the insert length (max and min), but when I checked the log file, I only saw this:

Median coverage depth = 15.632530
Final graph has 3045 nodes and n50 of 165, max 5386, total 165527, using 769537/15947236 reads

I need this range to run another assembly program, Metassembler, which requires the maximum and minimum insert lengths. My question is: how can I find out the insert length from Velvet?

Your help is greatly appreciated.

Brian Bushnell (Walnut Creek, USA) wrote, 20 months ago:

You can map the reads to the assembly and get the insert size from that; you don't need to map all the reads. For example, with BBMap:

in1=oenopla-reads1.fastq in2=oenopla-reads2.fastq ref=contigs.fasta reads=100k ihist=ihist.txt

...where ihist.txt will contain the insert size distribution.
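To turn that histogram into a min/max range, a small script can read the counts and report percentiles. This is a minimal sketch that assumes ihist.txt consists of '#'-prefixed summary/header lines followed by tab-separated "insert size, count" rows; check that against your own file before relying on it. Which percentiles to treat as "min" and "max" is a judgment call, not something BBMap decides for you.

```python
"""Sketch: summarize a BBMap insert-size histogram (ihist.txt).

Assumption: '#'-prefixed lines are headers/summaries and every other
non-empty line is "insert_size<TAB>count".
"""
from typing import Iterable, List, Tuple


def parse_ihist(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (insert_size, count) pairs, skipping blank and '#' lines."""
    hist = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        size, count = line.split("\t")[:2]
        hist.append((int(size), int(count)))
    return hist


def percentile(hist: List[Tuple[int, int]], q: float) -> int:
    """Smallest insert size whose cumulative count reaches fraction q of all pairs."""
    total = sum(count for _, count in hist)
    target = q * total
    running = 0
    for size, count in sorted(hist):
        running += count
        if running >= target:
            return size
    raise ValueError("empty histogram")
```

Usage, for example: `with open("ihist.txt") as fh: hist = parse_ihist(fh)`, then `percentile(hist, 0.05)` and `percentile(hist, 0.95)` as candidate min and max. Percentiles are used here instead of mean plus or minus SD because they are less sensitive to a long tail of outlier pairs.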


1) I ran a Perl script from the Velvet developers to estimate the insert size:

perl Velvet_67 > Velvet_ins_size.txt


Observed **median insert length: 352**
Observed mode of insert length: 347
Observed sample standard deviation: 868.138129940986
Suggested velvetg parameters: -ins_length 352 -ins_length_sd 868.138129940986
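A quick arithmetic check suggests this standard deviation is dominated by a tail of outlier pairs rather than describing the bulk of the library (an interpretation, not something the script reports). Using a symmetric median plus or minus 2 SD window, a common rule of thumb assumed here purely for illustration, the lower bound comes out negative:

$$
352 - 2 \times 868.14 \approx -1384.3, \qquad 352 + 2 \times 868.14 \approx 2088.3
$$

A negative minimum insert length is meaningless, so the SD on its own is probably not a usable basis for Metassembler's min and max.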

2) I ran BBMap to map the short reads to the Velvet contigs and calculate the insert size:

in1=oenopla-reads1.fastq in2=oenopla-reads2.fastq ref=Velvet_67/contigs.fa reads=-1 ihist=ihist.txt


Pairing data:      pct pairs   num pairs   pct bases    num bases
mated pairs:         9.1591%      730308     9.1591%    109546200
bad pairs:           5.9061%      470928     5.9061%     70639200
insert size avg:      862.75
insert 25th %:         91.00
**insert median:      193.00**
insert 75th %:        374.00
insert std dev:      3468.98
insert mode:              78

I got two different median insert sizes, 352 and 193. Which one should I pick, and how should I decide on the max and min?

Reply written 20 months ago by Kenny10.

In this case, the vast majority of your reads did not map as proper pairs, which probably indicates that your assembly is fairly fragmented. Because only about 9% of the reads mapped as proper pairs, you cannot trust BBMap's insert size data to reflect the library as a whole: when contigs are short, only pairs with short inserts can place both reads on the same contig, so the remaining pairs are likely to have longer inserts. I suggest using Velvet's estimate.
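The bias described above can be illustrated with a toy model. This is a sketch under loudly stated assumptions, not how BBMap decides what a proper pair is: the library's true inserts are spread uniformly between 200 and 500 bp, the contig lengths are hypothetical (mostly short, as in a low-N50 assembly), and a pair is counted only if both reads fall on the same contig. A pair with insert size i fits in L - i + 1 start positions on a contig of length L, so long inserts are under-sampled.

```python
"""Toy model: proper-pair insert sizes are biased low on a fragmented assembly.

All numbers below are hypothetical and chosen only for illustration.
"""
from typing import Dict, List


def observed_distribution(true_counts: Dict[int, float],
                          contig_lengths: List[int]) -> Dict[int, float]:
    """Reweight each insert size by how often a pair of that size fits on one contig."""
    genome = sum(contig_lengths)
    observed: Dict[int, float] = {}
    for size, count in true_counts.items():
        # Number of start positions where both ends land on the same contig.
        placements = sum(max(0, length - size + 1) for length in contig_lengths)
        observed[size] = count * placements / genome
    return observed


def median(counts: Dict[int, float]) -> int:
    """Smallest insert size whose cumulative weight reaches half the total."""
    total = sum(counts.values())
    running = 0.0
    for size in sorted(counts):
        running += counts[size]
        if running >= total / 2:
            return size
    raise ValueError("empty distribution")


# Hypothetical library: inserts uniformly spread from 200 to 500 bp.
true_lib = {size: 1.0 for size in range(200, 501)}
# Hypothetical contigs: many too short to hold any pair, a few long ones.
contigs = [150] * 50 + [400] * 20 + [1500] * 5 + [5000]

print(median(true_lib), median(observed_distribution(true_lib, contigs)))  # 350 329
```

With these made-up numbers the observed median drops from 350 to 329, and the 150 bp contigs contribute no pairs at all; shorter contigs would pull the observed distribution further toward short inserts.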

Reply written 20 months ago by Brian Bushnell.
Powered by Biostar version 2.3.0