Question: Short-long read hybrid de novo assembly - determine k-mer size
0
gravatar for Kenny
2.1 years ago by
Kenny20
New York
Kenny20 wrote:

Dear all,

I am trying to do a short-long read hybrid de novo assembly on the MiSeq 2x75 paired end data and TruSeq long reads data (1500-16608bp) using three different assemblers, Velvet, ABySS, and SOAPdenovo.

I have already generated:

1) Velvet contigs using MiSeq only

2) ABySS contigs using MiSeq only

3) SOAP contigs using MiSeq only

And then, I combined the contigs with the long reads. For example,

cat Velvet_67/contigs.fa LongRead.fasta > Velvet_Hybrid.fasta

Now, I want to run assemblies using the hybrid fasta and three assemblers mentioned above.

My question is, how do I select the k-mer size?

For example, my Velvet_Hybrid.fasta has min read length of 133 and max read length of 16608.

Kenny

assembly • 856 views
ADD COMMENTlink modified 2.1 years ago by Kevin Blighe51k • written 2.1 years ago by Kenny20
3
gravatar for Kevin Blighe
2.1 years ago by
Kevin Blighe51k
Kevin Blighe51k wrote:

VelvetOptimiser should assist you in this regard. It allows you to check a range of k-mers and then uses a selected metric to pick the best one. It executes velvetg on each k-mer in your range.

You should be able to execute it with somethng like:

perl /home/k/kb295/programs/VelvetOptimiser-2.2.5/VelvetOptimiser.pl \
    --hashs 29 --hashe 79 --step 2 -a --optFuncKmer "n50" \
    --optFuncCov "Lbp" --threads 12 --dir_final Assembly/ \
    --velvethfiles "-shortPaired -separate -fastq.gz Sample1Mate1.fastq.gz Sample1Mate2.fastq.gz -shortPaired2 -separate -fastq.gz Sample2Mate1.fastq.gz Sample2Mate2.fastq.gz -shortPaired3 -separate -fastq.gz Sample3Mate1.fastq.gz Sample3Mate2.fastq.gz -shortPaired4 -separate -fastq.gz Sample4Mate1.fastq.gz Sample4Mate2.fastq.gz -strand_specific" \
    --velvetgoptions "-cov_cutoff auto -exp_cov auto -read_trkg yes"

Here, I run with 4 mate-pair samples and the optimisation function (that it uses to determine optimum k-mer) is the n50. IT checks each k-mer from 29 to 79, jumping by 2 each time.

ADD COMMENTlink modified 12 months ago • written 2.1 years ago by Kevin Blighe51k

Which read categories should I pick? since I only have a single file.

Read categories are:
short (default)
shortPaired
short2 (same as short, but for a separate insert-size library)
shortPaired2 (see above)
long (for Sanger, 454 or even reference sequences)
longPaired
ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Kenny20

I would choose long for your data. Here's what the author and developer wrote:

The first distinction is made between long and short reads. There is no strict rule to decide what is long and short, but long read alignments are stored on more detailed data structures, which take up more memory, but allow the system to completely reconstruct their path through the assembly. Typically, reads which are longer than 200 bp (e.g. 454 or capillary) would be marked as long, but if memory is insufficient, they can also be considered short.

Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2952100/

ADD REPLYlink written 2.1 years ago by Kevin Blighe51k

Thank you Kevin! Appreciated :)

Kenny

ADD REPLYlink written 2.1 years ago by Kenny20

I am currently running velvet on my velvet_hybrid.fasta file. My command is:

for k in {93..131..2}; do velveth ./Velvet_VelvetContigs/k$k $k -long -fasta Velvet_Hybrid.fasta; velvetg ./Velvet_VelvetContigs/k$k -ins_length auto -exp_cov auto -cov_cutoff auto -read_trkg yes; done

However, I get some of the unusual messages, for example:

[0.000000] Reading FastA file Velvet_Hybrid.fasta; [16.050157] 199270 sequences found [16.050174] Done [16.097246] Reading read set file ./Velvet_VelvetContigs/k93/Sequences; [16.750961] 199270 sequences found [18.714973] Done [18.714992] 199270 sequences in total. [18.715532] Writing into roadmap file ./Velvet_VelvetContigs/k93/Roadmaps... [25.688339] Inputting sequences... [25.692383] Inputting sequence 0 / 199270 Killed [0.000001] Reading roadmap file ./Velvet_VelvetContigs/k93/Roadmaps [0.850929] 199270 roadmaps read

[0.000000] Reading roadmap file ./Velvet_VelvetContigs/k95/Roadmaps [0.951589] 199270 roadmaps read [0.953281] Creating insertion markers [1.024402] Ordering insertion markers [1.074139] Counting preNodes [1.155569] 1735089 preNodes counted, creating them now [5.339031] Irregular sequence file: are you sure your Sequence and Roadmap file come from the same source? [0.000000] Reading FastA file Velvet_Hybrid.fasta; [13.756826] 199270 sequences found

ADD REPLYlink written 2.1 years ago by Kenny20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1703 users visited in the last hour