Question: How to choose a k-mer value for paired-end reads
3.0 years ago, girijakaushal wrote:

Hello

I am trying to find the optimal k-mer value with KmerGenie so that I can use it in MetaVelvet for metagenome assembly.

Q. 1) My first question concerns the size of my left and right read files after trimming sequences with Phred score < 20. Before trimming, both files were 10,014 MiB. After trimming with the NGS QC Toolkit trimming tool, the left read file was 9,412 MiB and the right read file was 9,288 MiB. Is this a problem? Why do the file sizes differ now, when they were the same before?

Q. 2) My reads are paired-end, 102 bp. I first ran KmerGenie on the left reads, which gave a best k of 25, and then on the right reads, which gave a best k of 21. Which k-mer value should I use to assemble the left and right reads together?

Please guide me, I would be heartily thankful.

Best regards


I'd use a program like prinseq-lite.pl to make sure that the reads are properly paired. The following command does that, and it also removes reads shorter than 10 bases.

prinseq-lite.pl -fastq file_R1.fastq -fastq2 file_R2.fastq -min_len 10 -out_bad null -out_good file_clean
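
If you want to confirm that two FASTQ files are still in sync (before or after cleaning), a small check like the one below may help. It is only a sketch: the function names `read_ids` and `check_pairing` are made up for illustration, it assumes plain 4-line FASTQ records, and it assumes mate names differ only by a trailing /1 or /2 or by a comment after the first space. Read-by-read trimming of R1 and R2 will normally give different file sizes, so the thing worth checking is that read N in one file is still the mate of read N in the other.

```python
import gzip
from itertools import zip_longest

def read_ids(path):
    """Yield the read name from each 4-line FASTQ record (assumed layout)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                # Header looks like '@name/1' or '@name 1:N:0:...';
                # keep only the part shared by both mates.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name

def check_pairing(r1_path, r2_path):
    """Return (number of pairs in sync, first mismatch or None)."""
    n = 0
    for id1, id2 in zip_longest(read_ids(r1_path), read_ids(r2_path)):
        if id1 != id2:
            # Either the names differ or one file ran out of reads first.
            return n, (id1, id2)
        n += 1
    return n, None
```

Calling check_pairing("file_R1.fastq", "file_R2.fastq") returns the number of leading pairs that match and, if the files fall out of sync, the first pair of names that disagree (None marks a file that ended early).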

You can re-run kmergenie on the prinseq output files. Also you should check out other de novo assembly tools like SPAdes and IDBA-UD that can use multiple k-mer values for the de novo assembly.

Comment written 3.0 years ago by Sej Modha
3.0 years ago, dbrowne.up (United States) wrote:

Personally, I don't think that a single k-mer value will give you an optimal assembly. Different k-mer sizes will optimally assemble different regions of the genome. Since you're doing metagenomic assembly, check out this program called MeGAMerge: https://github.com/LANL-Bioinformatics/MeGAMerge

It utilizes assemblies from multiple k-mer values, as well as long reads, if you have them, to build an improved draft genome assembly.
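
To get a feel for how the k-mer content of a read set changes with k, you could pool the left and right reads and count distinct k-mers for a few candidate values. The sketch below is a naive illustration only: `distinct_kmers` and `kmer_profile` are hypothetical helper names, the counting is exact and memory-hungry (fine for a subsample, not for full files), reverse complements are not merged, and it is not the method KmerGenie or any assembler uses.

```python
def distinct_kmers(reads, k):
    """Count distinct k-mers of length k across all reads, skipping any with N."""
    seen = set()
    for seq in reads:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                seen.add(kmer)
    return len(seen)

def kmer_profile(reads, ks):
    """Return {k: number of distinct k-mers} for each k in ks."""
    reads = list(reads)  # allow a generator to be reused for every k
    return {k: distinct_kmers(reads, k) for k in ks}
```

For example, kmer_profile(pooled_reads, [21, 25, 31]) gives one count per k, where pooled_reads is a list of sequences taken from both read files. Pooling both mates reflects the fact that a single-k assembler applies one k to the whole paired library.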
