Problem: does kmergenie have a limit on the number of input reads?
8.1 years ago

Hi all, I'm having the following problem with kmergenie:

Warning! using max number of read files (2000)
error opening file: #
fitting model to histograms to estimate best k
could not predict a best k value
Execution of decide failed (return code 0)


How can I increase the number of input read files?

My other question: how much memory do I need to run kmergenie?

Thanks,
Leandro

K-mers Kmergenie • 3.5k views

Hi!

What command line did you use? Also, what operating system?

Dear Rayan, I used: ./kmergenie *.csfasta

Operating system: Bio-Linux (latest version)

I have installed R and Python.

Thanks. How many *.csfasta files do you have? Kmergenie indeed has a limit on the number of input files (2000), as mentioned in the error. You could try merging them, like this:

cat *.csfasta > all.fasta


then run

./kmergenie all.fasta
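
Before merging, it may help to confirm how many files the glob actually matches, since the 2000-file limit only matters above that count. A minimal sketch, assuming the reads sit in one directory; `count_matches` is a hypothetical helper, not part of kmergenie:

```shell
# count_matches DIR EXT: print how many files in DIR end in .EXT
# (hypothetical helper for illustration; not part of kmergenie)
count_matches() {
    n=0
    for f in "$1"/*."$2"; do
        # an unmatched glob stays literal, so only count paths that exist
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}
```

For example, `count_matches . csfasta` prints the number of `.csfasta` files in the current directory; merging with `cat` is only needed when that number exceeds the 2000 mentioned in the warning.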

Ah, also: Kmergenie doesn't (yet) work with an input like *.fasta (it might in the future; the current version is 1.6950).

If you have a list of FASTA files, please do the following:

ls -1 *.fasta > reads_list.txt

then run

./kmergenie reads_list.txt

Hi, I used this command but it doesn't work. Instead, it shows:

wp@debian:~/Downloads/kmergenie-1.6950$ ./kmergenie ~/data/list
running histogram estimation
File /home/wp/data/list starts with character "R", hence is interpreted as a list of file names
error opening file: R1_001.fastq
fitting model to histograms to estimate best k
could not predict a best k value
Execution of decide failed (return code 0)


here is my list file:

R1_001.fastq
R1_002.fastq
R2_001.fastq
R2_002.fastq

This looks like a working directory problem. The ~/data/list file does not seem to contain absolute paths, thus you need to run kmergenie inside the ~/data/ folder.
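
One way to avoid depending on the working directory is to write absolute paths into the list file. A minimal sketch, assuming the reads end in `.fastq` and live in a single directory; `make_read_list` is a hypothetical helper, not part of kmergenie:

```shell
# make_read_list DIR OUT: write the absolute path of each *.fastq file
# in DIR to OUT, one per line (hypothetical helper, not part of kmergenie)
make_read_list() {
    # resolve DIR to an absolute path; fail if it does not exist
    dir=$(cd "$1" && pwd) || return 1
    for f in "$dir"/*.fastq; do
        # skip the literal pattern left behind when nothing matches
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done > "$2"
}
```

After `make_read_list ~/data ~/data/list`, running `./kmergenie ~/data/list` from the kmergenie folder should find the reads, since every entry in the list is now an absolute path.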

Thanks a lot :)

Hi Rayan,

I have a side question; could you please clarify it for me?
If I run kmergenie on a paired-end read set (containing Read 1 and Read 2 fastq files), do I need to convert Read 2 into its complement sequence before combining it with Read 1 for the kmergenie run (because all the sequence information in Read 2 is complementary to Read 1)? If not, would that double the number of distinct kmers in kmergenie's statistical calculation? Overall, what we want to know is only a single strand of DNA, isn't it?

Sorry, I am very new to this field. Thank you very much in advance!
Phuong

Hi Phuong,

No need. Kmergenie does not care whether a read is in forward or reverse orientation, nor whether reads are paired-end, single-end, or mate-pairs. Just input all the fastq files that you would give to an assembler, in any order.

It won't double the number of kmers, because kmergenie considers a kmer and its reverse complement to be the same object.
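
To illustrate what treating a kmer and its reverse complement as the same object means, the sketch below maps each kmer to a single representative. Choosing the alphabetically smaller of the two strings is a common convention and an assumption here; the thread does not say how kmergenie picks its representative. `revcomp` and `canonical` are hypothetical helper names, and `rev` is assumed to be installed (it ships with util-linux).

```shell
# revcomp SEQ: print the reverse complement of an uppercase DNA string
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# canonical KMER: print whichever of KMER and its reverse complement
# sorts first (assumed convention, for illustration only)
canonical() {
    rc=$(revcomp "$1")
    printf '%s\n%s\n' "$1" "$rc" | LC_ALL=C sort | head -n 1
}
```

Both `canonical AAC` and `canonical GTT` print `AAC`, so a kmer seen in Read 1 and its reverse complement seen in Read 2 are treated as the same kmer.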

Thank you very much, this really enlightens me, especially the fact that kmergenie considers a kmer and its reverse complement to be the same object.

Hi Hian, but I used only one input, I have one file.

I see. Can you please paste the output of the following command?

ls -1 *.csfasta


(By the way, Biostars encourages you to respond in a reply rather than in a separate answer, which is reserved for when a solution to the original problem is found.)