Jellyfish: every other kmer count is zero
4.5 years ago
Lina F ▴ 200

Hi all,

I found a tutorial suggesting how to use Jellyfish to estimate genome size:

http://koke.asrc.kanazawa-u.ac.jp/HOWTO/kmer-genomesize.html

However, after running jellyfish count and jellyfish histo the output shows that every other kmer count is zero.

Below is my code, trying several values of k.

I feel like I'm missing something simple -- why are the odd k-mer counts zero?

~Lina

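for K in 21 23 25 27 29 31
do
    # Count canonical (-C) k-mers of length K using 20 threads
    jellyfish count -t 20 -C -m "$K" -s 5G -o "output_${K}.jf" --min-quality=20 --quality-start=33 all.fastq
    # Write the histogram: k-mer count in column 1, number of distinct k-mers with that count in column 2
    jellyfish histo -f "output_${K}.jf" > "histogram_${K}.txt"
    # Write summary statistics (unique, distinct, total, max count)
    jellyfish stats -v -o "stats_${K}.txt" "output_${K}.jf"
done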

0 0
1 0
2 14028836
3 0
4 2053267
5 0
6 966831
7 0
8 554663
9 0

cat stats_31.txt
Unique:    0
Distinct:  37557758
Total:     2901177252
Max_count: 2419076


Edited to add the contents of the stats file.

kmer counting jellyfish genome size estimation

You also have no k-mers with a frequency of 1, which is extremely unlikely. Did you somehow double up your input fastq? Did you copy the original fastq at some point and concatenate the copy to the original?
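If the input really was doubled, every k-mer would be counted twice as often as it should be, so all counts would be even and the odd rows of the histogram would be empty. A minimal sketch for checking this on a histogram file, assuming the two-column format shown in the question (count, then number of distinct k-mers); the function name `odd_count_total` is made up for illustration:

```shell
# Hypothetical helper: sum the number of distinct k-mers that have an odd count.
# Expects a whitespace-separated, two-column histogram: "count number_of_kmers".
odd_count_total() {
    awk '$1 % 2 == 1 { total += $2 } END { print total + 0 }' "$1"
}
```

Running `odd_count_total histogram_31.txt` and getting 0 on a real dataset would be a strong hint that each read is present an even number of times in the input.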


I double-checked and I did not double up my input fastq files. However, I am using both the fwd and rev read files. In total I have 29.5 million read pairs. Should I downsample this?

EDITED to add: I just ran the code with only the FWD read files and now I get k-mers with a count of 1, and odd counts in general.

I realized my input data was wrong: my R1 and my R2 files were indeed the same, they were just given to me under different names.
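For anyone who runs into the same symptom, a quick sanity check along these lines would catch it before counting. This is only a sketch: it assumes uncompressed FASTQ files, and the function name `same_reads` is made up for illustration.

```shell
# Hypothetical helper: report whether two read files are byte-for-byte identical.
# cmp -s is silent and returns exit status 0 only when the files match exactly.
same_reads() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

Called as `same_reads R1.fastq R2.fastq` (placeholder file names), it should print "different" for a genuine pair of forward and reverse read files. For gzipped files it is safer to compare the decompressed content, since the gzip header can store the original file name and timestamp.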