I want to count k-mers in my sequencing reads in order to estimate the heterozygosity of my genome using GenomeScope. I have paired-end reads (R1.fastq, R2.fastq). I first ran Jellyfish to count k-mers with the following settings:
jellyfish count -C -m 21 -s 5000000 -t 8 R1.fastq -o reads.jf
On further thought, I realized I should probably include R2.fastq as well, so I ran the command like this:
jellyfish count -C -m 21 -s 5000000 -t 8 R*.fastq -o reads.jf
This works, but the resulting heterozygosity values from GenomeScope differ between the two runs. Does anyone have input on the right way to count k-mers in paired-end sequencing data for downstream heterozygosity estimation?
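In case it matters, here is a small sketch for checking which files the `R*.fastq` glob actually hands to Jellyfish. It is a dry run only: `show_count_cmd` is a hypothetical helper name, it just prints the command rather than running it, and it assumes R1.fastq and R2.fastq are the only files in the working directory that match the pattern.

```shell
# Hypothetical helper: print the jellyfish command with the glob expanded,
# without running jellyfish itself (dry run).
show_count_cmd() {
    # The shell expands R*.fastq to the matching file names, in sorted order,
    # before the command line is assembled.
    echo jellyfish count -C -m 21 -s 5000000 -t 8 R*.fastq -o reads.jf
}

show_count_cmd
```

If the printed command lists anything other than R1.fastq and R2.fastq, the second run counted k-mers from more than the two read files.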
Any ideas are greatly appreciated. Thank you!