Question: kmer counting for heterozygosity estimation
el97004 wrote (8 months ago):

Hi all,

I want to count k-mers in my sequencing reads in order to estimate the heterozygosity of my genome using GenomeScope. I have paired-end reads (R1.fastq, R2.fastq), and I ran Jellyfish to count k-mers with the following settings:

jellyfish count -C -m 21 -s 5000000 -t 8 R1.fastq -o reads.jf
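To make the first command concrete, here is a minimal pure-Python sketch of what counting canonical 21-mers (`-C -m 21`) means. It is an illustration only, not Jellyfish's implementation, and far too slow for real data. It assumes uncompressed FASTQ with exactly one sequence line per 4-line record, takes the canonical form of a k-mer as the lexicographically smaller of the k-mer and its reverse complement, and skips k-mers containing anything other than A/C/G/T. The function names are hypothetical.

```python
from collections import Counter

# Complement table for A/C/G/T; k-mers with other symbols are skipped below.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rev_comp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rev_comp)


def fastq_sequences(path: str):
    """Yield the sequence line (2nd line of each 4-line record) of an uncompressed FASTQ."""
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:
                yield line.strip().upper()


def count_canonical_kmers(sequences, k: int = 21) -> Counter:
    """Count canonical k-mers across all sequences (analogous to counting with -C)."""
    counts = Counter()
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if _VALID.issuperset(kmer):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return counts


# Usage with the file name from the question (hypothetical path on your system):
# r1_counts = count_canonical_kmers(fastq_sequences("R1.fastq"), k=21)
```

Because a k-mer and its reverse complement are folded into one entry, it does not matter which strand a read came from, which is why `-C` is the usual choice for unstranded genomic reads.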

But upon further thought, I realized I should probably include R2.fastq as well, so I ran the command like this:

jellyfish count -C -m 21 -s 5000000 -t 8 R*.fastq -o reads.jf
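Passing both files to one Jellyfish run pools them: Jellyfish does not use the pairing, so R1 and R2 simply act as more reads, and each k-mer's count is the sum over both files. GenomeScope then works from the k-mer histogram (what `jellyfish histo` produces from reads.jf), not from the .jf file itself. The sketch below uses hypothetical helper names and toy counts to show the pooling and the histogram step; the two-column "multiplicity count" layout is meant to resemble `jellyfish histo` output, not to reproduce it exactly.

```python
from collections import Counter


def combine_counts(*per_file_counts: Counter) -> Counter:
    """Sum per-file k-mer counts into one table, as a single run over R1 and R2 would."""
    total = Counter()
    for counts in per_file_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return total


def kmer_histogram(kmer_counts: Counter) -> list:
    """Return sorted (multiplicity, number_of_distinct_kmers) rows."""
    return sorted(Counter(kmer_counts.values()).items())


def write_histo(rows, path: str) -> None:
    """Write rows as space-separated 'multiplicity count' lines."""
    with open(path, "w") as out:
        for multiplicity, n_kmers in rows:
            out.write(f"{multiplicity} {n_kmers}\n")


# Toy example: "AAA" is seen in both files, so its multiplicity is summed to 5.
r1_counts = Counter({"AAA": 2, "AAC": 1})
r2_counts = Counter({"AAA": 3, "AAG": 1})
combined = combine_counts(r1_counts, r2_counts)
print(kmer_histogram(combined))  # [(1, 2), (5, 1)]
```

The point of the toy example is that pooling shifts k-mers to higher multiplicities, so the histogram GenomeScope fits is not the same as the one from R1 alone.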

This works, but the heterozygosity estimates GenomeScope gives for the two runs differ. Does anyone have input on the right way to count k-mers in paired-end sequencing data for downstream heterozygosity estimation?
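One likely contributor to the difference (an assumption, not something established in this thread) is k-mer coverage. For N reads of length L on a genome of size G, the base coverage C and the expected k-mer coverage C_k are:

```latex
C = \frac{N L}{G}, \qquad
C_k = C \cdot \frac{L - k + 1}{L} = \frac{N (L - k + 1)}{G}
```

Adding R2 roughly doubles N, so C_k roughly doubles and the peaks GenomeScope fits move accordingly. Illustrative numbers only, not from this data: with L = 150 and k = 21, (L - k + 1)/L = 130/150 ≈ 0.87, so 30x base coverage from R1 alone gives C_k = 26, while 60x from R1 + R2 gives C_k = 52. Sequencing errors in R2 also add extra low-count k-mers to the histogram.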

Any ideas are greatly appreciated. Thank you!

Dattatray Mongad (National Centre for Cell Science, Pune) wrote (8 months ago):

First, if you check the quality of your forward and reverse reads, you will typically find that the reverse reads are of poorer quality than the forward reads. As a result, the k-mer frequencies calculated from the two sets of reads will differ. (As a thought experiment: if you clustered forward and reverse reads by their k-mer frequencies, mates would end up apart from each other because of the poorer quality of the reverse reads.)

If it suits your aim, I suggest merging (assembling) the read pairs first and then calculating k-mer frequencies. I have used compseq from the EMBOSS toolkit to calculate k-mer frequencies.
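The answer does not name a merging tool, so the following is only a deliberately naive illustration of what "merging the pairs" means, not how dedicated read-merging tools work in detail. It assumes the fragment is long enough that R1 and R2 do not run past each other's start (fragment at least as long as each read) but short enough that they overlap; it takes overlapping bases from R1 without any quality-aware consensus. The names `merge_pair`, `min_overlap` and `max_mismatch_frac` are hypothetical.

```python
# Complement table; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10, max_mismatch_frac: float = 0.1):
    """Naively merge one read pair into a single fragment sequence.

    Searches for the longest suffix of R1 that matches a prefix of the reverse-complemented
    R2 with at most max_mismatch_frac mismatches. Overlap bases are taken from R1.
    Returns None when no acceptable overlap of at least min_overlap bases is found.
    """
    r2_rc = reverse_complement(r2)
    longest = min(len(r1), len(r2_rc))
    for overlap in range(longest, min_overlap - 1, -1):
        r1_tail = r1[len(r1) - overlap:]
        r2_head = r2_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(r1_tail, r2_head))
        if mismatches <= max_mismatch_frac * overlap:
            return r1 + r2_rc[overlap:]
    return None


# Toy fragment sequenced from both ends, with an 8 bp overlap between the mates.
fragment = "AAAACCCCGGGGTTTTACGT"
read1 = fragment[:14]
read2 = reverse_complement(fragment[6:])
print(merge_pair(read1, read2, min_overlap=5))  # AAAACCCCGGGGTTTTACGT
```

Each merged fragment then contributes its k-mers once, instead of the overlapping region being counted once from R1 and again from R2; pairs that cannot be merged (returned as None here) would need to be handled separately.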
