kmer counting for heterozygosity estimation
22 months ago
el97004 ▴ 40

Hi all,

I want to count k-mers in my sequencing reads in order to estimate the heterozygosity of my genome using GenomeScope. I have paired-end reads (R1.fastq, R2.fastq), and I ran Jellyfish to count k-mers with the following settings:

jellyfish count -C -m 21 -s 5000000 -t 8 R1.fastq -o reads.jf

But upon further thought, I realized I should maybe incorporate R2.fastq as well, so I ran the command as follows:

jellyfish count -C -m 21 -s 5000000 -t 8 R*.fastq -o reads.jf

This works, but the resulting heterozygosity values from GenomeScope differ. I was wondering if anyone had input on the right way to count k-mers in paired-end sequencing data for downstream heterozygosity estimation.

Any ideas are greatly appreciated. Thank you!

jellyfish kmer heterozygosity genome assembly • 750 views
22 months ago

First of all, if you check the quality of your forward and reverse reads, you will likely find that the reverse reads are of poorer quality than the forward reads. So the k-mer frequencies calculated from the two read files will differ. (As a thought experiment: if you clustered forward and reverse reads by k-mer frequency, the mates would fall apart from each other because of the poorer quality of the reverse reads.) If it suits your aim, I suggest merging (assembling) the read pairs first and then calculating k-mer frequencies. I have used compseq from the EMBOSS toolkit to calculate k-mer frequencies.
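
To make the R1-only versus R1+R2 difference concrete, here is a minimal Python sketch of canonical k-mer counting, which is what the -C flag in the Jellyfish commands above asks for. It is a toy illustration, not how Jellyfish is implemented: the function names, the made-up 5 bp reads and k=3 are all assumptions chosen so the result can be checked by hand. It builds the multiplicity histogram (how many distinct k-mers occur once, twice, and so on), which is the kind of table GenomeScope fits.

```python
from collections import Counter

# Complement table for reverse-complementing DNA.
_COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMP)[::-1]
    return min(kmer, rc)


def count_kmers(reads, k):
    """Count canonical k-mers over all reads.

    K-mers containing anything other than A/C/G/T (e.g. N) are skipped;
    that is a choice made for this sketch.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))


# Hypothetical toy pair: r2 is the reverse complement of the same fragment as r1.
r1 = ["AAGTC"]
r2 = ["GACTT"]

hist_r1_only = histogram(count_kmers(r1, 3))
hist_pooled = histogram(count_kmers(r1 + r2, 3))

print("R1 only:", hist_r1_only)
print("R1 + R2:", hist_pooled)
```

In this toy case the same three canonical k-mers are seen once with R1 alone and twice when both files are pooled, so the whole histogram shifts to a higher multiplicity. With real data, pooling both files roughly doubles the k-mer coverage in the same way, which may be part of why the two GenomeScope runs give different estimates.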

