Heterozygosity in k-mer histogram


4.6 years ago

jm440 ▴ 10

Hi!

I ran GenomeScope to estimate the level of heterozygosity in my genome. However, the output plot looks quite strange and, most alarmingly, the estimated genome size is far too large: I am expecting a genome of ~5 Mb and getting 120 Mb. Because of this, I am not sure I can trust the reported heterozygosity value. Has anyone experienced this before, and can you offer any suggestions on what could be happening here? Some more information: I have 200 bp paired-end reads and pretty high coverage.

Link to plot: here

All the code I used to get the plot:

jellyfish count -C -m 21 -s 5000000000 -t 8 R1.fastq -o reads.jf
jellyfish histo -t 8 reads.jf > reads.histo
Rscript genomescope.R reads.histo 21 200 results_out 700

More of the GenomeScope output:

len:120MB uniq:0.43% het:2.97% kcov:13.3 err:0.143% dup:0.39% k:21

Thank you

genomescope k-mer heterozygosity • 1.8k views

updated 4.6 years ago by h.mon 35k • written 4.6 years ago by jm440 ▴ 10


I have never used GenomeScope or jellyfish directly, but you say your expected genome size is ~5 Mbp, yet you apparently entered 5 Gbp in the jellyfish command (-s 5000000000, nine zeros instead of six). You could also compare your results to KAT (https://kat.readthedocs.io/en/latest/).
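
As an independent sanity check on the genome size, a common back-of-the-envelope estimate divides the total number of k-mers by the k-mer coverage at the main histogram peak. Below is a minimal Python sketch of that idea; it is not GenomeScope's actual model. It assumes the two-column format written by jellyfish histo (multiplicity, then number of distinct k-mers). The function names, the min_cov cutoff of 5 for discarding likely error k-mers, and the toy histogram are all made up for illustration.

```python
import io

# Toy histogram (hypothetical data): low-multiplicity error k-mers plus one peak at 20x.
TOY_HISTO = "1 1000\n2 200\n3 50\n18 100\n19 300\n20 500\n21 300\n22 100\n"


def parse_histo(handle):
    """Read (multiplicity, distinct k-mer count) pairs from a two-column histogram."""
    rows = []
    for line in handle:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        rows.append((int(parts[0]), int(parts[1])))
    return rows


def estimate_genome_size(rows, min_cov=5):
    """Return (peak multiplicity, rough genome size in bp).

    min_cov is an assumed cutoff: k-mers seen fewer times are treated as
    sequencing errors and ignored. Pick it by looking at the histogram.
    """
    kept = [(mult, count) for mult, count in rows if mult >= min_cov]
    if not kept:
        raise ValueError("no histogram rows at or above min_cov")
    # The peak is the multiplicity with the most distinct k-mers.
    peak = max(kept, key=lambda row: row[1])[0]
    # Total k-mer occurrences = sum of multiplicity * distinct k-mers.
    total_kmers = sum(mult * count for mult, count in kept)
    return peak, total_kmers / peak


if __name__ == "__main__":
    rows = parse_histo(io.StringIO(TOY_HISTO))
    peak, size = estimate_genome_size(rows)
    print(f"peak multiplicity: {peak}, estimated size: {size:.0f} bp")
```

To try it on real data, one would pass an open file handle for reads.histo to parse_histo instead of the toy string. Note that in a strongly heterozygous sample the tallest peak can be the heterozygous (half-coverage) peak, in which case this estimate would come out roughly twice too large.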

4.6 years ago by jean.elbers ★ 1.7k
