I ran GenomeScope to estimate the level of heterozygosity in my genome. However, the output plot looks quite strange and, most alarmingly, the estimated genome size is far too large: I am expecting a genome of ~5 Mb and am getting 120 Mb. Because of this, I am not sure whether I can trust the reported heterozygosity value. Has anyone experienced this before, and can you offer any suggestions on what could be happening here? Some more information: I have 200 bp paired-end reads and pretty high coverage.
Link to plot:
All the code I used to get the plot:
jellyfish count -C -m 21 -s 5000000000 -t 8 R1.fastq -o reads.jf
jellyfish histo -t 8 reads.jf > reads.histo
Rscript genomescope.R reads.histo 21 200 results_out 700
More of the GenomeScope output:
len:120MB uniq:0.43% het:2.97% kcov:13.3 err:0.143% dup:0.39% k:21
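In case it helps anyone reason about the numbers above, here is a rough sketch of a sanity check that could be run on reads.histo independently of GenomeScope. It assumes the histogram is the two-column "multiplicity count" text that jellyfish histo writes. The function names and the min_mult cutoff (used to skip low-multiplicity, likely erroneous k-mers) are my own hypothetical choices, and this is only a crude peak-based estimate (total k-mers divided by the peak multiplicity), not GenomeScope's model.

```python
def read_histo(path):
    """Parse a two-column 'multiplicity count' k-mer histogram file."""
    histo = []
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            histo.append((int(parts[0]), int(parts[1])))
    return histo


def naive_genome_size(histo, min_mult=5):
    """Return (peak_multiplicity, estimated_size).

    Rows below min_mult are ignored as probable sequencing errors
    (an assumed cutoff; it should sit in the valley before the main peak).
    The estimate is total k-mer occurrences divided by the peak multiplicity.
    """
    kept = [(m, c) for m, c in histo if m >= min_mult]
    if not kept:
        raise ValueError("no histogram rows at or above min_mult")
    # Multiplicity with the largest number of distinct k-mers.
    peak = max(kept, key=lambda mc: mc[1])[0]
    # Total k-mer occurrences among the kept rows.
    total = sum(m * c for m, c in kept)
    return peak, total / peak
```

Calling naive_genome_size(read_histo("reads.histo")) returns the peak multiplicity and a size in bases. If the peak it finds is far from the kcov that GenomeScope reports, that would suggest the model was fitted to a different peak than the main one.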