I generated a k-mer count file with Jellyfish and then a histogram from it, which gave the attached graph when plotted in R. I am confused about why there is a small peak at coverage 22. I see a similar tiny peak even for k-mer sizes as high as 115.
- How does one interpret this for a genome expected to have 50-60% repeats?
- How can I extract reads pertaining to the tiny peaks?
- I suspect this correlates with higher GC content in some reads, as you can see in the attached FastQC output.
- Can I safely treat this tiny peak as genuine rather than an error peak, and retain those k-mers for assembly?
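
In case it helps to pin down exactly where the small peaks sit, here is a minimal Python sketch of how local maxima could be located in the histogram. It assumes the two-column text format written by `jellyfish histo` (multiplicity, then number of distinct k-mers). The names `read_histo` and `local_peaks`, the `window` parameter, and the toy numbers are all illustrative and are not my actual data.

```python
def read_histo(lines):
    """Parse 'multiplicity count' pairs (assumed `jellyfish histo` text output)."""
    histo = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            histo.append((int(parts[0]), int(parts[1])))
    return histo


def local_peaks(histo, window=2):
    """Return multiplicities whose count strictly exceeds all neighbours
    within `window` bins on each side.

    Bins without a full window on both sides are skipped, so the
    low-multiplicity error spike at the left edge is not reported.
    """
    peaks = []
    for i in range(window, len(histo) - window):
        mult, count = histo[i]
        neighbours = [c for _, c in histo[i - window:i]]
        neighbours += [c for _, c in histo[i + 1:i + window + 1]]
        if all(count > c for c in neighbours):
            peaks.append(mult)
    return peaks


# Made-up toy histogram: an error spike at 1, then bumps at 5 and 10.
demo = ["1 900", "2 300", "3 100", "4 120", "5 200", "6 150", "7 90",
        "8 60", "9 70", "10 95", "11 80", "12 50", "13 30"]
peaks = local_peaks(read_histo(demo))
print(peaks)
```

On real data the counts are noisy, so a larger `window` (or smoothing first) would probably be needed before trusting a peak such as the one at coverage 22.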