Estimating genome size from k-mer histograms...
8.4 years ago
ab.tsubaki ▴ 50

Hi all

Does anyone have experience with Jellyfish-derived histograms?

I've done all the necessary steps, but now that I've drawn my histogram it looks nothing like it's supposed to! There's no peak or hump: it just starts at the top and slopes downward, flattening out at the bottom.

The commands I used to run Jellyfish were as follows:

jellyfish count -t 8 -C -m 19 -s 5G -o filename.jf read.fastq
jellyfish dump filename.jf > filename.fa
jellyfish histo -o filename.histo filename.jf


My k-mer size of 19 comes from running KmerGenie.

I dumped the histo file into Excel to take a look at the histogram.

Can anyone spot a problem, or has anyone encountered this before? Am I using the wrong k-mer size? Or is there an underlying problem with my sequencing data?

Anandi

kmer genome-size next-gen • 3.9k views

I think you meant to link to an image.

8.4 years ago

See this nice writeup that covers genome size estimation among other things: https://banana-slug.soe.ucsc.edu/bioinformatic_tools:jellyfish


Thanks Matt. That IS a very useful site. I've taken a look.

I tried attaching an image but couldn't figure out how. The main problem is that the shape of my histogram does not lend itself to genome size estimation at all. You could describe it as a "monotonically descending, exponential-decay k-mer histogram"!

I'm trying to generate histograms with some different k-mer sizes now...


Histograms might not be a great idea if you are binning the data at all. Try plotting a density, or just plot lines between the points. You might be missing your peak if you're binning with a width of, say, 5 or so.
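
If it helps, here is a rough Python sketch for checking the raw, unbinned counts for a peak without going through Excel. It assumes the `jellyfish histo` output is two whitespace-separated columns (multiplicity, number of distinct k-mers) and that the last row is a catch-all for everything above the histogram's upper limit, so that row is left out of the peak search. The function names are made up for illustration, and the size estimate (total k-mers divided by peak depth) is only the usual back-of-the-envelope approximation. Noisy data may need smoothing before the valley search is reliable.

```python
def parse_histo(text):
    """Parse histo text: one 'multiplicity count' pair per line."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def find_peak(pairs):
    """Return (valley, peak) multiplicities, or None if no peak is found.

    Assumption: the last row is a catch-all bin, so it is excluded here.
    """
    searched = pairs[:-1]
    counts = [c for _, c in searched]
    # Walk down the initial slope (mostly error k-mers) until counts rise.
    i = 0
    while i + 1 < len(counts) and counts[i + 1] <= counts[i]:
        i += 1
    if i + 1 >= len(counts):
        return None  # monotonically descending: no coverage peak
    # The peak is the highest point at or after the valley.
    peak = max(range(i, len(counts)), key=lambda j: counts[j])
    return searched[i][0], searched[peak][0]


def estimate_genome_size(pairs, valley, peak):
    """Rough estimate: k-mers at or above the valley, divided by peak depth."""
    total_kmers = sum(m * c for m, c in pairs if m >= valley)
    return total_kmers // peak
```

To try it on the file from the original post, something like `pairs = parse_histo(open("filename.histo").read())` followed by `find_peak(pairs)` should work. A result of `None` would mean the curve really does never rise again after the initial slope, rather than the peak being hidden by binning.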