Question: Estimating genome size from k-mer histograms...
5.4 years ago, ab.tsubaki (South Africa) wrote:

Hi all

Does anyone have experience using Jellyfish-derived histograms?

I've done all the necessary steps, but when I plot my histogram it looks nothing like it's supposed to! There are no peaks or humps: it just starts at the top, slopes downward, and flattens out at the bottom!

The commands I used to run Jellyfish were as follows:

jellyfish count -t 8 -C -m 19 -s 5G -o filename.jf read.fastq
jellyfish dump filename.jf > filename.fa
jellyfish histo -o filename.histo filename.jf

My k-mer size of 19 comes from the values I obtained by running KmerGenie.

I dumped the histo file into Excel to take a look at the histogram.
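
As an alternative to eyeballing the curve in Excel, here is a minimal Python sketch for inspecting the histogram. It assumes the usual two-column output of `jellyfish histo` (multiplicity, then number of distinct k-mers) and that the final row is the catch-all bucket for everything above the `--high` cutoff. The function names are illustrative, and the estimate uses the common rule of thumb (total k-mers above the error valley divided by the peak multiplicity), not anything specific to this dataset.

```python
def parse_histo(lines):
    """Parse lines of `jellyfish histo` output into (multiplicity, n_kmers) pairs."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            rows.append((int(fields[0]), int(fields[1])))
    return rows


def find_valley_and_peak(rows):
    """Return (valley, peak) multiplicities, or None if the curve never rises.

    The last row is skipped in the search because it is assumed to pool all
    higher multiplicities, so a rise there is not a real coverage peak.
    """
    body = rows[:-1]
    for i in range(1, len(body)):
        if body[i][1] > body[i - 1][1]:
            valley = body[i - 1][0]
            peak = max(body[i:], key=lambda r: r[1])[0]
            return valley, peak
    return None


def estimate_genome_size(rows):
    """Rule of thumb: k-mers above the error valley divided by peak multiplicity."""
    found = find_valley_and_peak(rows)
    if found is None:
        return None  # monotonically decreasing: no usable peak
    valley, peak = found
    total = sum(mult * n for mult, n in rows if mult > valley)
    return total // peak
```

With a real file this could be called as `estimate_genome_size(parse_histo(open("filename.histo")))`. A result of `None` corresponds to the situation described above: the curve only ever goes down, so there is no coverage peak to divide by.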

Can anyone spot a problem, or has anyone encountered this before? Am I using the wrong k-mer size? Or is there an underlying problem with my sequencing data?

Thanks in advance

Anandi

genome size next-gen kmer • 2.8k views
modified 19 months ago by h.mon • written 5.4 years ago by ab.tsubaki

I think you meant to link to an image.

written 5.4 years ago by Matt Shirley
5.4 years ago, Matt Shirley (Cambridge, MA) wrote:

See this nice writeup that covers genome size estimation among other things: https://banana-slug.soe.ucsc.edu/bioinformatic_tools:jellyfish

written 5.4 years ago by Matt Shirley

Thanks Matt. That IS a very useful site. I've taken a look. 

I tried attaching an image but couldn't figure out how. The main problem is that my histogram's shape does not lend itself to genome size estimation at all. You could describe the graph as a "monotonically descending, exponential decay k-mer histogram"!

I'm now generating histograms for some different k-mer sizes...

written 5.4 years ago by ab.tsubaki

Histograms might not be a great idea if you are binning the data at all. Try plotting a density, or just plotting lines between the points. You might be missing your peak if you're binning with a width of, say, 5 or so.
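
To illustrate the binning point, here is a small Python sketch using made-up counts (not the poster's data). The helper names `rebin` and `has_rise` are hypothetical. It shows a low-coverage peak at multiplicity 4 that is visible in the raw per-multiplicity counts but disappears once the counts are summed into bins of width 5.

```python
def rebin(rows, width):
    """Sum k-mer counts into bins of `width` consecutive multiplicities."""
    binned = {}
    for mult, n in rows:
        start = ((mult - 1) // width) * width + 1  # first multiplicity in the bin
        binned[start] = binned.get(start, 0) + n
    return sorted(binned.items())


def has_rise(rows):
    """True if any point is strictly higher than the point before it."""
    return any(b[1] > a[1] for a, b in zip(rows, rows[1:]))


# Made-up counts: error k-mers at multiplicity 1-2, a small real peak at 4.
raw = [(1, 900), (2, 200), (3, 50), (4, 120), (5, 60),
       (6, 20), (7, 8), (8, 4), (9, 2), (10, 1)]

print(has_rise(raw))            # True: the bump at multiplicity 4 is visible
print(has_rise(rebin(raw, 5)))  # False: bins of width 5 give a pure downward slope
```

Plotting the unbinned rows as a line, ideally with a logarithmic y-axis so the error k-mers at multiplicity 1 do not flatten everything else, avoids this effect.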

ADD REPLYlink written 5.4 years ago by Matt Shirley9.2k