Question: Genome Size Estimation
2.2 years ago by
Mbillah120
China
Mbillah120 wrote:

Recently I read a paper in which the genome size was estimated with this formula: G = k-mer number / k-mer depth, where k-mer number = 52,413,427,492 and k-mer depth = 17.
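As a quick check of the arithmetic, the quoted formula can be evaluated directly. This is only a minimal sketch: the function name `genome_size` is illustrative, and the two inputs are the values quoted from the paper above.

```python
def genome_size(kmer_number: int, kmer_depth: float) -> float:
    """Estimate genome size as total k-mer count divided by k-mer depth."""
    return kmer_number / kmer_depth


# Values quoted from the paper in the question.
paper_estimate = genome_size(52_413_427_492, 17)
print(f"{paper_estimate / 1e9:.2f} Gb")
```

With those inputs the estimate comes out at roughly 3.08 Gb.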

Now I want to estimate my own genome size. I have 123742407 paired reads with a read length of 150 bp. If I choose a k-mer size of 51, what will my estimated genome size be?

genome • 2.1k views
written 2.2 years ago by Mbillah120

You can't work it out from that alone; you first need to work out your number of k-mers.

Essentially they're taking the number of k-mers of a given length to provide a rough estimate of genome size, since as a genome gets larger, its number of unique k-mers should also increase.

written 2.2 years ago by Joe18k

I have found another formula to estimate genome size: N = M * L / (L - K + 1)

N is Depth of Read Coverage

M is mean k-mer coverage

L is read length

K is k-mer size

G = T / N

G is the genome size

T is the total number of bases

In my case, L = 150 and K = 51. Am I right? How can I find N and M?
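The two formulas above can be chained as in the sketch below. Two things in it are assumptions rather than data from this thread: M is a placeholder value (it has to be read off a k-mer histogram of the actual reads), and T assumes that 123742407 counts read pairs, so each pair contributes two 150 bp reads. The function names are illustrative only.

```python
def read_coverage(mean_kmer_coverage: float, read_length: int, k: int) -> float:
    """N = M * L / (L - K + 1): convert k-mer coverage to read (base) coverage."""
    return mean_kmer_coverage * read_length / (read_length - k + 1)


def genome_size_from_coverage(total_bases: int, coverage: float) -> float:
    """G = T / N: total sequenced bases divided by depth of coverage."""
    return total_bases / coverage


L, K = 150, 51
M = 20.0                    # PLACEHOLDER, not measured: take it from your k-mer histogram peak
T = 123_742_407 * 2 * L     # ASSUMPTION: the count is read pairs, two reads per pair

N = read_coverage(M, L, K)
G = genome_size_from_coverage(T, N)
print(f"N = {N:.1f}x, G = {G:,.0f} bp (only meaningful once M is measured)")
```

The point of the sketch is that L, K and T are known from the sequencing run, but G cannot be computed until M is obtained from a k-mer count of the reads.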

modified 2.2 years ago • written 2.2 years ago by Mbillah120

You have to map your reads; you don't yet have all the data you need to do this calculation.

written 2.2 years ago by Joe18k

http://koke.asrc.kanazawa-u.ac.jp/HOWTO/kmer-genomesize.html
https://bioinformatics.uconn.edu/genome-size-estimation-tutorial/

written 2.2 years ago by GenoMax96k

https://bioinformatics.uconn.edu/genome-size-estimation-tutorial/

n = [( L - k ) + 1 ] * C

n = [(150 - 51) + 1] * 123742407 = 12374240700

N = n / C = 12374240700 / 123742407 = 100

Here the read length is 150, the k-mer length is 51, and the total number of reads is 123742407. What am I doing wrong?
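Replaying this arithmetic shows what it actually measures. In the small sketch below (variable names follow the post), dividing n by C cancels the read count again, so the result is just the number of k-mers per read, L - k + 1, and not a depth or a genome size.

```python
L, k = 150, 51
C = 123_742_407              # number of reads, as given in the post

kmers_per_read = L - k + 1   # k-mers contained in a single read
n = kmers_per_read * C       # total k-mers across all reads
ratio = n // C               # C cancels: this is kmers_per_read again

print(kmers_per_read, n, ratio)
```

Whatever value C takes, `ratio` stays equal to `kmers_per_read`, which is why this calculation cannot say anything about the genome.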

modified 2.2 years ago • written 2.2 years ago by Mbillah120

you don't need the number of reads.

what you need is the number of unique kmers in your input data.

You can count those by running software such as Jellyfish, ntCard, ... Once you have that count table, I would suggest using the GenomeScope website, where all these estimate calculations are done for you.

If you want to do the calculation yourself, then you need to plot your k-mer count table as a histogram, determine the x-value of the peak in it, and use that in your formula.

Also read the links GenoMax provided for a detailed guide to the above.
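The do-it-yourself route can be sketched roughly as follows. This assumes a two-column, whitespace-separated histogram (k-mer multiplicity, then number of distinct k-mers with that multiplicity), which is the layout `jellyfish histo` writes. The `min_depth` cutoff for discarding low-count, likely erroneous k-mers is a hand-picked assumption, the function names are illustrative, and the toy histogram is made up purely to show the mechanics.

```python
def parse_histogram(text: str) -> list[tuple[int, int]]:
    """Parse 'multiplicity count' lines into (multiplicity, count) tuples."""
    rows = []
    for line in text.strip().splitlines():
        depth, count = line.split()
        rows.append((int(depth), int(count)))
    return rows


def estimate_genome_size(rows: list[tuple[int, int]], min_depth: int) -> tuple[int, float]:
    """Return (peak depth, genome size), ignoring multiplicities below min_depth."""
    kept = [(d, c) for d, c in rows if d >= min_depth]
    # The peak is the multiplicity with the most distinct k-mers.
    peak_depth = max(kept, key=lambda row: row[1])[0]
    # Total k-mer occurrences = sum of multiplicity * number of distinct k-mers.
    total_kmers = sum(d * c for d, c in kept)
    return peak_depth, total_kmers / peak_depth


# Made-up toy histogram: an error spike at low depth, then a peak at depth 5.
toy = """
1 900
2 100
3 20
4 50
5 100
6 50
7 20
"""
peak, size = estimate_genome_size(parse_histogram(toy), min_depth=3)
print(peak, size)
```

On real data the cutoff should be chosen by looking at the plotted histogram (the valley between the error spike and the main peak). A model-based tool such as GenomeScope is likely to be more reliable, especially for heterozygous or repeat-rich genomes.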

modified 2.2 years ago • written 2.2 years ago by lieven.sterck10.0k

Where did you get this number from? 123742407

written 2.2 years ago by Joe18k