Question: How To Count K-Mer For Color-Space Reads?
7.5 years ago by
GAO Yang250
GAO Yang250 wrote:

Hi guys, my earlier question is here: http://www.biostars.org/post/show/45025/how-to-estimate-genome-size-using-k-mer-coverage/#45148 My problem is that jellyfish can only handle FASTA-format files, but my data are color-space reads (colors 0123, where two adjacent bases determine one color) generated by the SOLiD platform. Does anybody know another k-mer counting tool that supports color-space reads? Alternatively: although color space differs from base space, I think the k-mer multiplicity for the same sequence should be the same. So if I simply apply "tr/0123/ACGT/" (double encoding) and use the result as input, will jellyfish produce the same result as it would with base-space FASTA input? Am I right?

Thanks for your attention, any advice will be appreciated!


Yeah, double encoding should work.

Reply written 7.5 years ago by Damian Kao
7.5 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

(corrected)

I think it all comes down to what the purpose of kmer counting is.

Color space has a fourfold redundancy: for example, 000 could be AAAA, TTTT, CCCC or GGGG. This will affect your kmer representation, and therefore I believe one cannot easily extrapolate from color-space kmers to the actual number of sequence kmers.
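To make the redundancy concrete, here is a minimal sketch of the standard SOLiD two-base encoding, in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The function name `to_colors` is just an illustrative choice.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq: str) -> str:
    """Encode bases as colors: one color per pair of adjacent bases."""
    codes = [BASE_TO_BITS[b] for b in seq.upper()]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


# Four different base sequences collapse onto the same color string.
for seq in ("AAAA", "CCCC", "GGGG", "TTTT"):
    print(seq, to_colors(seq))

# The same holds for any color string, not only homopolymers.
for seq in ("ACGT", "TGCA", "CATG", "GTAC"):
    print(seq, to_colors(seq))
```

Every color string of length n corresponds to four base strings of length n+1, one per possible starting base, which is why distinct sequence kmers can share one color-space kmer.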

You could transform to actual sequences, but that will lose some information.

http://www.biostars.org/post/show/43855/transforming-and-manipulating-color-space-reads/


Yeah, but here I just want to estimate the genome size via k-mer counting. Do you think it will do?

Reply written 7.5 years ago by GAO Yang

I don't think anyone can simply say yes to this; it depends on the genome under study. Count kmers both ways, double encoded and after decoding to letter space. Comparing the two may provide some further insight.

Reply written 7.5 years ago by Istvan Albert
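One way to compare the two encodings is to build a k-mer multiplicity histogram for each and apply the common peak-based estimate (total k-mers divided by the multiplicity at the main peak). The sketch below is a toy, in-memory version with hypothetical function names; the `min_mult` cutoff for discarding likely error k-mers is an assumption, and a real data set would need a dedicated counter such as jellyfish to produce the histogram.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map multiplicity -> number of distinct k-mers with that multiplicity."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return Counter(counts.values())


def estimate_genome_size(hist, min_mult=2):
    """Peak-based estimate: total k-mers / multiplicity of the main peak.

    K-mers seen fewer than `min_mult` times are treated as sequencing
    errors and ignored (an assumed, adjustable cutoff).
    """
    usable = {m: n for m, n in hist.items() if m >= min_mult}
    peak = max(usable, key=lambda m: usable[m])
    total = sum(m * n for m, n in usable.items())
    return total / peak
```

Running this once on the double-encoded reads and once on the decoded reads, with the same k, shows whether the two histograms put the peak in a similar place.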

Convincing! I'll try both ways. But I'm afraid that translating directly to bases may raise the error rate and thereby affect the k-mer counting.

Reply written 7.5 years ago by GAO Yang
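The worry about decoding is easy to illustrate. In a naive decode, each base is derived from the previous base and the next color, so a single miscalled color changes every base after it. The sketch below assumes the usual XOR encoding (A=0, C=1, G=2, T=3) and uses made-up reads; `decode` is an illustrative name, not a function from any SOLiD tool.

```python
BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def decode(primer_base: str, colors: str) -> str:
    """Naively decode colors to bases, starting from the known primer base."""
    prev = BITS[primer_base]
    out = []
    for c in colors:
        prev ^= int(c)  # next base = previous base XOR color
        out.append(BASES[prev])
    return "".join(out)


good = decode("T", "3201231")
bad = decode("T", "3211231")  # same read with the third color miscalled
mismatches = sum(a != b for a, b in zip(good, bad))
print(good, bad, mismatches)
```

In color space the same error touches only the k-mers that overlap one position, whereas after naive decoding it corrupts every k-mer from the error to the end of the read.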
7.5 years ago by
Lee Katz3.0k
Atlanta, GA
Lee Katz3.0k wrote:

You should probably work entirely in color space. Transform your reference genome into color space and then map against that.
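Converting a reference into color space, as suggested above, can be sketched as follows, again assuming the standard XOR encoding (A=0, C=1, G=2, T=3). The function names are illustrative, and ambiguous bases such as N are not handled. The k-mer counter is included to show one property that matters when counting: the reverse-complement strand gives the same colors in reverse order, so the two strands are merged by plain reversal rather than by reverse complement.

```python
from collections import Counter

BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq: str) -> str:
    """Encode a base sequence as colors (XOR of adjacent 2-bit base codes)."""
    codes = [BITS[b] for b in seq.upper()]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def count_color_kmers(colors: str, k: int) -> Counter:
    """Count color k-mers, merging each k-mer with its reversal (other strand)."""
    counts = Counter()
    for i in range(len(colors) - k + 1):
        kmer = colors[i:i + k]
        counts[min(kmer, kmer[::-1])] += 1
    return counts


ref = "ACGGTCA"
ref_rc = "TGACCGT"  # reverse complement of ref
print(to_colors(ref), to_colors(ref_rc))
```

If double-encoded colors are fed to a base-space counter, an option that merges k-mers with their reverse complements would therefore pair up the wrong k-mers, so it is probably safer to count each strand as-is and merge by reversal afterwards.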
