Question: How To Count K-Mers On SOLiD Data?
Alice wrote (6.9 years ago):

Hello, Biostars!

I downloaded some public SOLiD data files in SRA and FASTQ format. I guess all of them were produced by merging csfasta and qual files. I want to count k-mers with the Jellyfish software, and I don't know whether I first need to convert the SOLiD SRA/FASTQ files into nucleotide FASTQ/FASTA. For mapping, the usual advice is not to convert, because conversion can introduce numerous errors into the reads. I thought there was one simple way: convert the data to csfasta+qual and then convert the csfasta to ordinary FASTA. Am I right? Fortunately, there are many scripts for the conversion.
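
For reference, the csfasta-to-FASTA conversion mentioned above boils down to decoding each color against the previous base. Below is a minimal Python sketch, assuming the standard SOLiD 2-base encoding (A=0, C=1, G=2, T=3, where each color is the XOR of two adjacent base codes) and a csfasta-style read whose first character is the primer base. The function name decode_colorspace is hypothetical and not taken from any existing conversion script.

```python
# Minimal sketch of SOLiD color-space -> base-space decoding.
# Assumption: standard 2-base encoding with A=0, C=1, G=2, T=3, where each
# color equals the XOR of the codes of the two adjacent bases.
BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def decode_colorspace(read: str) -> str:
    """Decode a csfasta-style read such as 'T0123' into bases.

    The first character is assumed to be the primer base (not part of the
    read itself); the remaining characters are colors '0'-'3'. Decoding stops
    at the first missing call ('.'), because every later base depends on it.
    """
    prev = CODE[read[0]]
    bases = []
    for color in read[1:]:
        if color not in "0123":
            break
        prev ^= int(color)  # next base code = previous base code XOR color
        bases.append(BASES[prev])
    return "".join(bases)


print(decode_colorspace("T0123"))  # prints TGAT
```

Note that each decoded base is computed from the one before it, which is exactly why a single bad color matters so much after conversion.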

Damian Kao wrote (6.9 years ago):

The problem with converting color space to base space is that if just one color call is incorrect, every base after it in the decoded read will be wrong. Keeping the sequences in color space actually retains more information. For k-mer counting, this means you might see many more unique k-mers than expected. Or it might not matter at all, depending on the error rate and its position bias along the reads.
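
To illustrate the point about a single wrong color, the sketch below uses a hypothetical toy read and assumes the same standard decoding (A=0, C=1, G=2, T=3, color = XOR of adjacent bases). It flips one color and compares how many 4-mers change in color space versus after decoding to base space. The helper names decode, kmers and count_novel are made up for this example.

```python
# Sketch: one wrong color corrupts every downstream base after decoding,
# which inflates the number of distinct k-mers.
# Assumption: standard SOLiD 2-base encoding, A=0 C=1 G=2 T=3.
BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def decode(primer: str, colors: str) -> str:
    """Decode a string of colors '0'-'3', starting from the primer base."""
    prev = CODE[primer]
    out = []
    for c in colors:
        prev ^= int(c)
        out.append(BASES[prev])
    return "".join(out)


def kmers(seq: str, k: int) -> set:
    """All distinct k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_novel(good: str, bad: str, k: int) -> int:
    """Number of k-mers in bad that never occur in good."""
    return len(kmers(bad, k) - kmers(good, k))


colors = "0120312203"                                 # hypothetical toy read
bad = colors[:3] + str(int(colors[3]) ^ 1) + colors[4:]  # one color error at index 3
good_bases = decode("T", colors)
bad_bases = decode("T", bad)

print("color-space mismatches:", sum(a != b for a, b in zip(colors, bad)))
print("base-space mismatches:", sum(a != b for a, b in zip(good_bases, bad_bases)))
print("novel color-space 4-mers:", count_novel(colors, bad, 4))
print("novel base-space 4-mers:", count_novel(good_bases, bad_bases, 4))
```

On this toy read there is 1 color mismatch but 7 base mismatches. Only 4 color-space 4-mers are new, while in base space all 7 of the read's 4-mers change.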

Do you know whether the dataset has been error-corrected with SAET? I would recommend simply converting the reads to pseudo-base-space and running the k-mer count on that, i.e. replacing 0, 1, 2, 3 with A, T, G, C.
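
A minimal sketch of that pseudo-base-space conversion in Python, using the 0->A, 1->T, 2->G, 3->C mapping from the answer. Assumptions not stated in the answer: the input is csfasta ('>' header lines; sequence lines that start with a primer base letter followed by colors), missing calls '.' become 'N', and '#' comment lines are skipped. The function names are hypothetical.

```python
# Sketch: color-space csfasta -> pseudo-base-space FASTA for a k-mer counter.
# Mapping as suggested in the answer: 0->A, 1->T, 2->G, 3->C.
# Assumptions: csfasta input; reads start with a primer base letter;
# '.' (missing call) -> 'N'; '#' comment lines are skipped.
import sys
from typing import Iterable, Iterator, TextIO

PSEUDO = str.maketrans("0123.", "ATGCN")


def to_pseudo_base(read: str) -> str:
    """Drop the leading primer base (if present) and relabel each color."""
    colors = read[1:] if read[:1].isalpha() else read
    return colors.translate(PSEUDO)


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines: headers unchanged, sequences in pseudo-base-space."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            yield line
        else:
            yield to_pseudo_base(line)


def main(inp: TextIO, out: TextIO) -> None:
    for line in convert(inp):
        out.write(line + "\n")


if __name__ == "__main__":
    main(sys.stdin, sys.stdout)
```

Usage could look like `python csfasta_to_pseudo.py < reads.csfasta > reads.pseudo.fa` (script name hypothetical), then something like `jellyfish count -m 21 -s 100M -o counts.jf reads.pseudo.fa`. Jellyfish's canonical mode (`-C`) merges each k-mer with its base-space reverse complement, which does not correspond to the opposite strand of a color-space read, so it is probably best left off here.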
