I have a FASTQ file from PacBio SMRT sequencing. First I will map the reads to a reference, then I want to replace the bases with a low Phred score (<30) with the consensus base. Is there an algorithm that can do this?
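To make the question concrete, here is a minimal sketch of the replacement step. It assumes Phred+33 quality encoding and a read that has already been placed against the consensus without gaps, so that position i in the read corresponds to position i in the consensus. Real PacBio alignments contain indels, so this only illustrates the idea. The function names `phred33_scores` and `replace_low_quality` are made up for this example.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality]


def replace_low_quality(read: str, quality: str, consensus: str, min_q: int = 30) -> str:
    """Return the read with every base below min_q swapped for the consensus base.

    Assumes a gapless, position-by-position correspondence between the read
    and the consensus (a simplification; real alignments contain indels).
    """
    if not (len(read) == len(quality) == len(consensus)):
        raise ValueError("read, quality and consensus must have the same length")
    scores = phred33_scores(quality)
    return "".join(
        cons_base if q < min_q else base
        for base, q, cons_base in zip(read, scores, consensus)
    )


# 'I' decodes to Q40 and '#' to Q2 under Phred+33.
corrected = replace_low_quality("ACGT", "I#II", "ATGT")
```

Here only the second base falls below Q30, so it is the only one taken from the consensus; the other three are kept from the read.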
What is the rationale behind doing this, and what do you ultimately want to do?
You could have saved some money and created simulated PacBio sequences using the reference.
Thanks. What I submitted for PacBio sequencing are PCR products of a variant pool with 99% similarity at the DNA level. I want to identify true variants from the PacBio reads, so I want to use the Phred score to decide whether a mismatched base is real or not. If you know of any other method or algorithm that can do this, please tell me.
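As a rough illustration of what I mean, the sketch below keeps only mismatches whose base quality is at least Q30. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q30 means about 1 error in 1000. The code assumes Phred+33 encoding and a gapless alignment of the read to the reference, which is a simplification; the function names are hypothetical.

```python
def error_probability(q: int) -> float:
    """Convert a Phred score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def confident_mismatches(read: str, quality: str, reference: str, min_q: int = 30):
    """List (position, ref_base, read_base, phred) for mismatches with phred >= min_q.

    Assumes Phred+33 qualities and a gapless read-to-reference alignment.
    """
    if not (len(read) == len(quality) == len(reference)):
        raise ValueError("read, quality and reference must have the same length")
    calls = []
    for pos, (base, qchar, ref_base) in enumerate(zip(read, quality, reference)):
        q = ord(qchar) - 33  # Phred+33 decoding
        if base != ref_base and q >= min_q:
            calls.append((pos, ref_base, base, q))
    return calls


# Position 1 mismatches at Q40 and is kept; position 2 mismatches at Q2 and is dropped.
calls = confident_mismatches("ACGA", "II#I", "ATTA")
```

A single read passing this filter is still only one observation; a candidate variant is more convincing when many independent reads show the same high-quality mismatch.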
Do you have consensus reads from the PacBio data (now called "reads_of_insert", formerly called CCS reads)? If so, there is a good chance that the base you see in the read is real, since it has support from multiple passes. Those may be the SNPs you are interested in.
Instead of replacing bases you should try to align (or cluster) your reads using the reference.
Yes, I have the CCS reads. Thank you very much.