I've been trying to understand PacBio data for a while now, since I think it's the way forward for our lab. I'm still having trouble understanding CCS (circular consensus sequencing), though; my biology is quite shaky.
Could someone explain to a computer scientist how PacBio derives its CCS reads from the subreads or long reads?
Our collaborators with the PacBio data gave us a set of files: subreads.fastq, CCS.fastq and long_reads.fastq. When I ran a FastQC report on the subreads, I got base quality scores of around 10-15, whereas for CCS the quality scores were higher and more spread out, with values between 30 and 40. Is this expected?
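In case it helps anyone reproduce the numbers, here is a minimal sketch of how the qualities could be cross-checked outside FastQC. It assumes standard Phred+33 encoding and plain four-line FASTQ records; the function names are my own and not part of any PacBio or FastQC tool. It also converts a Phred score Q to an error probability using the usual definition 10^(-Q/10), so Q10 is roughly a 10% per-base error rate and Q30 roughly 0.1%.

```python
import io


def mean_phred(fastq_handle, offset=33):
    """Mean Phred quality over all bases in a 4-line-per-record FASTQ.

    Assumes Phred+33 encoding and no line-wrapped records.
    """
    total = 0
    count = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 == 3:  # every fourth line holds the quality string
            quals = line.rstrip("\n")
            total += sum(ord(c) - offset for c in quals)
            count += len(quals)
    return total / count if count else 0.0


def phred_to_error(q):
    """Convert a Phred score to the per-base error probability."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Tiny made-up record: '+' encodes Q10 under Phred+33.
    demo = io.StringIO("@read1\nACGT\n+\n++++\n")
    q = mean_phred(demo)
    print(q, phred_to_error(q))
```

For the real files I would open them with `open("subreads.fastq")` and `open("CCS.fastq")` and pass the handles to `mean_phred`.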
Which of these files should I use for assembly? I'm guessing it's CCS.fastq.
Thanks for your help!