I am working with PacBio Sequel data for a few bacterial strains. I received 3 files from the sequencing facility.
How can I assess the quality of the data? Since this is Sequel data, the Phred scores are arbitrarily set to an exclamation mark (Phred score = 0). There should be some way to do QC. PacBio suggests using the SMRT Link program to assess quality; I can also see from the user guide (page 28) that the .subreadset.xml file contains information about Sequel sequence data. From page 110, I know that I have all the files required by SMRT Link; however, I am not sure how to import these files into SMRT Link and assess the quality. I have already installed it on a Windows machine and I am able to log in.
Page 109 of the same user guide says that another file, .sts.xml, contains summary statistics about the collection/cell and its post-processing. I haven't received that file. Is it required for QC?
Do I have the files required for QC using SMRT Link?
Is there any alternative way to perform QC?
Update: the Sequel machine is not with us. Can I still perform the QC with just the 3 files mentioned above?
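To make the "alternative QC" question concrete: since the per-base qualities carry no information, one option I am considering is a purely length-based summary (read count, total bases, mean length, N50) of the subreads. Below is a minimal sketch in Python, assuming the subreads have already been exported to a FASTQ file with strict 4-line records (no wrapped sequences). The function names are my own and hypothetical, not part of any PacBio tool.

```python
import gzip
from statistics import mean


def read_lengths(fastq_path):
    """Yield the length of every sequence in a 4-line-per-record FASTQ file.

    Assumption: sequences are not wrapped over several lines.
    Files ending in .gz are opened with gzip.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence is the 2nd line of each record
                yield len(line.strip())


def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads at all


def summarize(lengths):
    """Collect simple length-based QC numbers into a dictionary."""
    lengths = list(lengths)
    if not lengths:
        return {"reads": 0, "bases": 0, "mean": 0, "max": 0, "n50": 0}
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "mean": mean(lengths),
        "max": max(lengths),
        "n50": n50(lengths),
    }
```

Usage would be something like summarize(read_lengths("subreads.fastq")), where the file name is a placeholder. This only describes the length distribution and yield; it says nothing about per-base accuracy, so it is not a replacement for the SMRT Link reports.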
I am trying to understand the PacBio sequencing chemistry. From the image below, it is clear that what I have are the subreads (sequenced inserts without the green adapters) and not CCS (circular consensus sequence) reads. I am trying to assemble the bacterial genome with the Canu assembler.
Image source: PacBio
What should I use?
A FASTQ file converted from the subreads.bam file? I think most of the blogs suggest that.
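For context, the conversion I have in mind is roughly the following. This is only a sketch, assuming samtools is installed and on the PATH, and relying on the fact that "samtools fastq" writes the reads of an unaligned BAM to standard output. The helper names and file names are hypothetical.

```python
import shutil
import subprocess


def samtools_fastq_command(bam_path):
    """Build the samtools command that dumps a BAM file as FASTQ on stdout."""
    return ["samtools", "fastq", str(bam_path)]


def bam_to_fastq(bam_path, fastq_path):
    """Convert a subreads BAM to FASTQ by redirecting samtools' stdout to a file.

    Assumption: samtools is available on PATH; otherwise an error is raised.
    """
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    with open(fastq_path, "w") as out:
        subprocess.run(samtools_fastq_command(bam_path), stdout=out, check=True)
```

I would call it as bam_to_fastq("my.subreads.bam", "my.subreads.fastq") and then give the resulting FASTQ to Canu. The quality strings in the output would still be all "!" characters, since that is what the BAM contains.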