I'm trying to extract IPD values for kinetics analysis from publicly available data (3 x bax.h5 plus the bas.h5). However, when I use the R-kinetics scripts, they report that no IPD data are present, even though I explicitly request them in my preparation steps and get no warning that they are being dropped.
The pipeline I'm using so far is:
bax2bam -o PREFIX file.1.bax.h5 file.2.bax.h5 file.3.bax.h5 --subread --pulsefeatures=DeletionQV,DeletionTag,InsertionQV,IPD,PulseWidth,MergeQV,SubstitutionQV,SubstitutionTag --losslessframes
blasr PREFIX.subreads.bam refGenome.fa --out file.bam
samtools sort file.bam file_sorted
samtools view -h -o file.sam file_sorted.bam
samtoh5 file.sam refGenome.fa file.cmp.h5
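One way to narrow down where the IPD values disappear is to check whether the records still carry the per-base kinetics tags after each step. This is a minimal sketch, assuming the PacBio BAM convention that bax2bam stores IPD as the `ip` array tag and PulseWidth as the `pw` array tag. `has_kinetics_tags` is a hypothetical helper name. It reads SAM text on stdin, so it works on `samtools view` output as well as on a plain .sam file.

```shell
#!/usr/bin/env bash
# has_kinetics_tags: hypothetical helper (not part of any PacBio tool).
# Reads SAM records on stdin and counts how many carry the PacBio per-base
# IPD ("ip:B:...") and PulseWidth ("pw:B:...") array tags.
# SAM header lines (starting with "@") are skipped.
has_kinetics_tags() {
  awk -F'\t' '
    /^@/ { next }                      # skip SAM header lines
    {
      n++
      ip = 0; pw = 0
      for (i = 12; i <= NF; i++) {     # optional tags start at column 12
        if ($i ~ /^ip:B:/) ip = 1
        if ($i ~ /^pw:B:/) pw = 1
      }
      n_ip += ip; n_pw += pw
    }
    END { printf "records=%d ip=%d pw=%d\n", n, n_ip, n_pw }
  '
}
```

Possible usage against the files from the pipeline above:

    samtools view PREFIX.subreads.bam | head -n 1000 | has_kinetics_tags
    samtools view file_sorted.bam | head -n 1000 | has_kinetics_tags
    has_kinetics_tags < file.sam

If `ip` shows up in the subreads BAM but not in a later file, the step in between is presumably where the IPD values are lost.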
The program versions I'm using:
bax2bam = v0.0.8
blasr = v5.2
samtoh5 = v1.0.0.141782
After all this I get a cmp.h5 that I can read in and view, but it contains no IPD values. I also tried, separately, adding the pulse data to the cmp.h5 with loadPulses, but I get this message:
$ loadPulses movie_s1_p0.bas.h5 file.cmp.h5
[INFO] 2016-07-28T14:59:34 [loadPulses] started.
WARNING: There is insufficient data to compute metric: ClassifierQV in the file movie_s1_p0.1.bax.h5 It will be ignored.
WARNING: There is insufficient data to compute metric: pkmid in the file movie_s1_p0.1.bax.h5 It will be ignored.
loading 82011 alignments for movie 1
ERROR, the query sequence does not match the aligned query sequence.
HoleNumber: 32516, MovieName: movie_s1_p0, ReadIndex: 32516, qStart: 4404, qEnd: 9220
Aligned sequence:
TTGAAAGAAAA....
Original sequence:
GGTAAAACATT....
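To see what the aligned file actually holds for the ZMW named in the error, the records for that movie and hole number can be pulled out and their first bases compared with the two sequences above. This is a sketch under two assumptions: that the aligned records keep the bax2bam subread naming `{movieName}/{holeNumber}/{qStart}_{qEnd}`, and that the qStart/qEnd in the error may not equal the coordinates in the read name, so it filters on movie and hole only. `find_zmw_records` is a hypothetical helper name. It also reports FLAG bit 0x10, because per the SAM specification a reverse-strand alignment stores SEQ reverse-complemented.

```shell
#!/usr/bin/env bash
# find_zmw_records: hypothetical helper (not part of any PacBio tool).
# Usage: find_zmw_records MOVIE HOLE [NBASES]
# For every SAM record on stdin whose read name starts with "MOVIE/HOLE/",
# prints: read name, FLAG, RNAME, POS, strand (from FLAG bit 0x10),
# and the first NBASES (default 11) bases of SEQ, tab-separated.
find_zmw_records() {
  local movie="$1" hole="$2" nbases="${3:-11}"
  awk -F'\t' -v prefix="${movie}/${hole}/" -v nb="$nbases" '
    /^@/ { next }                      # skip SAM header lines
    index($1, prefix) == 1 {           # literal prefix match, no regex
      strand = (int($2 / 16) % 2 == 1) ? "reverse" : "forward"
      printf "%s\t%s\t%s\t%s\t%s\t%s\n", $1, $2, $3, $4, strand, substr($10, 1, nb)
    }
  '
}
```

For the read in the error message this might be run as `samtools view file_sorted.bam | find_zmw_records movie_s1_p0 32516`.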
Now, there has to be something simple that I am missing. Any assistance would be appreciated.