I used pacBioToCA to correct some PacBio long reads (LR) using Illumina short reads (SR).
The log file contains this information:
INPUT_NAME OUTPUT_NAME SUBREAD START END LENGTH
S1_1217 ec_pacbio_2015073_1 1 2 521 519
S1_1217 ec_pacbio_2015073_13 13 4401 5139 738
S1_1218 ec_pacbio_2015074_1 1 2 882 880
S1_1219 ec_pacbio_2015075_4 4 551 1107 556
What do the SUBREAD, START and END columns mean?
Does the SUBREAD value mean that the program divided the original read into that number of reads? For example, in the second row, does ec_pacbio_2015073_13
contain 13 subreads?
I originally posted this question on SEQanswers but did not get any answer, so I am posting it here as well: http://seqanswers.com/forums/showthread.php?t=62069
UPDATE
I think the answer to my question is this option:
-maxGap <int> The maximum uncorrected PacBio gap that will be allowed. When there is no short-read coverage for a region, by default the pipeline will split a PacBio sequence. This option will attempt to use other PacBio sequences to patch the gap and avoid splitting the read. Sequences where the gaps have no support will still be broken. For example, specifying 50, will mean any gap 50bp or smaller can have no short-read coverage (but has other PacBio sequence support) without splitting the PacBio sequence. Warning: this can allow more sequences that went through the SMRTbell to not be fixed.
So I think the original PacBio sequence was split wherever it was not covered by short reads.
I know what the word subread means in the PacBio context, but the issue is this: if I sort the log file, then, for example, for sample
S1_1
the biggest SUBREAD number is 3, so I would assume that there are three subreads, but I only found two.
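To make the comparison concrete, here is a minimal Python sketch of the check I am describing. It groups the log rows quoted above by INPUT_NAME and compares how many corrected pieces each input read has with the largest SUBREAD value seen for it. The function names (`parse_log`, `group_by_input`, `summarize`) are my own and not part of pacBioToCA, and the sketch only assumes the log is whitespace-separated with the header shown above. It does not assume anything about how pacBioToCA assigns the SUBREAD values.

```python
from collections import defaultdict

# The log rows quoted earlier in this post, copied verbatim.
LOG_TEXT = """\
INPUT_NAME OUTPUT_NAME SUBREAD START END LENGTH
S1_1217 ec_pacbio_2015073_1 1 2 521 519
S1_1217 ec_pacbio_2015073_13 13 4401 5139 738
S1_1218 ec_pacbio_2015074_1 1 2 882 880
S1_1219 ec_pacbio_2015075_4 4 551 1107 556
"""

NUMERIC_COLUMNS = ("SUBREAD", "START", "END", "LENGTH")


def parse_log(text):
    """Parse the whitespace-separated log into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        for key in NUMERIC_COLUMNS:
            row[key] = int(row[key])
        rows.append(row)
    return rows


def group_by_input(rows):
    """Collect all corrected pieces that come from the same input read."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["INPUT_NAME"]].append(row)
    return dict(groups)


def summarize(groups):
    """Per input read: number of pieces, their SUBREAD values, and the largest one."""
    summary = {}
    for name, rows in groups.items():
        indices = sorted(row["SUBREAD"] for row in rows)
        summary[name] = {
            "pieces": len(rows),
            "indices": indices,
            "max_index": indices[-1],
        }
    return summary


rows = parse_log(LOG_TEXT)
summary = summarize(group_by_input(rows))

for name, info in summary.items():
    print(name, info)

# In the rows shown, LENGTH equals END - START on every line.
lengths_consistent = all(r["END"] - r["START"] == r["LENGTH"] for r in rows)
print("LENGTH == END - START for all rows:", lengths_consistent)
```

For the rows above, S1_1217 has two output pieces but its largest SUBREAD value is 13, which is the same kind of mismatch I see for the other sample: the largest SUBREAD value is not the number of pieces that end up in the output.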