Question

What is the third column in the pacBioToCA log file output?


8.7 years ago

Medhat 9.7k

I used pacBioToCA to correct some long reads (LR) using Illumina short reads (SR).

The log file contains this information:

INPUT_NAME OUTPUT_NAME SUBREAD START END LENGTH
S1_1217 ec_pacbio_2015073_1 1 2 521 519
S1_1217 ec_pacbio_2015073_13 13 4401 5139 738
S1_1218 ec_pacbio_2015074_1 1 2 882 880
S1_1219 ec_pacbio_2015075_4 4 551 1107 556

What do the SUBREAD, START and END columns mean?

Does SUBREAD mean that the program divided the original read into this number of reads? For example, in the second row, does ec_pacbio_2015073_13 contain 13 subreads?

I originally posted this question on SEQanswers but did not get an answer, so I am posting it here: http://seqanswers.com/forums/showthread.php?t=62069

UPDATE

I think the answer to my question is this option:

-maxGap <int> The maximum uncorrected PacBio gap that will be allowed. When there is no short-read coverage for a region, by default the pipeline will split a PacBio sequence. This option will attempt to use other PacBio sequences to patch the gap and avoid splitting the read. Sequences where the gaps have no support will still be broken. For example, specifying 50, will mean any gap 50bp or smaller can have no short-read coverage (but has other PacBio sequence support) without splitting the PacBio sequence. Warning: this can allow more sequences that went through the SMRTbell to not be fixed.

So I think the original PacBio sequence was split wherever it was not covered by short reads.
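One way to check this reading against the log is to group the rows by INPUT_NAME and look at the spans between consecutive reported pieces. The sketch below is only an illustration: it assumes the whitespace-separated layout shown in the question, the helper names (`parse_log`, `fragments_by_read`, `spans_between`) are made up for this example and are not part of pacBioToCA, and the log excerpt may not list every piece of a read.

```python
from collections import defaultdict

# Log excerpt copied from the question (header line included).
LOG = """\
INPUT_NAME OUTPUT_NAME SUBREAD START END LENGTH
S1_1217 ec_pacbio_2015073_1 1 2 521 519
S1_1217 ec_pacbio_2015073_13 13 4401 5139 738
S1_1218 ec_pacbio_2015074_1 1 2 882 880
S1_1219 ec_pacbio_2015075_4 4 551 1107 556
"""


def parse_log(text):
    """Parse the log into a list of dicts, skipping the header line."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        name, out, sub, start, end, length = line.split()
        rows.append({
            "input": name,
            "output": out,
            "subread": int(sub),
            "start": int(start),
            "end": int(end),
            "length": int(length),
        })
    return rows


def fragments_by_read(rows):
    """Group rows by original read name, sorted by START."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["input"]].append(row)
    for frags in grouped.values():
        frags.sort(key=lambda r: r["start"])
    return dict(grouped)


def spans_between(frags):
    """Return (end, next_start) for each pair of consecutive listed pieces."""
    return [(a["end"], b["start"]) for a, b in zip(frags, frags[1:])]


rows = parse_log(LOG)

# In this excerpt LENGTH equals END - START on every row.
for r in rows:
    assert r["end"] - r["start"] == r["length"]

for name, frags in fragments_by_read(rows).items():
    print(name, spans_between(frags))
```

For S1_1217 this reports the span from 521 to 4401 between the two listed pieces. Whether that whole span lacked short-read coverage cannot be told from these four rows alone.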

next-gen sequence Assembly pacbio • 1.6k views

ADD COMMENT • link updated 24 months ago by Ram 43k • written 8.7 years ago by Medhat 9.7k

Ram · Answer 1 · 2015-08-24


8.7 years ago

User 59 13k

You may find the PacBio terminology page helpful.

ADD COMMENT • link updated 24 months ago by Ram 43k • written 8.7 years ago by User 59 13k


I know what "subread" means in the PacBio context, but here is the issue. If I sort this log file, for example, I get:

S1_1    ec_pacbio_2013857_1     1       32      547     515
S1_1    ec_pacbio_2013857_3     3       755     2357    1602
S1_2    ec_pacbio_2013858_2     2       244     815     571
S1_2    ec_pacbio_2013858_3     3       968     1616    648
S1_2    ec_pacbio_2013858_5     5       1872    2711    839

For S1_1 the largest number is 3, so I would assume there are three subreads, but I found only two:

S1_1    ec_pacbio_2013857_1     1       32      547     515
S1_1    ec_pacbio_2013857_3     3       755     2357    1602
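To see which numbers are skipped for every read rather than checking by eye, something like the following could be run on the sorted log. It is a rough sketch that assumes the same whitespace-separated columns as above with no header line; `subread_indices` and `missing_indices` are hypothetical helper names. It only reports which SUBREAD numbers are absent from the log and does not explain why they are absent.

```python
from collections import defaultdict

# Sorted log excerpt copied from this reply (no header line).
LOG = """\
S1_1    ec_pacbio_2013857_1     1       32      547     515
S1_1    ec_pacbio_2013857_3     3       755     2357    1602
S1_2    ec_pacbio_2013858_2     2       244     815     571
S1_2    ec_pacbio_2013858_3     3       968     1616    648
S1_2    ec_pacbio_2013858_5     5       1872    2711    839
"""


def subread_indices(text):
    """Map each INPUT_NAME to the sorted SUBREAD numbers listed for it."""
    seen = defaultdict(list)
    for line in text.strip().splitlines():
        fields = line.split()
        seen[fields[0]].append(int(fields[2]))
    return {name: sorted(idx) for name, idx in seen.items()}


def missing_indices(indices):
    """Numbers from 1 up to the largest listed index that do not appear."""
    present = set(indices)
    return [i for i in range(1, max(indices) + 1) if i not in present]


for name, idx in subread_indices(LOG).items():
    print(name, "listed:", idx, "missing:", missing_indices(idx))
```

On the five rows above, S1_1 lists 1 and 3 with 2 missing, and S1_2 lists 2, 3 and 5 with 1 and 4 missing.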

ADD REPLY • link updated 24 months ago by Ram 43k • written 8.7 years ago by Medhat 9.7k