What is the third column in the log file output of pacBioToCA?
8.7 years ago
Medhat 9.7k

I used pacBioToCA to correct some long reads (LR) using Illumina short reads (SR).

The log file contains this information:

INPUT_NAME  OUTPUT_NAME           SUBREAD  START  END   LENGTH
S1_1217     ec_pacbio_2015073_1   1        2      521   519
S1_1217     ec_pacbio_2015073_13  13       4401   5139  738
S1_1218     ec_pacbio_2015074_1   1        2      882   880
S1_1219     ec_pacbio_2015075_4   4        551    1107  556

What do the SUBREAD, START, and END columns mean?

Does the subread number mean that the program divided the original read into that many reads? For example, in the second row, does ec_pacbio_2015073_13 contain 13 subreads?
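Two things can be checked directly from the rows above without knowing pacBioToCA internals: whether LENGTH equals END minus START, and whether the numeric suffix of OUTPUT_NAME repeats the SUBREAD value. The snippet below is only a minimal sketch; `parse_log` is a hypothetical helper, and it assumes the log is whitespace-delimited with the header shown.

```python
# Rows copied from the log excerpt in the question.
LOG = """\
INPUT_NAME OUTPUT_NAME SUBREAD START END LENGTH
S1_1217 ec_pacbio_2015073_1 1 2 521 519
S1_1217 ec_pacbio_2015073_13 13 4401 5139 738
S1_1218 ec_pacbio_2015074_1 1 2 882 880
S1_1219 ec_pacbio_2015075_4 4 551 1107 556
"""


def parse_log(text):
    """Parse whitespace-delimited log rows into dicts (hypothetical helper)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        # Convert the numeric columns so they can be compared arithmetically.
        for key in ("SUBREAD", "START", "END", "LENGTH"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


rows = parse_log(LOG)
for row in rows:
    length_ok = row["END"] - row["START"] == row["LENGTH"]
    suffix_ok = row["OUTPUT_NAME"].rsplit("_", 1)[1] == str(row["SUBREAD"])
    print(row["OUTPUT_NAME"], "length_ok:", length_ok, "suffix_ok:", suffix_ok)
```

For these four rows both checks hold, which suggests START and END are coordinates on the original read and SUBREAD is an index for the output piece rather than a count of pieces inside it.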

I originally posted this question on SEQanswers but did not get an answer, so I am posting it here: http://seqanswers.com/forums/showthread.php?t=62069

UPDATE

I think the answer to my question lies in this option:

-maxGap <int> The maximum uncorrected PacBio gap that will be allowed. When there is no short-read coverage for a region, by default the pipeline will split a PacBio sequence. This option will attempt to use other PacBio sequences to patch the gap and avoid splitting the read. Sequences where the gaps have no support will still be broken. For example, specifying 50, will mean any gap 50bp or smaller can have no short-read coverage (but has other PacBio sequence support) without splitting the PacBio sequence. Warning: this can allow more sequences that went through the SMRTbell to not be fixed.

So I think the original PacBio sequence was split wherever it was not covered by short reads.
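If the splitting explanation is right, the stretch between two consecutive pieces of the same input read should be the region that was dropped. Here is a rough sketch that lists those stretches for the rows in the question. `uncovered_gaps` is a hypothetical helper, and treating START and END as positions on the original read is an assumption, not something the log states.

```python
from collections import defaultdict

# (INPUT_NAME, OUTPUT_NAME, SUBREAD, START, END) copied from the log excerpt.
ROWS = [
    ("S1_1217", "ec_pacbio_2015073_1", 1, 2, 521),
    ("S1_1217", "ec_pacbio_2015073_13", 13, 4401, 5139),
    ("S1_1218", "ec_pacbio_2015074_1", 1, 2, 882),
    ("S1_1219", "ec_pacbio_2015075_4", 4, 551, 1107),
]


def uncovered_gaps(rows):
    """Map each input read to the (start, end, length) gaps between its pieces."""
    spans_by_read = defaultdict(list)
    for name, _output, _subread, start, end in rows:
        spans_by_read[name].append((start, end))
    gaps = {}
    for name, spans in spans_by_read.items():
        spans.sort()
        # Pair each piece with the next one and measure the space between them.
        gaps[name] = [
            (prev_end, next_start, next_start - prev_end)
            for (_, prev_end), (next_start, _) in zip(spans, spans[1:])
        ]
    return gaps


gaps = uncovered_gaps(ROWS)
print(gaps["S1_1217"])  # [(521, 4401, 3880)]
```

Only gaps between reported pieces show up here. Anything lost before the first piece or after the last one cannot be recovered from these columns, because the log does not give the original read length.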

8.7 years ago
User 59 13k

You may find the PacBio terminology page helpful.


I know what "subread" means in the PacBio context, but the issue is this: if I sort the log file, I get, for example:

S1_1    ec_pacbio_2013857_1     1       32      547     515
S1_1    ec_pacbio_2013857_3     3       755     2357    1602
S1_2    ec_pacbio_2013858_2     2       244     815     571
S1_2    ec_pacbio_2013858_3     3       968     1616    648
S1_2    ec_pacbio_2013858_5     5       1872    2711    839

For read S1_1 the largest subread number is 3, so I would expect three subreads, but I found only two:

S1_1    ec_pacbio_2013857_1     1       32      547     515
S1_1    ec_pacbio_2013857_3     3       755     2357    1602
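To see which subread numbers never appear for each input read, something like the following could be run over the sorted log. It is a sketch only: `missing_indices` is a hypothetical helper, and it assumes the SUBREAD index counts up from 1. Why some indices are absent (for example, pieces being filtered out before output) is a guess that the log alone does not confirm.

```python
# (INPUT_NAME, SUBREAD) pairs copied from the sorted log excerpt.
ROWS = [
    ("S1_1", 1),
    ("S1_1", 3),
    ("S1_2", 2),
    ("S1_2", 3),
    ("S1_2", 5),
]


def missing_indices(rows):
    """Map each input read to the subread indices absent below its largest index."""
    seen = {}
    for name, subread in rows:
        seen.setdefault(name, set()).add(subread)
    return {
        name: sorted(set(range(1, max(indices) + 1)) - indices)
        for name, indices in seen.items()
    }


missing = missing_indices(ROWS)
print(missing)  # {'S1_1': [2], 'S1_2': [1, 4]}
```

Indices above the largest one seen cannot be detected this way, so the true number of pieces per read may be higher than the maximum index in the log.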