Canu: assembling multiple PacBio read files in a single run
5.6 years ago

I could not find this information in the tool manual or in any tutorial. Can I run the Canu assembler on multiple PacBio read files when running the Correct, Trim and Assemble steps manually, one at a time?

CORRECTION step:-

canu -correct -p plant -d run1 genomeSize=900m -pacbio-raw  1.fasta 2.fasta 3.fasta 4.fasta 5.fasta 6.fasta

output:

plant.correctedReads.fasta.gz

My question is:

In this step the assembler generated only a single corrected file for all 6 input files. Is there any loss of data? Or do I have to run this step separately for each of the 6 files, like this:

canu -correct -p readfile1 -d run1 genomeSize=900m -pacbio-raw  1.fasta
canu -correct -p readfile2 -d run1 genomeSize=900m -pacbio-raw  2.fasta
canu -correct -p readfile3 -d run1 genomeSize=900m -pacbio-raw  3.fasta
canu -correct -p readfile4 -d run1 genomeSize=900m -pacbio-raw  4.fasta
canu -correct -p readfile5 -d run1 genomeSize=900m -pacbio-raw  5.fasta
canu -correct -p readfile6 -d run1 genomeSize=900m -pacbio-raw  6.fasta

TRIMMING step:-

canu -trim -p plant -d run1 genomeSize=900m -pacbio-corrected plant.correctedReads.fasta.gz

output:

It is still running.

Or do I have to run this step per file, like this:

canu -trim -p readfile1 -d run1 genomeSize=900m -pacbio-corrected 1.correctedReads.fasta.gz
canu -trim -p readfile2 -d run1 genomeSize=900m -pacbio-corrected 2.correctedReads.fasta.gz
canu -trim -p readfile3 -d run1 genomeSize=900m -pacbio-corrected 3.correctedReads.fasta.gz
canu -trim -p readfile4 -d run1 genomeSize=900m -pacbio-corrected 4.correctedReads.fasta.gz
canu -trim -p readfile5 -d run1 genomeSize=900m -pacbio-corrected 5.correctedReads.fasta.gz
canu -trim -p readfile6 -d run1 genomeSize=900m -pacbio-corrected 6.correctedReads.fasta.gz

ASSEMBLY step:-

canu -assemble -p plant -d run1 genomeSize=900m correctedErrorRate=0.039 -pacbio-corrected plant.trimmedReads.fasta.gz

Or:

canu -assemble -p plant -d run1 genomeSize=900m correctedErrorRate=0.039 -pacbio-corrected 1.trimmedReads.fasta.gz 2.trimmedReads.fasta.gz 3.trimmedReads.fasta.gz 4.trimmedReads.fasta.gz 5.trimmedReads.fasta.gz 6.trimmedReads.fasta.gz

My questions are:

1) Will both pipelines (a single run versus individual runs for correction and trimming) generate the same assembly, or different ones?

2) Is there any loss of data with the first pipeline (a single run for correction and trimming)?
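One way to check the data-loss question directly is to compare the number of reads going in with the number coming out. The sketch below is a minimal helper, not part of Canu; `count_reads` is a hypothetical name, and it assumes FASTA input, plain or gzipped. Note that, as far as I know, Canu by default only corrects the longest reads up to a target coverage (the `corOutCoverage` option), so a lower count in the corrected file is expected and does not by itself mean something went wrong.

```shell
# Count FASTA records (header lines starting with '>') across one or more
# plain or gzip-compressed files and print the total.
count_reads() {
    total=0
    for f in "$@"; do
        case "$f" in
            *.gz) n=$(gzip -dc "$f" | grep -c '^>' || true) ;;
            *)    n=$(grep -c '^>' "$f" || true) ;;
        esac
        total=$((total + n))
    done
    echo "$total"
}

# Example (paths are assumptions based on the commands in this post):
#   count_reads 1.fasta 2.fasta 3.fasta 4.fasta 5.fasta 6.fasta
#   count_reads run1/plant.correctedReads.fasta.gz
```

Comparing total bases instead of read counts would give a fuller picture, since correction also changes read lengths.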

Any help would be appreciated.

genome Assembly canu • 3.5k views

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button in the toolbar when you compose or edit a post.


Thanks, WouterDeCoster.


Many thanks, Lieven, for your valuable suggestion.

I agree with your points, but at the trimming step I have only one file, the correctedReads.gz file produced by the correction step. So in my case trimming can only be run once on that single file; there is no step-by-step option.

Trimming the reads one file at a time is only possible if the correction step is also run separately for each read file.

I will now compare the results of both approaches and share the differences here.

Thanks again.


Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that the thread stays logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should only be used to provide a solution to the question asked.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.


True, but you can easily split the correctedReads file into a number of chunks. In any case, Canu will normally do this itself (see my answer below).
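If you do want to split the corrected reads by hand, a minimal sketch could look like the following. `split_fasta` is a hypothetical helper, not a Canu command; it assumes a gzipped FASTA file and deals whole records out round-robin, so no read is cut in half. The chunk file names are an assumption for illustration.

```shell
# split_fasta INPUT.fasta.gz N PREFIX
# Distribute FASTA records round-robin into N files named
# PREFIX.1.fasta ... PREFIX.N.fasta. Multi-line sequences stay with
# their header because the output file only changes on a '>' line.
split_fasta() {
    in=$1
    chunks=$2
    prefix=$3
    gzip -dc "$in" | awk -v n="$chunks" -v p="$prefix" '
        /^>/ { out = sprintf("%s.%d.fasta", p, (rec++ % n) + 1) }
        { print > out }
    '
}

# Example (file name taken from the question):
#   split_fasta plant.correctedReads.fasta.gz 6 plant.corrected.chunk
```

Round-robin keeps the chunks roughly equal in read count, though not necessarily in total bases.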

5.6 years ago

In theory: use all the data at once for every step.

In step 1, "correction", it is crucial to supply all the data at once. Reads are corrected using their overlaps with other reads, so each read benefits from the full coverage.

In step 2, "trimming", you might consider processing the files one by one.

In step 3, "assembly", again use everything at once.
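As a sketch of the "everything at once" variant, the helper below prints the three stage commands for one combined run. `canu_stages` is a hypothetical function; the options are the ones used in the question, while the per-stage output directories (`PREFIX-correct` and so on) are my assumption for keeping the stages apart. Newer Canu releases changed the read-type options, so check `canu -help` for your version before running anything.

```shell
# canu_stages PREFIX GENOME_SIZE RAW_FILE...
# Print (do not run) the correct, trim and assemble commands, each stage
# reading the output file of the previous one.
# Caveat: file names containing spaces are not handled.
canu_stages() {
    p=$1
    size=$2
    shift 2
    echo "canu -correct -p $p -d $p-correct genomeSize=$size -pacbio-raw $*"
    echo "canu -trim -p $p -d $p-trim genomeSize=$size -pacbio-corrected $p-correct/$p.correctedReads.fasta.gz"
    echo "canu -assemble -p $p -d $p-assemble genomeSize=$size correctedErrorRate=0.039 -pacbio-corrected $p-trim/$p.trimmedReads.fasta.gz"
}

# Example:
#   canu_stages plant 900m 1.fasta 2.fasta 3.fasta 4.fasta 5.fasta 6.fasta
```

Printing the commands first makes it easy to review them, or to paste them into a job script one stage at a time.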

So much for the theory. Often you will not have the resources to process the whole dataset at once (or at least not in a timely manner), so you have to subdivide the data within the different steps. Yes, the result will be less optimal, but at least you get a result!

Moreover, there is a difference between how you start the command and how that step is actually processed. Canu already subdivides several steps itself, so you will not see a single job doing one of those steps. For example, in the trimming part Canu will create dozens of sub-jobs and merge them at the end.
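Because Canu splits the work itself, one option on a single machine is to keep all reads in one run and cap what the run may use. The sketch below only builds such a command line. `canu_capped` is a hypothetical helper; `useGrid`, `maxThreads` and `maxMemory` are options from the Canu parameter reference as I remember them (with `maxMemory` given in gigabytes), so verify them against the documentation for your version.

```shell
# canu_capped THREADS MEM_GB RAW_FILE...
# Print a correction command for all read files, limited to the given
# number of threads and amount of memory on the local machine.
canu_capped() {
    threads=$1
    mem=$2
    shift 2
    echo "canu -correct -p plant -d run1 genomeSize=900m" \
         "useGrid=false maxThreads=$threads maxMemory=$mem" \
         "-pacbio-raw $*"
}

# Example:
#   canu_capped 16 64 1.fasta 2.fasta 3.fasta 4.fasta 5.fasta 6.fasta
```

The run takes longer with tighter limits, but correction still sees every read.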
