Improving existing assembly using PacBio reads
6.6 years ago

Hello all,

I am new to the bioinformatics field. I have an assembly available for a fungus (14.5 Mb), but it is not an end-to-end assembly. We have had PacBio sequencing done for the same strain of the organism.

a. I want to check the quality of these reads (something like FastQC).
b. I want to improve the existing assembly using PacBio long reads.
c. I want to calculate the N50 value for the final assembly.

Please forgive me if the question has already been asked in some other forum; I tried my best to find the answer.

Thanks a lot in advance :)

Bhagya C T

Assembly alignment sequencing • 2.5k views

Thank you Kevin.

Nice one, but it only explains polishing; I want to know how to improve the scaffolding using PacBio long reads.


Okay, I would ask the person who created that tutorial, Roxane I believe, as she appears to have been working in that area for the past few years. Apologies that I cannot help further.

6.6 years ago

a) Use Canu to do a PacBio assembly. It gives you an HTML report with output about the PacBio read quality. BBMap's stats.sh or readlength.sh will give you useful stats on the reads.

b) Tell us more about the stats of the PacBio sequencing. You're probably better off doing an entirely new assembly with PacBio alone, then using any existing reads to polish the PacBio assembly with Pilon after running Canu.

c) Again, use stats.sh from a).

Good luck. Canu and BBMap can be easily installed using Bioconda.
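If you want to see what the N50 figure from c) actually measures, here is a minimal Python sketch. It is only an illustration and not how stats.sh is implemented; the function names `n50` and `fasta_lengths` are made up for this example, and the FASTA reader assumes a plain-text (not gzipped) file.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that the sequences of length >= L
    together contain at least half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # no sequences given


def fasta_lengths(path):
    """Yield the length of each record in a plain-text FASTA file.

    Assumption: the file is uncompressed and every record starts
    with a '>' header line.
    """
    length = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line)
    if length is not None:
        yield length


# Usage (the file name is hypothetical):
#   contig_lengths = list(fasta_lengths("assembly.fasta"))
#   print(len(contig_lengths), sum(contig_lengths), n50(contig_lengths))
```

The same `n50` function works on read lengths too, so it can be used on the PacBio reads as well as on the final contigs.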

6.6 years ago

Dear colindaven, I have used Canu and tried to assemble; here are the global stats.

PARAMETERS:

 40 (expected coverage)
  0 (don't use overlaps shorter than this)

0.000 (don't use overlaps with erate less than this)
1.000 (don't use overlaps with erate more than this)

OVERLAPS:

IGNORED:

       0 (< 0.0000 fraction error)
       0 (> 0.4095 fraction error)
       0 (< 0 bases long)
       0 (> 2097151 bases long)

FILTERED:

12147773 (too many overlaps, discard these shortest ones)

EVIDENCE:

 2071295 (longest overlaps)

TOTAL:

14219068 (all overlaps)

READS:

      66 (no overlaps)
   11145 (no overlaps filtered)
   22024 (<  50% overlaps filtered)
   33266 (<  80% overlaps filtered)
   37123 (<  95% overlaps filtered)
   43712 (< 100% overlaps filtered)
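As a quick sanity check on the overlap counts above, the short Python sketch below confirms that the FILTERED and EVIDENCE counts add up to the TOTAL and works out what share of overlaps was filtered. The numbers are copied from the report; the variable names are only for illustration.

```python
# Counts copied from the Canu overlap report above.
filtered = 12_147_773   # FILTERED: too many overlaps, shortest discarded
evidence = 2_071_295    # EVIDENCE: longest overlaps kept
total = 14_219_068      # TOTAL: all overlaps

# The two categories should account for every overlap.
assert filtered + evidence == total

fraction_filtered = filtered / total
print(f"{fraction_filtered:.1%} of overlaps were filtered")
```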

I do not think PacBio alone will give me a good assembly: the Canu PacBio assembly has 120 contigs, whereas the existing assembly has only 24 contigs. Based on the PacBio read stats, I thought it was reasonable to try filling the gaps.

Please give me your inputs so that I can take it forward.

Thanks,
Bhagya C T


What PacBio coverage do you have? Good PacBio data + Canu should assemble a genome of this size with ease.
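Coverage here means total sequenced bases divided by genome size. A rough Python sketch for estimating it follows, assuming the reads are in an uncompressed FASTQ file with exactly four lines per record; the function names and the example read set are hypothetical, and only the 14.5 Mb genome size comes from the question.

```python
def fastq_read_lengths(path):
    """Yield read lengths from an uncompressed FASTQ file.

    Assumption: strict four-line records (header, sequence, '+', quality).
    """
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:  # second line of each record is the sequence
                yield len(line.strip())


def estimated_coverage(read_lengths, genome_size):
    """Return total read bases divided by the genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


GENOME_SIZE = 14_500_000  # 14.5 Mb, as stated in the question

# Made-up example: 60,000 reads of 10 kb each (not the poster's data).
example_lengths = [10_000] * 60_000
example_coverage = estimated_coverage(example_lengths, GENOME_SIZE)
print(f"estimated coverage: {example_coverage:.1f}x")

# With real data (the file name is hypothetical):
#   coverage = estimated_coverage(fastq_read_lengths("reads.fastq"), GENOME_SIZE)
```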
