Question

Improving existing assembly using PacBio reads

6.6 years ago

bhagyathimmappa ▴ 10

Hello all,

I am new to the bioinformatics field. I have an existing assembly of a fungal genome (14.5 Mb), but it is not a complete end-to-end assembly. We have now obtained PacBio sequencing data for the same strain.

a. I want to check the quality of these reads (something like FastQC).
b. I want to improve the existing assembly using the PacBio long reads.
c. I want to calculate the N50 value for the final assembly.

Please forgive me if this question has already been asked in another forum; I tried my best to find an answer.

Thanks a lot in advance :)

Bhagya C T

Assembly alignment sequencing • 2.5k views

updated 10 months ago by Ram 43k • written 6.6 years ago by bhagyathimmappa ▴ 10

This recent tutorial may be of great help to you: Polish PacBio assembly with latest PacBio tools : an affordable solution for everyone

6.6 years ago by Kevin Blighe 87k

Thank you, Kevin.

Nice one, but it only explains polishing. I want to know how to improve the scaffolding using PacBio long reads.

6.6 years ago by bhagyathimmappa ▴ 10

Okay. I would ask the person who wrote that tutorial (Roxane, I believe), as she appears to have been working in this area for the past few years. Apologies that I cannot help further.

6.6 years ago by Kevin Blighe 87k

score 0 · Answer 1 · 2017-09-23

a) Use Canu to do a PacBio assembly. It produces an HTML report with information about the PacBio read quality. BBMap's stats.sh or readlength.sh will also give you useful statistics on the reads.
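If you want a quick look at the reads before installing anything, the kind of summary readlength.sh gives can be approximated in a few lines of Python. This is a minimal sketch, not a replacement for BBMap or Canu: it assumes unwrapped 4-line FASTQ records (plain or gzip-compressed), only looks at read lengths (not per-base quality), and uses the 14.5 Mb genome size from the question to estimate raw coverage. The function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Iterator, List


def fastq_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield read lengths, assuming unwrapped 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # line 2 of every record holds the bases
            yield len(line.strip())


def fastq_file_lengths(path: str) -> List[int]:
    """Read lengths from a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(fastq_lengths(handle))


def read_summary(lengths: List[int], genome_size: int) -> Dict[str, float]:
    """Basic read-length statistics plus raw coverage = total bases / genome size."""
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "bases": total,
        "mean_length": total / len(lengths) if lengths else 0.0,
        "longest": max(lengths, default=0),
        "coverage": total / genome_size,
    }
```

Example usage, with a hypothetical file name: `read_summary(fastq_file_lengths("pacbio_reads.fastq.gz"), 14_500_000)`. The coverage value is raw sequencing depth, which is the number to report when asked for "the stats of the PacBio sequencing".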

b) Tell us more about the stats of the PacBio sequencing. You're probably better off doing an entirely new assembly with the PacBio reads alone, then using any existing reads to polish that assembly with Pilon after running Canu.

c) Again, use stats.sh from (a).
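stats.sh reports N50 directly. As a cross-check, N50 is simple to compute yourself: sort contig lengths from longest to shortest and walk down the list until the running total reaches half the assembly size; the contig length at that point is the N50, and the number of contigs needed is the L50. A minimal Python sketch of that definition follows (function names are hypothetical; FASTA sequences may span multiple lines):

```python
from typing import Iterable, List, Optional, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every record in FASTA text (sequences may span lines)."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start a new record
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50_l50(lengths: List[int]) -> Tuple[int, int]:
    """N50: contig length at which the cumulative sum (longest first) reaches
    half the total. L50: how many contigs it takes to get there."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    return 0, 0  # empty assembly
```

For example, contigs of 40, 30, 20 and 10 kb total 100 kb; the running sum passes 50 kb at the second contig, so N50 = 30 kb and L50 = 2. To run it on a file (hypothetical name): `with open("assembly.fasta") as fh: print(n50_l50(fasta_lengths(fh)))`.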

Good luck. Canu and BBMap can be easily installed using Bioconda.

score 0 · Answer 2 · 2017-09-23

Dear colindaven, I used Canu to assemble the reads; here are the global stats:

PARAMETERS:
         40 (expected coverage)
          0 (don't use overlaps shorter than this)
      0.000 (don't use overlaps with erate less than this)
      1.000 (don't use overlaps with erate more than this)

OVERLAPS:
  IGNORED:
           0 (< 0.0000 fraction error)
           0 (> 0.4095 fraction error)
           0 (< 0 bases long)
           0 (> 2097151 bases long)
  FILTERED:
    12147773 (too many overlaps, discard these shortest ones)
  EVIDENCE:
     2071295 (longest overlaps)
  TOTAL:
    14219068 (all overlaps)

READS:
          66 (no overlaps)
       11145 (no overlaps filtered)
       22024 (<  50% overlaps filtered)
       33266 (<  80% overlaps filtered)
       37123 (<  95% overlaps filtered)
       43712 (< 100% overlaps filtered)

I do not think PacBio alone will give me a good assembly: the Canu PacBio-only assembly has 120 contigs, whereas the existing assembly has only 24 contigs. Based on the PacBio read stats, I thought it would be reasonable to try filling the gaps in the existing assembly.
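Before trying to fill gaps, it helps to know how many gaps the existing assembly actually contains and how large they are. In most scaffolded assemblies, gaps are stored as runs of `N`. A minimal Python sketch that lists them follows, assuming that convention; if the 24 sequences are pure contigs with no `N` runs, it will find nothing, and the problem is then joining contigs rather than filling gaps. Function names and the `min_len` parameter are hypothetical, and coordinates are 0-based, end-exclusive.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence id, sequence) pairs; the id is the first word of the header."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(lines: Iterable[str], min_len: int = 1) -> List[Tuple[str, int, int]]:
    """Return (sequence id, start, end) for every run of N/n at least min_len long."""
    gaps = []
    for name, seq in fasta_records(lines):
        for match in re.finditer(r"[Nn]+", seq):
            if match.end() - match.start() >= min_len:
                gaps.append((name, match.start(), match.end()))
    return gaps
```

For example, with a hypothetical file name: `with open("existing_assembly.fasta") as fh: gaps = find_gaps(fh)`, then `len(gaps)` gives the number of gaps and `sum(end - start for _, start, end in gaps)` the total gap length.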

Please give me your input so that I can take this forward.

Thanks,
Bhagya C T