Question: Scaffolding: Is This Legit?
5.3 years ago by
Lee Katz2.9k
Atlanta, GA
Lee Katz2.9k wrote:

Hi all, I was just wondering if this would be a legitimate strategy before putting in a lot of effort. Or if it's been done before.

I have many, many single-end Illumina runs. Most are 1x70bp. I was thinking that if I split each of the raw reads into a 2x35 set, then maybe a scaffolder like SOPRA could help me bridge gaps. Is this crazy? If it's not crazy, are there any recommendations?
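To make the proposal concrete, here is a minimal sketch of the split being described. The function names are hypothetical, and reverse-complementing the second half (to mimic a forward/reverse pair) is an assumption; the orientation a given scaffolder expects should be checked in its documentation.

```python
# Hypothetical helper: cut one single-end read into a fake "pair".
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def split_read(seq, qual):
    """Split a read and its quality string into two mates.

    The first mate is the first half of the read. The second mate is the
    reverse complement of the second half (assumed forward/reverse
    orientation), with its quality string reversed to match.
    """
    half = len(seq) // 2
    mate1 = (seq[:half], qual[:half])
    mate2 = (reverse_complement(seq[half:]), qual[half:][::-1])
    return mate1, mate2
```

Note that both mates come from the same 70 bp stretch, so the resulting "insert" is exactly the original read length.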

assembly scaffolding • 1.3k views
ADD COMMENTlink modified 5.3 years ago by Michael Dondrup44k • written 5.3 years ago by Lee Katz2.9k

why do you want to do additional scaffolding? is your assembly fragmented?

ADD REPLYlink written 5.3 years ago by Leszek3.9k

Forgive my ignorance here, but how is breaking down a read into two 35 bp regions without a gap different than using the whole 70 bp? I was under the impression that SOPRA and like-minded assemblers relied on differential spacing between paired-end reads?

ADD REPLYlink written 5.3 years ago by Josh Herr5.6k

I think that the scenario here is that it is _my_ ignorance

ADD REPLYlink written 5.3 years ago by Lee Katz2.9k
5.3 years ago by
IIMCB, Poland
Leszek3.9k wrote:

I don't think it will help: paired reads created this way won't bring any new information to your assembly.
You will have to get a new library with an insert size of 600 bases, or a mate-pair library, in order to improve the assembly...

ADD COMMENTlink written 5.3 years ago by Leszek3.9k
5.3 years ago by
Bergen, Norway
Michael Dondrup44k wrote:

Agree with Leszek. Don't invest any time in this. To understand why, consider the answers to the following questions:

  • Scaffolding connects and orients contigs, inserting Ns where the sequence of the insert is unknown. What is the insert size of your fake "pairs"? Given that, what distance could exist between two correctly scaffolded contigs joined by such a pair?
  • What does it mean for the assembly process if these neighboring contigs have significant coverage at both their ends, such that this bridging coverage would join them into a scaffold?
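
The first question can be worked through with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes both mates must align entirely within their own contigs and ignores insert-size variance, which real scaffolders tolerate.

```python
def max_spannable_gap(insert_size, mate_len):
    """Largest gap between two contigs that one read pair could bridge.

    insert_size is the full fragment length (outer distance between the
    mates), mate_len is the length of each mate. Assumes each mate lies
    entirely on its own contig.
    """
    return max(0, insert_size - 2 * mate_len)

# Fake pair cut from a single 70 bp read: 2 x 35 bp, insert size 70.
fake_pair_gap = max_spannable_gap(insert_size=70, mate_len=35)

# Real paired-end library with a 600 bp insert and 35 bp mates.
real_pair_gap = max_spannable_gap(insert_size=600, mate_len=35)

print(fake_pair_gap, real_pair_gap)
```

Under these assumptions the fake pair can only "bridge" two contigs that are directly adjacent, with no room for any unknown sequence between them, whereas the 600 bp library leaves up to 530 bp between the contigs.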
ADD COMMENTlink written 5.3 years ago by Michael Dondrup44k
5.3 years ago by
Vancouver, BC
SES8.1k wrote:

I agree with Leszek that you would need longer reads for this approach to be useful. However, I am skeptical that splitting long reads into fake pairs would improve things more than just assembling with the long reads, though I think it is worth investigating. This approach is described in an answer to a similar question, "How Can I Do Scaffolding With The Single End Data ?" (and I think my comment on that answer has some things worth considering).

ADD COMMENTlink written 5.3 years ago by SES8.1k

Ah, on some of the genomes, there are 454 reads. A comment in that discussion by the developer of SSPACE says that it might provide some useful information.

However, most comments in this thread and that thread are pretty convincing that there will be no useful information added in virtually all cases. I might not continue with this.

ADD REPLYlink modified 5.3 years ago • written 5.3 years ago by Lee Katz2.9k