Scaffolding: Is This Legit?
11.6 years ago
Lee Katz ★ 3.2k

Hi all, I was wondering whether this would be a legitimate strategy, or whether it has been done before, before I put a lot of effort into it.

I have many, many single-end Illumina runs. Most are 1x70bp. I was thinking that if I split each of the raw reads into a 2x35 set, then maybe a scaffolder like SOPRA could help me bridge gaps. Is this crazy? If it's not crazy, are there any recommendations?
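For concreteness, the split being asked about might look like the sketch below. The function names are hypothetical, and the forward/reverse orientation of the two halves is an assumption modeled on standard Illumina paired-end libraries; whether a given scaffolder expects exactly this layout is not confirmed here.

```python
# Hypothetical helper names; a sketch of the proposed 1x70 -> 2x35 split.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_into_fake_pair(read, half=35):
    """Split one single-end read into two 'mates' of length `half`.

    The first mate is the leading `half` bases. The second mate is the
    reverse complement of the next `half` bases, mimicking (by assumption)
    the forward/reverse orientation of a real paired-end library.
    """
    if len(read) < 2 * half:
        raise ValueError("read is shorter than two mates")
    left = read[:half]
    right = reverse_complement(read[half:2 * half])
    return left, right


# Example: a made-up 70 bp read.
read = "ACGT" * 17 + "AC"
left, right = split_into_fake_pair(read)
```

Note that the two mates together contain exactly the bases of the original read, with no unsequenced stretch between them.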

scaffolding assembly • 2.6k views

Why do you want to do additional scaffolding? Is your assembly fragmented?


Forgive my ignorance here, but how is breaking down a read into two 35 bp regions without a gap different than using the whole 70 bp? I was under the impression that SOPRA and like-minded assemblers relied on differential spacing between paired-end reads?


I think the scenario here is that it is _my_ ignorance.

11.6 years ago
Leszek 4.2k

I don't think it will help: paired reads created this way won't add any new information to your assembly.
You will need a new library, either paired-end with an insert size of 600 bases or mate-pair, to improve the assembly.

11.6 years ago
Michael 55k

I agree with Leszek. Don't invest any time in this. To understand why, consider the answers to the following questions:

  • Scaffolding connects and orients contigs, inserting Ns where the sequence between them is unknown. What is the insert size of your fake "pairs"? Given that, what distance could exist between two correctly scaffolded contigs joined by such a pair?
  • What does it mean for the assembly if neighboring contigs have enough coverage at both their ends that reads bridging them would join them into a scaffold?
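
The first question can be worked through with a little arithmetic. The sketch below assumes "insert size" means the full fragment length measured between the outer ends of the two mates, and that a mate only counts as placed if it lies entirely inside its contig; the function name is hypothetical.

```python
def max_spannable_gap(fragment_len, read_len):
    """Largest gap (in bases) that one pair can bridge between two contigs.

    Assumes each mate must lie fully within its own contig, so the two
    mates consume 2 * read_len of the fragment and only the remainder
    can fall in the unknown sequence between the contigs.
    """
    return fragment_len - 2 * read_len


# Fake pairs cut from a 70 bp read: the fragment is the read itself.
fake_pair_gap = max_spannable_gap(fragment_len=70, read_len=35)

# A real paired-end library with a 600-base insert and 35 bp mates.
real_pair_gap = max_spannable_gap(fragment_len=600, read_len=35)
```

Under these assumptions the fake pairs can bridge a gap of at most 0 bases, meaning the two contigs would have to be directly adjacent, while the 600-base library leaves room for up to 530 bases of unknown sequence.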
11.6 years ago
SES 8.6k

I agree with Leszek that you would need longer reads for this approach to be useful. I am skeptical that splitting long reads into fake pairs would improve things more than just assembling with the long reads, though I think it is worth investigating. This approach is described in an answer to a similar question, "How Can I Do Scaffolding With The Single End Data?" (and I think my comment on that answer raises some points worth considering).


Ah, on some of the genomes, there are 454 reads. A comment in that discussion by the developer of SSPACE says that it might provide some useful information.

However, most comments in this thread and that thread are pretty convincing that there will be no useful information added in virtually all cases. I might not continue with this.
