How Can I Do Scaffolding With Single-End Data?
11.2 years ago

Hello, I assembled my reads into contigs using the MIRA software. Now I want to do the scaffolding, and I thought of using SSPACE or Bambus. I think both need paired-end data, but my data are single-end. SSPACE needs a library.txt file, so I created one listing only a single file, the unpadded.fasta file produced by the MIRA assembly. But it did not work. Can you please help me? Thank you.

ngs contigs scaffolding
11.2 years ago

Hi, I am the developer of SSPACE and wanted to comment on the OP's post. You CAN make paired-end data from your single-read data, but it is only valuable if your reads are long enough. Say you have 454 data: you can make a paired-end library by taking the start (read1) and the end (read2) of each long read. Reverse complement read2 and you have an Illumina paired-end-like dataset. Since you know the length of the 454 reads, you also know the insert size.
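A minimal Python sketch of the trick described above, assuming reads are already loaded as `(id, sequence)` tuples. The function names, the `/1` and `/2` suffixes, and the `frag_len` parameter are hypothetical choices for illustration, not anything SSPACE itself defines. Each long read yields read1 (its first `frag_len` bases) and read2 (the reverse complement of its last `frag_len` bases), and the insert size is simply the original read length.

```python
"""Sketch: turn long single-end reads into pseudo paired-end reads.

Assumption: frag_len and the '/1', '/2' read-name suffixes are
hypothetical choices, not SSPACE defaults.
"""

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def make_pseudo_pair(read_id: str, seq: str, frag_len: int):
    """Split one long read into an Illumina-like pair.

    read1 = first frag_len bases (forward strand)
    read2 = reverse complement of the last frag_len bases
    insert size = full length of the original read
    Returns None if the two ends would touch or overlap (no gap left).
    """
    if len(seq) <= 2 * frag_len:
        return None
    read1 = seq[:frag_len]
    read2 = reverse_complement(seq[-frag_len:])
    return (f"{read_id}/1", read1), (f"{read_id}/2", read2), len(seq)


def pseudo_pairs_to_fasta(reads, frag_len: int):
    """Return (fasta_1, fasta_2, insert_sizes) for an iterable of (id, seq)."""
    lines1, lines2, inserts = [], [], []
    for read_id, seq in reads:
        pair = make_pseudo_pair(read_id, seq, frag_len)
        if pair is None:
            continue  # read too short to give a useful pseudo pair
        (id1, s1), (id2, s2), insert = pair
        lines1.append(f">{id1}\n{s1}\n")
        lines2.append(f">{id2}\n{s2}\n")
        inserts.append(insert)
    return "".join(lines1), "".join(lines2), inserts
```

The two FASTA strings can be written to a pair of files for a paired-end scaffolder. Because reads vary in length, the insert sizes also vary, so in practice one would summarize them (for example by their mean and spread) when describing the library.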


+1 This is clever; I had not thought of this approach. However, I wonder what one would actually gain from doing this. My intuition is that simply mapping the long reads to the contigs and/or assembling the long reads as they are would be a more direct approach (i.e., leading to a more contiguous assembly, though not technically scaffolds). I would also be worried about how well these fake pairs map, since the fake reverse read would contain more errors.


My data were generated on an Ion Torrent PGM. What does "significant read length" mean? Is it platform dependent? My longest read length was 127507.


Significant read length means that you can generate the fake pairs from both ends of the read and still have a gap between the two fake reads that is useful for scaffolding (you probably want the gap to be at least 2X the read length).
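The rule of thumb above can be turned into a quick calculation. This sketch assumes "2X the read length" means the gap should be at least twice the length of each fake read; `max_pseudo_read_length` and `gap_factor` are hypothetical names. If a read of length `read_len` is split into two fake reads of length `r`, the gap is `read_len - 2*r`, and requiring `read_len - 2*r >= gap_factor * r` gives `r <= read_len / (2 + gap_factor)`.

```python
def max_pseudo_read_length(read_len: int, gap_factor: float = 2.0) -> int:
    """Longest fake-read length r that still leaves a gap of at least
    gap_factor * r between the two ends of a read of length read_len.

    Derived from: read_len - 2*r >= gap_factor * r
              =>  r <= read_len / (2 + gap_factor)
    """
    return int(read_len // (2 + gap_factor))


def gap_length(read_len: int, pseudo_len: int) -> int:
    """Unsequenced distance between the two fake reads."""
    return read_len - 2 * pseudo_len
```

For a hypothetical 400 bp read, `max_pseudo_read_length(400)` allows fake reads of at most 100 bp, leaving a 200 bp gap. Shorter reads quickly leave fake reads too short to map reliably.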

Is that read length a typo? If you have 127 Kb reads, why are you worried about scaffolding? :-)

11.2 years ago
Michael 54k

I think you cannot make that work. You said it yourself: "both need paired-end data, but my data is single-end". You cannot do scaffolding without that extra bit of information to connect contigs into scaffolds. Unfortunately, the answer is that you need additional sequencing to scaffold, e.g. paired-end reads, or end sequences of fosmids or other large-insert clones. Traditionally, BAC-end sequences have been used to obtain sequence pairs. http://www.cbcb.umd.edu/research/assembly_primer.shtml#scaffolding
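To see why the pair information is essential, here is a minimal Python sketch of the core idea behind paired-end scaffolding: counting read pairs whose two mates land on different contigs. The input format (each mate placed as `(contig, position, strand)`) and the `min_links` threshold are hypothetical; real scaffolders also use orientation and insert size to order contigs and estimate gaps, which this sketch ignores.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical placement of one mate: (contig name, position, strand '+'/'-')
Placement = Tuple[str, int, str]


def count_contig_links(pairs: Iterable[Tuple[Placement, Placement]]) -> Counter:
    """Count read pairs whose two mates map to two different contigs."""
    links: Counter = Counter()
    for (contig1, _, _), (contig2, _, _) in pairs:
        if contig1 != contig2:
            # Use an unordered key so (A, B) and (B, A) count together.
            links[tuple(sorted((contig1, contig2)))] += 1
    return links


def propose_joins(pairs, min_links: int = 3) -> List[Tuple[str, str, int]]:
    """Return (contig_a, contig_b, n_links) for well-supported contig pairs."""
    links = count_contig_links(pairs)
    return sorted((a, b, n) for (a, b), n in links.items() if n >= min_links)
```

With single-end reads there are no mates, so the list of pairs is empty and no joins can be proposed, which is exactly the missing information.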

For completeness: there might be a few other possible ways, but they all require additional sequences or information. These possibilities should be considered when planning the sequencing approach (ideally, at least ;) and not after finding that the available data don't meet the requirements):

  • some sort of physical or linkage map to place contigs on chromosomes
  • use a closely related species for a reference-guided assembly (mainly for really closely related species or strains)
  • maybe one could use sequence composition criteria to assign neighboring contigs, but that wouldn't give you any distance information. Personally, I don't think it is worth the effort, as getting paired-end sequences should be reasonably cheap
  • ???
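For the reference-guided option in the list above, a minimal Python sketch of the basic idea: once each contig has been aligned to a closely related genome, contigs can be ordered and oriented by where they land. The input dictionary (contig mapped to `(reference sequence, start, strand)`, or `None` if unaligned) is hypothetical and would come from whatever aligner one uses. The sketch assumes the two genomes are colinear, which is why this works mainly for very closely related species or strains, and it gives only approximate distances.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical best hit of a contig on the related reference:
# (reference sequence name, start coordinate, strand), or None if unaligned.
Hit = Optional[Tuple[str, int, str]]


def order_contigs_by_reference(hits: Dict[str, Hit]):
    """Group contigs by reference sequence and sort them by start coordinate.

    Returns ({ref_name: [(contig, strand), ...]}, sorted list of unplaced contigs).
    """
    by_ref: Dict[str, List[Tuple[int, str, str]]] = defaultdict(list)
    unplaced: List[str] = []
    for contig, hit in hits.items():
        if hit is None:
            unplaced.append(contig)  # no alignment: cannot be placed
            continue
        ref, start, strand = hit
        by_ref[ref].append((start, contig, strand))
    ordered = {
        ref: [(contig, strand) for _, contig, strand in sorted(items)]
        for ref, items in by_ref.items()
    }
    return ordered, sorted(unplaced)
```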

Aren't the contigs enough for your project? Perhaps the scaffolding is not necessary since you already have your contigs anyway.


Good point! By saying you "must" do additional sequencing, I meant in case scaffolds are really required. This depends entirely on what the OP wants to do with the sequences.


I have four samples with 113, 87, 151 and 129 contigs, respectively, but I need to do the scaffolding.


Thank you. Is there any other tool or software for scaffolding single-end data, or is it necessary to do additional sequencing? Please help me.


You must do additional sequencing. There is no way software can help you without it. There is simply no information in single-end sequences that allows one to join multiple contigs into a scaffold.

11.2 years ago

Is there an option in any software to convert single-end data into paired-end data?


I am very sorry, but to repeat myself: the information is not there; there is NO WAY. Single-end reads contain no information about pairs, otherwise they would be paired-end! Thus they cannot be "converted". If there were a way to "infer" pair information, it would be some sort of "magic" or "alchemy", and I don't believe in that.


Thank you, sir, for the valuable suggestions.
