Genome Assembly With Physical Map
13.4 years ago
Plantae ▴ 390

Hi all

We have sequenced a plant genome with Solexa, and the reads were assembled into contigs and scaffolds using short-read assemblers such as Velvet and SOAPdenovo. However, the scaffolds produced by these assemblers are very small; the largest is about 200 kb. We have also built a physical map using BAC fingerprinting, and BAC-end sequences are available for these mapped BACs.

Now I am trying to map the Solexa-assembled scaffolds onto the FPC map, using the relationships between scaffolds and BAC-end sequences.

This process is very complicated, so I wonder whether there are tools that can do this job, that is, link scaffolds using physical map information.
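To make the idea concrete, here is a minimal Python sketch of anchoring scaffolds to FPC contigs through BAC-end hits. It is not an existing tool, and everything in it is an assumption for illustration: the function name `anchor_scaffolds`, the `(bac_id, scaffold_id)` hit tuples, and a `bac_positions` table giving each BAC's FPC contig and midpoint in map units. Each scaffold is assigned to the FPC contig supported by the most BAC ends and ordered by the median map position of those BACs.

```python
from collections import defaultdict
from statistics import median


def anchor_scaffolds(bes_hits, bac_positions):
    """Place scaffolds on FPC contigs using uniquely mapped BAC ends.

    bes_hits: iterable of (bac_id, scaffold_id), one per mapped BAC end
              (hypothetical input layout).
    bac_positions: dict bac_id -> (fpc_contig, midpoint in map units)
              (hypothetical input layout).
    Returns dict fpc_contig -> [(scaffold_id, median_position, n_ends), ...]
    sorted by position along the FPC contig.
    """
    # scaffold -> FPC contig -> map positions of the supporting BACs
    votes = defaultdict(lambda: defaultdict(list))
    for bac_id, scaffold_id in bes_hits:
        if bac_id not in bac_positions:
            continue  # BAC is not on the physical map
        fpc_contig, position = bac_positions[bac_id]
        votes[scaffold_id][fpc_contig].append(position)

    layout = defaultdict(list)
    for scaffold_id, by_contig in votes.items():
        # Pick the FPC contig with the most supporting BAC ends
        # (ties go to the contig seen first).
        best_contig, positions = max(by_contig.items(), key=lambda kv: len(kv[1]))
        layout[best_contig].append((scaffold_id, median(positions), len(positions)))

    for placed in layout.values():
        placed.sort(key=lambda item: item[1])
    return dict(layout)


if __name__ == "__main__":
    hits = [("b1", "s1"), ("b2", "s1"), ("b3", "s2"), ("b4", "s3")]
    positions = {"b1": ("ctg1", 10), "b2": ("ctg1", 20),
                 "b3": ("ctg1", 5), "b4": ("ctg2", 7)}
    for fpc_contig, placed in anchor_scaffolds(hits, positions).items():
        print(fpc_contig, placed)
```

A real pipeline would also need to handle scaffold orientation and conflicting placements, which this sketch ignores.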

I found one tool, BAMBUS, that seems to fulfill my needs, but it requires contig information in ACE format, which is not available in my current project.

Any suggestions would be appreciated.

best regards!

genome assembly scaffolding • 4.7k views

I'm grappling with a similar problem, although we are using other mapping resources such as synteny and a genetic linkage map to anchor the contigs to a genome. Other than BAMBUS, I'm not sure there is any turnkey application that can do this effectively. The BAMBUS manual (http://goo.gl/xA5OR) does handle other [but limited] file types. Interested to know what others say.

13.4 years ago

Have you taken a look at the tools described in this paper?

Engler FW, Hatfield J, Nelson W, and Soderlund CA. Locating Sequence on FPC Maps and Selecting a Minimal Tiling Path. Genome Res. 2003. 13: 2152-2163

"This study discusses three software tools, the first two aid in integrating sequence with an FPC physical map and the third automatically selects a minimal tiling path given genomic draft sequence and BAC end sequences."

(Someone correct me if I'm wrong, but I don't think you should use "contigs" and "scaffolds" synonymously: contigs are the relatively short contiguous sequences that your assembler builds from overlapping reads, whereas scaffolds are contigs ordered and linked across gaps using paired-end read data.)


I have tried the FPC tool; the problem is that it is a graphical tool. For large datasets it is useless (if you do not have direct access to a Linux cluster). Another problem with FPC is that only a few parameters can be altered, and the mapping result is good only when both a high-density FPC map and long scaffolds or contigs are available.

Anyway, thanks for your answer.

13.4 years ago
Darked89 4.6k

There are a bunch of issues here, and I will address just a few:

  • Is the genome highly heterozygous? If you sequence something with large differences between the copies of the chromosomes (see, e.g., the tunicate Ciona), then assembling it by whole-genome shotgun is probably impossible, even with long Sanger reads.
  • Do you still have $$$ to do additional sequencing, be it a large-insert Solexa library or 454? You may not have reached the "sweet spot" yet.
  • How repetitive is your plant genome? With small/medium-insert sequencing libraries one cannot bridge the repeats.
  • Can you uniquely map a significant portion of your BAC ends to your assembled genome? How often are they repetitive? If the ends map to short (say, a few kb) contigs/scaffolds, then comparing the restriction patterns of these with the BACs will not be possible, I think.
  • As mentioned above (synteny etc.), you may try to improve your scaffold size by concentrating only on gene-coding fragments. Map any ESTs you have, plus proteins from closely related species, and see if this improves things.
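On the question of unique BAC-end mapping, a quick check could look like the sketch below. It assumes the alignments have already been parsed into `(bes_id, scaffold_id, start)` tuples, one per hit; that layout and the function names are made up for illustration and are not tied to any particular aligner's output.

```python
from collections import Counter


def unique_hits(hits):
    """Keep only alignments whose BAC end hits exactly one place.

    hits: list of (bes_id, scaffold_id, start), one tuple per alignment
          (hypothetical layout, assumed to be parsed already).
    """
    hit_counts = Counter(bes_id for bes_id, _, _ in hits)
    return [hit for hit in hits if hit_counts[hit[0]] == 1]


def unique_fraction(hits, total_bes):
    """Fraction of all BAC ends that map to a single location."""
    return len(unique_hits(hits)) / total_bes


if __name__ == "__main__":
    example = [
        ("e1", "s1", 100),
        ("e2", "s1", 500),
        ("e2", "s7", 40),   # e2 hits two places, so it is treated as repetitive
        ("e3", "s2", 10),
    ]
    print(unique_hits(example))
    print(unique_fraction(example, total_bes=4))
```

BAC ends with more than one hit are simply dropped here; whether to rescue some of them by alignment score is a separate decision.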
13.4 years ago
Plantae ▴ 390

Thanks for your comments.

how repetitive is your plant genome? With small/medium inserts sequencing lib one can not bridge them

This problem is really harmful to our current assembly process. The genome contains 30~40% repeats. We have generated Solexa libraries with insert sizes of 2, 4 and 8 kb, and the current assembly reaches a scaffold N50 of about 5 kb.
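For anyone comparing numbers, this is a small sketch of how scaffold N50 is usually computed: sort the lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are only for illustration.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Toy example: total is 20, and 6 + 5 = 11 covers at least half of it.
    print(n50([2, 3, 4, 5, 6]))
```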

can you uniquely map significant portion of your BACs ends to your assembled genome? How often are these repetitive? If these ends map to short (say few kb) contigs/scaffolds then comparing restriction patterns of these with BACs will be not possible, I think.

The mapping result seems good: more than 30,000 BAC-end sequences can be mapped uniquely. However, as you mentioned, comparing BAC fingerprints with scaffolds/contigs is useless, because several scaffolds/contigs contain only one BAC-end sequence, so the anchoring of these scaffolds/contigs is ambiguous.
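One way to separate the well-supported scaffolds from the ambiguous ones is sketched below. It assumes `(bac_id, scaffold_id)` pairs for uniquely mapped BAC ends and a `bac_to_fpc_contig` lookup taken from the physical map; both layouts, the function name, and the `min_bacs` threshold are illustrative assumptions, not part of FPC or BAMBUS.

```python
from collections import defaultdict


def classify_scaffolds(bes_hits, bac_to_fpc_contig, min_bacs=2):
    """Split scaffolds into anchored and ambiguous sets.

    bes_hits: iterable of (bac_id, scaffold_id) for uniquely mapped BAC ends
              (hypothetical layout).
    bac_to_fpc_contig: dict bac_id -> FPC contig name (hypothetical layout).
    min_bacs: distinct BACs from one FPC contig needed to trust a placement.
    Returns (anchored, ambiguous): a dict scaffold -> FPC contig, and a
    sorted list of scaffolds with too little support.
    """
    # scaffold -> FPC contig -> set of distinct supporting BACs
    support = defaultdict(lambda: defaultdict(set))
    for bac_id, scaffold_id in bes_hits:
        fpc_contig = bac_to_fpc_contig.get(bac_id)
        if fpc_contig is not None:
            support[scaffold_id][fpc_contig].add(bac_id)

    anchored, ambiguous = {}, []
    for scaffold_id, by_contig in support.items():
        best_contig, bacs = max(by_contig.items(), key=lambda kv: len(kv[1]))
        if len(bacs) >= min_bacs:
            anchored[scaffold_id] = best_contig
        else:
            ambiguous.append(scaffold_id)
    return anchored, sorted(ambiguous)


if __name__ == "__main__":
    hits = [("b1", "s1"), ("b2", "s1"), ("b3", "s2")]
    fpc = {"b1": "ctg1", "b2": "ctg1", "b3": "ctg4"}
    print(classify_scaffolds(hits, fpc))
```

Scaffolds that end up in the ambiguous list are the ones where extra evidence (synteny, ESTs, a genetic map) would be needed before placing them.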

as mentioned above (synteny etc.), you may try to improve your scaffold size by concentrating only on gene coding fragments. Map any ESTs you got plus proteins from close species and see if this improves things.

This is a fine suggestion; I will try it.
