subsetting sets of scaffolds based on sequence similarity

9.0 years ago

second_exon ▴ 210

Hi,

A couple of months back, I sequenced a few BACs (MiSeq) and assembled the paired-end reads using SOAPdenovo, but my assembly fell apart into many scaffolds. Now I have the reference genome of the same cultivar and am trying to pull out my region of interest (about 3.2 Mb).

Here is my first approach:

I mapped my paired-end reads to the whole genome using BWA. With this approach, reads mapped to only 19 scaffolds of the whole genome.
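To make the "19 scaffolds" count reproducible, something like the following could be used to tally mapped reads per reference scaffold from the SAM output. This is only a minimal sketch: the function name, the `min_mapq` and `min_reads` thresholds, and the example records are assumptions for illustration, not part of my actual run.

```python
from collections import Counter

def scaffolds_with_mapped_reads(sam_lines, min_mapq=20, min_reads=1):
    """Count confidently mapped reads per reference scaffold from SAM text lines.

    min_mapq and min_reads are illustrative thresholds (assumptions).
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4 or rname == "*":
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        if mapq < min_mapq:
            continue  # ambiguous placement, e.g. reads falling in repeats
        counts[rname] += 1
    return {name: n for name, n in counts.items() if n >= min_reads}

# Made-up records for illustration only (not real data).
example_sam = [
    "@SQ\tSN:scaffold_1\tLN:5000",
    "r1\t99\tscaffold_1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r2\t0\tscaffold_2\t10\t0\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(scaffolds_with_mapped_reads(example_sam))
```

Lowering `min_mapq` to 0 also counts reads placed ambiguously, which is one way to see how much of the scaffold count depends on multi-mapping reads.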

Second approach:

I blasted (blastn) the SOAPdenovo assembly against the whole genome (e-value: 1000, word size: 40, percent identity: 100%). With this approach, more than 1500 scaffolds of the whole genome got BLAST hits.
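With an e-value cutoff as permissive as 1000, very short exact matches (for instance in repeats) can also be reported, so one thing I could check is how many reference scaffolds remain once short hits are dropped. A rough sketch, assuming the BLAST results are in the standard 12-column tabular format (`-outfmt 6`); the function name, the thresholds, and the example hits are hypothetical:

```python
from collections import defaultdict

def scaffolds_with_strong_hits(blast_lines, min_identity=99.0, min_aln_len=1000):
    """Sum alignment length per subject scaffold over hits that pass the filters.

    Expects tabular BLAST lines with the default column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    Overlapping hits are not merged, so totals can overcount.
    """
    total = defaultdict(int)
    for line in blast_lines:
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        sseqid, pident, length = fields[1], float(fields[2]), int(fields[3])
        if pident >= min_identity and length >= min_aln_len:
            total[sseqid] += length
    return dict(total)

# Made-up hits for illustration only (not real data).
example_hits = [
    "bac_scaf1\tref_scaf_7\t100.000\t12500\t0\t0\t1\t12500\t40001\t52500\t0.0\t23084",
    "bac_scaf1\tref_scaf_912\t100.000\t42\t0\t0\t300\t341\t77\t118\t2e-14\t78.7",
    "bac_scaf2\tref_scaf_7\t99.800\t8000\t16\t0\t1\t8000\t60001\t68000\t0.0\t14600",
]
print(scaffolds_with_strong_hits(example_hits))
```

In this toy input the 42 bp hit on `ref_scaf_912` is discarded by the length filter, which is the kind of hit that could be inflating the scaffold count in the second approach.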

My question is: why this difference? Is there a problem with my mapping? Which approach is better?

Or is there any other approach? Please share your experience, guys!

EDIT: I am also thinking about a reference-guided re-assembly of my paired-end reads.

Thanks
Ramesh

sam NGS bam Assembly • 1.5k views

updated 14 months ago by Ram 43k • written 9.0 years ago by second_exon ▴ 210

Duplicate of Better statergy for Gap Closing

updated 14 months ago by Ram 43k • written 9.0 years ago by 5heikki 11k

Not exactly. Here, my question is how to subset sequences, not how to close gaps.

I hope you agree with me.
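By "subset" I mean something like the sketch below: given a list of scaffold IDs chosen from the mapping or BLAST results, keep only those records from the FASTA file. The function name and the example records are made up for illustration, and it assumes the ID is the first word of each header line.

```python
def subset_fasta(fasta_lines, wanted_ids):
    """Yield the FASTA lines of records whose ID is in wanted_ids.

    The ID is taken as the first whitespace-delimited word after '>'.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted_ids
        if keep:
            yield line

# Made-up records for illustration only (not real data).
example_fasta = [
    ">ref_scaf_7 len=8",
    "ACGTACGT",
    ">ref_scaf_912 len=4",
    "TTTT",
]
for out_line in subset_fasta(example_fasta, {"ref_scaf_7"}):
    print(out_line)
```

Because it works line by line, the same function can be fed an open file handle instead of a list, so a whole-genome FASTA does not need to be loaded into memory.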

written 9.0 years ago by second_exon ▴ 210

If I were you, I would try out a few different assemblers. For example, IDBA-Hybrid sounds like the perfect match for your problem. Before compiling it, you might want to edit src/sequence/short_sequence.h to allow a longer read length.

updated 14 months ago by Ram 43k • written 9.0 years ago by 5heikki 11k