Question: subsetting sets of scaffolds based on sequence similarity
4.8 years ago, ramesh.8v (United States) wrote:

Hi,

A couple of months ago, I sequenced a few BACs (MiSeq, paired-end reads) and assembled them with SOAPdenovo, but the assembly came out fragmented into many scaffolds. I now have the reference genome of the same cultivar and am trying to pull out my region of interest (about 3.2 Mb).

First approach:

I mapped my paired-end reads to the whole genome using BWA. With this approach, reads mapped to only 19 scaffolds of the whole genome.
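To make the "19 scaffolds" count reproducible, one option is to tabulate mapped reads per genome scaffold from the output of `samtools idxstats` on the sorted, indexed BAM. The sketch below is only an illustration: the function name, the `min_reads` cutoff, and the example scaffold names are assumptions, not part of the original workflow.

```python
def scaffolds_with_mapped_reads(idxstats_text, min_reads=1):
    """Return {scaffold: mapped_reads} for scaffolds with >= min_reads.

    Expects the tab-separated output of `samtools idxstats`:
    reference name, reference length, mapped reads, unmapped reads.
    """
    counts = {}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            # Final row summarises reads not assigned to any reference.
            continue
        if int(mapped) >= min_reads:
            counts[name] = int(mapped)
    return counts


# Hypothetical idxstats output, for illustration only.
example = (
    "scaffold_1\t50000\t1200\t3\n"
    "scaffold_2\t42000\t0\t0\n"
    "scaffold_3\t8000\t15\t0\n"
    "*\t0\t0\t250\n"
)

hit_scaffolds = scaffolds_with_mapped_reads(example)
```

Raising `min_reads` (the value is a judgment call) separates scaffolds that attract a handful of stray reads from those with real coverage.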

 

Second approach:

I ran BLAST (blastn) with the SOAPdenovo assembly against the whole genome (e-value: 1000, word size: 40, percent identity: 100%). With this approach, more than 1500 scaffolds of the whole genome got BLAST hits.
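One way to subset the genome scaffolds from such a search is to keep only those supported by long, high-identity alignments. The sketch below assumes the BLAST results were saved in the standard 12-column tabular format (`-outfmt 6`). The function name, the `min_length` default of 1000 bp, and the example records are assumptions for illustration, not values from the original search.

```python
def subject_scaffolds(blast_tab_text, min_identity=100.0, min_length=1000):
    """Return the set of subject (genome) scaffold IDs with at least one
    hit meeting both the identity and alignment-length thresholds.

    Expects BLAST tabular output (-outfmt 6) with the default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore
    """
    keep = set()
    for line in blast_tab_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sseqid = fields[1]
        pident = float(fields[2])
        length = int(fields[3])
        if pident >= min_identity and length >= min_length:
            keep.add(sseqid)
    return keep


# Hypothetical BLAST records, for illustration only.
example_hits = (
    "bac_scaf1\tgenome_scaf7\t100.000\t5200\t0\t0\t1\t5200\t300\t5499\t0.0\t9604\n"
    "bac_scaf1\tgenome_scaf912\t100.000\t41\t0\t0\t80\t120\t77\t117\t2e-13\t76.8\n"
    "bac_scaf2\tgenome_scaf7\t99.800\t3000\t6\t0\t1\t3000\t9000\t11999\t0.0\t5500\n"
)

long_hits = subject_scaffolds(example_hits)
```

Comparing the result at different `min_length` values shows how many of the hit scaffolds rest only on short matches, which a word size of 40 and an e-value of 1000 will still report.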

My question is: why this difference? Is there a problem with my mapping? Which approach is best?

Or is there another approach? Please share your experience!

EDIT: I am also thinking about a reference-guided re-assembly of my paired-end reads.

Thanks

Ramesh

sam bam ngs assembly • 933 views

Duplicate of Better statergy for Gap Closing

written 4.8 years ago by 5heikki

Not exactly. Here, my question is how to subset sequences, not how to close gaps.

I hope you agree with me.

written 4.8 years ago by ramesh.8v

If I were you, I would try out a few different assemblers. For example, IDBA-Hybrid sounds like the perfect match for your problem. Before building it, you might want to edit src/sequence/short_sequence.h to allow longer read lengths.

written 4.8 years ago by 5heikki

Thank you! I will try this out.

written 4.8 years ago by ramesh.8v