Question: Convert Opgen To Paired End Reads
Lee Katz (Atlanta, GA) wrote 7.7 years ago:

Hi all, I am using OpGen data, which I would say is a really fantastic aid for genome assembly. Below are a few data points (334 sites total). Using these data points, I was able to discover misassemblies from my automated assembly tools (e.g. Newbler). My overall question is, how can I automate my assembly using these high-quality data points?

My immediate solution is to artificially convert these sites into 6-mer paired end reads. For example the first data point below describes a restriction fragment that is 14867 bp. In other words there are two NheI sites 14867 bp away from each other. So, my immediate question is, how can I convert these sites into paired end reads? What is a paired end read file format that Newbler would accept? The restriction site is G^CTAGC.

Thank you for your help.

  <RESTRICTION_MAP ID="XYZ" ENZYME="NheI" INSILICO="false">
    <MAP_DISPLAY DBID="13" EDITABLE="false" STICK="false" X="10000" Y="149" TRANS="255" ORDER="1320" ORIENTATION="1" CIRCULAR="true" GROUPID="-1" />
      <FRAGMENTS SHIFT="0" OFFSET="1">
        <F I="0" S="14867" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
        <F I="1" S="7731" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
        <F I="2" S="9070" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
        <F I="3" S="2016" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
        <F I="4" S="3175" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
        <F I="5" S="5418" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
      </FRAGMENTS>

      <MAP_METRICS STRETCH="0" RECT_AVE="0.00" RECT_ALL="0.00" MID_STDDEV="0.00" R="0.00" WIGGLE="0.00" GAP_STDDEV="0.00" GAP_MAX="0" />
      <FEATURES>

      </FEATURES>

  </RESTRICTION_MAP>
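
As a rough illustration of the conversion asked about above, here is a minimal Python sketch that reads the `F` elements, places the first NheI site at position 0, and emits one pair of 6-mers per fragment. It assumes each `S` value is the site-to-site distance in bp. The function names, the `/1` and `/2` suffixes, and the `insert=` header tag are hypothetical and chosen only for illustration; this is not a format Newbler is known to accept.

```python
import xml.etree.ElementTree as ET

NHEI_SITE = "GCTAGC"  # NheI recognition sequence (G^CTAGC); it is palindromic


def fragment_sizes(xml_text):
    """Return fragment sizes (bp) in map order from a RESTRICTION_MAP element."""
    root = ET.fromstring(xml_text)
    frags = sorted(root.iter("F"), key=lambda f: int(f.get("I")))
    return [int(f.get("S")) for f in frags]


def site_pairs(sizes):
    """Yield (left_site, right_site, distance) for consecutive cut sites.

    Coordinates are relative to the first site, which is placed at 0.
    """
    pos = 0
    for size in sizes:
        yield pos, pos + size, size
        pos += size


def to_fasta(sizes, site=NHEI_SITE, prefix="opgen"):
    """Write each fragment as two 6-mer records (hypothetical naming scheme)."""
    lines = []
    for i, (_, _, dist) in enumerate(site_pairs(sizes)):
        lines.append(f">{prefix}_{i}/1 insert={dist}")
        lines.append(site)
        lines.append(f">{prefix}_{i}/2 insert={dist}")
        lines.append(site)
    return "\n".join(lines)


# Trimmed copy of the first three fragments from the map above.
example = """<RESTRICTION_MAP ID="XYZ" ENZYME="NheI" INSILICO="false">
  <FRAGMENTS SHIFT="0" OFFSET="1">
    <F I="0" S="14867" />
    <F I="1" S="7731" />
    <F I="2" S="9070" />
  </FRAGMENTS>
</RESTRICTION_MAP>"""

sizes = fragment_sizes(example)
print(list(site_pairs(sizes)))
print(to_fasta(sizes))
```

Because the NheI site is its own reverse complement, both mates carry the same 6-mer, so all of the positional information lives in the insert distance rather than in the sequence.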
assembly paired xml • 2.0k views
Haibao Tang (Mountain View, CA) wrote 7.7 years ago:

SOMA and MapSolver provide you with contig ordering. However, based on your response to Jeremy's answer, it sounds like you want to fix the false contig joins by using mate pairs. So my suggestion is to identify those false joins with MapSolver (use its GUI program to find the location), and then break the pairing of the bad mates that caused the false joins in the first place.

For example, in your scaffolds, you have contig1 and contig2 adjacent, but upon examination in MapSolver, contig2 might go to a different place (this would appear as crossing lines in the plot). You can then identify the read pairs with one end on contig1 and the other end on contig2, and move them to the unpaired set.
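
A minimal Python sketch of that filtering step, assuming you already have a table of which contig each read was placed on. The function name and the input structures are hypothetical; how you extract read placements depends on your assembler's output.

```python
def split_false_join_pairs(pairs, placement, contig_a, contig_b):
    """Separate mate pairs that bridge contig_a and contig_b.

    pairs:     iterable of (read1_id, read2_id)
    placement: dict mapping read id -> contig name
    Returns (kept_pairs, unpaired_reads).
    """
    bad = {contig_a, contig_b}
    kept, unpaired = [], []
    for r1, r2 in pairs:
        c1, c2 = placement.get(r1), placement.get(r2)
        if c1 != c2 and {c1, c2} == bad:
            # This pair supports the suspect join: demote both mates.
            unpaired.extend([r1, r2])
        else:
            kept.append((r1, r2))
    return kept, unpaired


# Toy data: only the second pair spans contig1 and contig2.
pairs = [("r1a", "r1b"), ("r2a", "r2b"), ("r3a", "r3b")]
placement = {
    "r1a": "contig1", "r1b": "contig1",
    "r2a": "contig1", "r2b": "contig2",
    "r3a": "contig2", "r3b": "contig3",
}
kept, unpaired = split_false_join_pairs(pairs, placement, "contig1", "contig2")
print(kept)
print(unpaired)
```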

If you don't have many contigs, moving the contigs manually using the OM guide would also be a viable solution.

It is also worth mentioning Bambus. In principle, it accepts XML input describing many different data types, such as synteny and genetic/physical maps, but it is not straightforward to use.


I wish I had mate pairs to correct! My assembly is based on single-end reads. Your solution is a really good one for a misassembly that already involves paired-end reads. Bambus looks good too (one more tool to add to my toolbox!), but I would not know where to break my misassembled contig so that I could use it.

Reply written 7.7 years ago by Lee Katz

So you are saying that you have chimeric contigs? You can try mapping your reads to your contigs and looking for regions with low read coverage. Remove the reads in those regions and reassemble. Bear in mind that the optical map (OM) can also contain errors.
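
A minimal Python sketch of the low-coverage scan, assuming you have already computed per-base read depth for a contig as a list of integers. The function name, the threshold, and the `min_len` parameter are hypothetical and would need tuning for real data.

```python
def low_coverage_regions(depth, threshold, min_len=1):
    """Return half-open (start, end) intervals where depth < threshold."""
    regions = []
    start = None
    for i, d in enumerate(depth):
        if d < threshold:
            if start is None:
                start = i  # open a new low-coverage run
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the contig.
    if start is not None and len(depth) - start >= min_len:
        regions.append((start, len(depth)))
    return regions


# Toy per-base depth for one contig.
depth = [10, 12, 2, 1, 0, 9, 11, 3]
print(low_coverage_regions(depth, threshold=5))
print(low_coverage_regions(depth, threshold=5, min_len=2))
```

Regions reported this way are only candidates for chimeric junctions; low coverage can also come from repeats or sequencing bias.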

Reply written 7.7 years ago by Haibao Tang

That's a very good point that OpGen maps can contain errors. In a recent seminar at my institution, they discussed how they may eventually bring in confidence scores (or something approximating that), but for now they do not, and I am treating the maps as high confidence. I have chimeric contigs, but I do not have an assembly file (ace, afg, etc.) due to my comprehensive assembly process. However, I may choose to just use Newbler so that I have an ace file, and then use your method. That is a good idea.

Reply written 7.7 years ago by Lee Katz
Jeremy Leipzig (Philadelphia, PA) wrote 7.7 years ago:

I take it you've tried SOMA? ftp://ftp.cbcb.umd.edu/pub/software/soma/

Niranjan Nagarajan, Timothy D. Read and Mihai Pop. Scaffolding and validation of bacterial genome assemblies using optical restriction maps. http://bioinformatics.oxfordjournals.org/content/24/10/1229.abstract


The article is starting to look good to me! I will try it out.

Reply written 7.7 years ago by Lee Katz

OK... it's good, but not exactly what I am looking for. I want to avoid misassemblies by using OpGen data at the time of assembly. MapSolver and SOMA both look at an existing assembly and suggest contig ordering, but neither fixes a misassembly.

Reply written 7.7 years ago by Lee Katz