Question: Read length distribution and error rate of ONT reads
2.6 years ago by midox (Tunisia)
midox wrote:

Hello,

I would like to know the read length distribution and the error rates (mismatch, insertion, deletion) of Nanopore sequencing technology.

Thank you

modified 2.6 years ago by WouterDeCoster • written 2.6 years ago by midox
2.6 years ago by Brian Bushnell (Walnut Creek, USA)
Brian Bushnell wrote:

BBMap can report those statistics, though reads longer than 6kbp will be broken into 6kbp segments. For this purpose, that shouldn't matter, though it's worth noting that the shorter the fragments, the lower the apparent error rate is.

mapPacBio.sh in=reads.fastq maxlen=6000 out=mapped.sam ref=reference.fasta

That will report the rates of mismatches, insertions, and deletions.

written 2.6 years ago by Brian Bushnell

Is that only for PacBio reads, or can I also use it for Nanopore?

written 2.6 years ago by midox

The error rates of PacBio and Nanopore are similar (both extremely high) so I use the same error profile. Though in practice, Nanopore error rates seem to be much higher than PacBio, so to map as many reads as possible, you can reduce the kmer length and increase sensitivity. But... nobody really cares about reads with under 75% identity, so I'd just classify those as junk, and ignore them.

written 2.6 years ago by Brian Bushnell

I am looking for an existing distribution of the number of Nanopore reads by length, together with the different error rates. If I can do it with BBMap, I will try to map the raw reads to the reference genome and extract this information.

written 2.6 years ago by midox

You can't directly get a distribution of error rates by read length with BBMap with reads over 6kbp, since it chops reads. It's easy to post-process, though, since the chopped reads still contain their name.

written 2.6 years ago by Brian Bushnell
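
For anyone who wants to try the post-processing mentioned above, here is a minimal sketch in Python, not a tested pipeline. It reads SAM records, strips a segment suffix from each read name, and sums mismatches, insertions, and deletions per original read. Several things are assumptions: the suffix pattern `_part_\d+$` is hypothetical (check how the chopped reads are actually named in your SAM file and adjust it), each record is assumed to carry an `NM:i:` edit-distance tag, and `query_bases` only counts segments that mapped, so it underestimates the true read length.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_errors(cigar, nm):
    """Return (mismatches, insertions, deletions, columns, query_bases).

    Relies on the SAM convention that NM is the edit distance, i.e.
    mismatched bases + inserted bases + deleted bases.
    """
    ins = dele = aligned = soft = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I":
            ins += n
        elif op == "D":
            dele += n
        elif op == "S":
            soft += n
        elif op in "M=X":
            aligned += n
    mismatches = max(nm - ins - dele, 0)
    columns = aligned + ins + dele        # alignment columns (denominator)
    query_bases = aligned + ins + soft    # bases of the segment itself
    return mismatches, ins, dele, columns, query_bases


def per_read_error_rates(sam_lines, segment_suffix=r"_part_\d+$"):
    """Aggregate segment-level alignments back to the original read name.

    segment_suffix is a hypothetical naming pattern, not a documented one.
    """
    suffix = re.compile(segment_suffix)
    totals = defaultdict(lambda: [0, 0, 0, 0, 0])
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        # Skip unmapped (0x4), secondary (0x100), supplementary (0x800).
        if flag & 0x904 or cigar == "*":
            continue
        nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), None)
        if nm is None:                    # no edit distance available
            continue
        read = suffix.sub("", fields[0])
        for i, value in enumerate(alignment_errors(cigar, nm)):
            totals[read][i] += value
    rates = {}
    for read, (mm, ins, dele, columns, query_bases) in totals.items():
        if columns == 0:
            continue
        rates[read] = {
            "mismatch": mm / columns,
            "insertion": ins / columns,
            "deletion": dele / columns,
            "query_bases": query_bases,
        }
    return rates
```

Usage would be something like `per_read_error_rates(open("mapped.sam"))`, after which the per-read rates can be binned by `query_bases`. Error rates here are per alignment column; other denominators (read length, reference length) are equally defensible and give slightly different numbers.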

Please, Brian. When I map sequences to the reference genome using BBMap, should I take the overall mapping rate from "pct bases"? Thanks.

modified 2.6 years ago • written 2.6 years ago by midox

@Brian referred to this recently here.

written 2.6 years ago by genomax

Please brain.

Are you talking to Brian or your brain?

written 2.6 years ago by WouterDeCoster

Oops, sorry. To Brian, and everyone.

written 2.6 years ago by midox
2.6 years ago by WouterDeCoster (Belgium)
WouterDeCoster wrote:

No accurate general statement can be made about the distribution of read lengths, since it depends on the length of the input DNA and on how carefully you handle the library. With the most common library prep you get reads of 6-10 kb, with a long tail up to ~200 kb.
However, with an adapted protocol, very careful handling, and old-school DNA extraction, you can get reads of up to 970 kb, as reported by Nick Loman and Josh Quick.

written 2.6 years ago by WouterDeCoster
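
Since the length distribution depends on the library, the practical route is to measure it on your own run. Below is a rough Python sketch that collects read lengths from a FASTQ file and reports a few summary values plus a binned histogram. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function names and the 1000 bp default bin size are illustrative choices, not anything prescribed in this thread.

```python
import statistics
from collections import Counter


def fastq_read_lengths(handle):
    """Collect sequence lengths, assuming strict 4-line FASTQ records."""
    lengths = []
    for i, line in enumerate(handle):
        if i % 4 == 1:                    # second line of each record
            lengths.append(len(line.strip()))
    return lengths


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def summarize(lengths):
    """Return basic summary statistics for a non-empty list of lengths."""
    return {
        "reads": len(lengths),
        "median": statistics.median(lengths),
        "n50": n50(lengths),
        "max": max(lengths),
    }


def length_histogram(lengths, bin_size=1000):
    """Count reads per bin; keys are the lower edge of each bin."""
    return Counter((length // bin_size) * bin_size for length in lengths)
```

For example, `summarize(fastq_read_lengths(open("reads.fastq")))` gives the headline numbers, and `length_histogram(...)` gives counts that can be plotted directly. Gzipped input would need `gzip.open(path, "rt")` instead of `open`.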