Short Read Simulator: Simple Math For Coverage?
12.4 years ago

I'm writing my own short read simulator (mostly for fun, I know I'm reinventing the wheel).

The 'beta' source code is available here: http://code.google.com/p/variationtoolkit/source/browse/trunk/src/shortreadsim.cpp

Here is my question:

  • each short read has a length = 'short_read_length'
  • a gene has a length = 'segLength'

How many short reads do I need to create to get an average coverage = 'N' for this gene? (Simple math?)

next-gen sequencing simulation short • 2.1k views
12.4 years ago
Michael 54k

Ahem, yes, very simple I guess, assuming all reads come only from this gene. That's what you want, right?

Let r be the read length and g the gene length, with r <= g. Each read that lies entirely within the gene contributes coverage c = r/g, so you need N/(r/g) = N*g/r reads (rounded up to a whole read). E.g. if r = g and the target coverage is 1, you need 1 read. Reads that do not fully overlap the gene contribute only partially, of course. If you want to include these in the calculation and get the exact coverage, you can only do that after placing each read randomly and then recomputing the coverage.
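A minimal C++ sketch of both ideas, in case it helps. It is not taken from shortreadsim.cpp; the function names `reads_needed` and `simulated_mean_coverage`, the uniform placement of read starts, and the seed parameter are all assumptions made for illustration. The first function is just N*g/r rounded up; the second places reads at random, lets them hang over either end of the gene, and recomputes the mean coverage from the per-base depth.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Reads needed for a mean coverage `coverage` (N) over a gene of length
// `gene_length` (g) with reads of length `read_length` (r), assuming every
// read lies entirely inside the gene: ceil(N * g / r).
std::int64_t reads_needed(double coverage, std::int64_t gene_length,
                          std::int64_t read_length) {
    assert(read_length > 0 && read_length <= gene_length && coverage >= 0.0);
    const double n = coverage * static_cast<double>(gene_length) /
                     static_cast<double>(read_length);
    return static_cast<std::int64_t>(std::ceil(n));
}

// Hypothetical placement model: read starts are drawn uniformly so that each
// read overlaps the gene by at least one base. Bases falling outside the gene
// are not counted, so the result is at most n_reads * r / g.
double simulated_mean_coverage(std::int64_t n_reads, std::int64_t gene_length,
                               std::int64_t read_length, std::uint64_t seed) {
    assert(read_length > 0 && read_length <= gene_length && n_reads >= 0);
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::int64_t> start(-(read_length - 1),
                                                      gene_length - 1);
    std::vector<std::uint32_t> depth(static_cast<std::size_t>(gene_length), 0);

    for (std::int64_t i = 0; i < n_reads; ++i) {
        const std::int64_t s = start(rng);
        // Clip the read to the gene boundaries [0, gene_length).
        const std::int64_t from = std::max<std::int64_t>(s, 0);
        const std::int64_t to =
            std::min<std::int64_t>(s + read_length, gene_length);
        for (std::int64_t p = from; p < to; ++p) {
            ++depth[static_cast<std::size_t>(p)];
        }
    }

    // Mean per-base depth over the gene.
    const double total = std::accumulate(depth.begin(), depth.end(), 0.0);
    return total / static_cast<double>(gene_length);
}
```

For example, `reads_needed(30.0, 1000, 100)` gives 300. Passing that count to `simulated_mean_coverage` with overhanging reads allowed will generally return something below 30, which is why the exact figure has to be recomputed after placement.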

Or did I miss something completely??

Thanks Michael, I must be very tired... :-)
