I am having trouble calculating the number of reads that I should expect for a target sequence (for instance, a transposon) integrated into the human genome. That is: how many reads should I expect to map to my target sequence and confirm its presence?
Assuming 1) a pre-calculated coverage of 20x, 2) a target region of 1000 bp, and 3) a fixed read length of 150 bp, and using the formula C = NL/G (C = coverage, N = number of reads, L = read length, G = length of the sequenced region), I get:
N = CG/L = 20 x 1000 / 150 ≈ 133 reads
This looks like too many reads. Or should I calculate using the length of the whole human genome, since the target is integrated into it? In that case, I get:
N = 20 x 3 000 000 000 / 150 = 400 000 000 reads
That is clearly wrong.
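For reference, here is the arithmetic from both attempts as a minimal Python sketch. The function name `reads_needed` is only an illustrative name I made up, and the code assumes the formula C = NL/G applies unchanged in both cases, which is exactly the part I am unsure about:

```python
def reads_needed(coverage, region_length_bp, read_length_bp):
    """Solve C = N * L / G for N (hypothetical helper, for illustration only)."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return coverage * region_length_bp / read_length_bp


# Attempt 1: G is the 1000 bp target region only.
target_reads = reads_needed(coverage=20, region_length_bp=1_000, read_length_bp=150)

# Attempt 2: G is the whole human genome (taken as 3 000 000 000 bp, as above).
genome_reads = reads_needed(coverage=20, region_length_bp=3_000_000_000, read_length_bp=150)

print(round(target_reads))  # 133
print(round(genome_reads))  # 400000000
```

The two results differ only in the value used for G, so the open point is which G is appropriate when the target sits inside a much larger genome.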
My question is therefore: how do I calculate coverage in general, and for integrated sequences in particular? Thank you.