calculate coverage per bait
10.3 years ago
bioguy24 ▴ 230

To calculate coverage per bait, is the calculation: bait reads / (bait length * 100)? That seems a bit off when I put real numbers into it: 158 / (153 * 100) = 0.01. Thank you :).

EDIT:

( read count * read length ) / length of area in question

So if the average read count for a bait were 158, with a read length of 150 and a bait length of 153:

(158 * 150) / 153 = 155x
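
A minimal sketch of the formula above in Python, assuming every read lies entirely within the region (reads that only partly overlap it would make this an overestimate). The function name `estimate_coverage` and its parameter names are just illustrative.

```python
def estimate_coverage(read_count, read_length, region_length):
    """Approximate fold coverage: total sequenced bases / region length.

    Assumes each read overlaps the region along its full length.
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return (read_count * read_length) / region_length


# Numbers from the post: 158 reads of 150 bp over a 153 bp bait.
print(f"{estimate_coverage(158, 150, 153):.1f}x")  # prints 154.9x, i.e. about 155x
```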

exome-sequencing coverage • 2.8k views

So what's the problem now?


I use an awk script to calculate the average number of reads and the length of each bait. Below are 14 baits that all map to the PTPN11 gene. Since I am new to exome analysis, is it safe to assume that bait 1 is (158 * 150) / 153 = 155x, bait 2 is (220 * 150) / 225 = 147x, and bait 3 is (228 * 150) / 223 = 153x? Would it be more useful to calculate the average coverage per bait as well as the average coverage per gene? For example, if PTPN11 had 3 baits with coverages of 155x, 147x, and 153x, would that mean PTPN11 has 152x coverage? Thanks :).

chr12:112884064-112884217    153    158.209150
chr12:112888106-112888331    225    220.533333
chr12:112890983-112891206    223    228.286996
chr12:112892352-112892499    147    182.102041
chr12:112893738-112893882    144    202.076389
chr12:112910732-112910859    127    0.000000
chr12:112915439-112915549    110    82.590909
chr12:112915645-112915834    189    217.269841
chr12:112919862-112920024    162    279.586420
chr12:112924263-112924448    185    452.535135
chr12:112926231-112926329    98    162.234694
chr12:112926812-112926994    182    189.131868
chr12:112939932-112940075    143    291.020979
chr12:112942483-112942583    100    160.510000
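
For illustration, here is a rough Python sketch that applies the (read count * read length) / bait length estimate to rows in the three-column layout above and then summarizes them two ways: a plain mean of the per-bait values and a length-weighted mean (total bases over total bait length). The 150 bp read length comes from the earlier example; the function names, and the choice to treat the third column as a read count, are assumptions made for the sketch.

```python
READ_LENGTH = 150  # assumed read length, taken from the example in the question


def parse_bait_line(line):
    """Split 'chr:start-end  length  avg_reads' into (name, length, avg_reads)."""
    name, length, avg_reads = line.split()
    return name, int(length), float(avg_reads)


def per_bait_coverage(baits, read_length=READ_LENGTH):
    """Return {bait name: estimated fold coverage} for parsed bait rows."""
    return {name: reads * read_length / length for name, length, reads in baits}


def gene_coverage(baits, read_length=READ_LENGTH):
    """Return (simple mean of per-bait coverage, length-weighted mean)."""
    per_bait = per_bait_coverage(baits, read_length)
    simple_mean = sum(per_bait.values()) / len(per_bait)
    total_bases = sum(reads * read_length for _, _, reads in baits)
    total_length = sum(length for _, length, _ in baits)
    return simple_mean, total_bases / total_length


# First three rows of the table above.
rows = """\
chr12:112884064-112884217    153    158.209150
chr12:112888106-112888331    225    220.533333
chr12:112890983-112891206    223    228.286996
"""
baits = [parse_bait_line(line) for line in rows.splitlines() if line.strip()]
simple, weighted = gene_coverage(baits)
print(f"simple mean: {simple:.1f}x, length-weighted mean: {weighted:.1f}x")
```

The two summaries differ because longer baits contribute more bases to the length-weighted value. A bait with zero reads, like chr12:112910732-112910859 above, would pull both averages down, so it can be worth reporting the minimum per-bait value as well.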

I am not quite sure where this is going.

"Would it be more useful to calculate the average coverage per bait as well as the average coverage per gene?"

More useful, for what?

10.3 years ago
Michael 56k

I'm not fully sure what you are asking here, what it has to do with awk, where you got this formula from, or why you divide by 100, but I think you forgot that reads also have a length. For example, a read of length 100 covers up to 100 bases of the bait if it overlaps the bait fully, so that gives you ~153-fold coverage of the short sequence. Because many reads may overlap the bait only partially, it might be better to estimate coverage by averaging the per-base coverage.
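
To make the per-base idea concrete, here is a small Python sketch that counts how many reads cover each base of a bait and then averages those depths. It assumes 0-based, half-open (start, end) coordinates on a single chromosome; the function name and the toy intervals are made up for the example, not taken from real data.

```python
def per_base_depth(bait_start, bait_end, reads):
    """Return a list with the read depth at each base of the bait.

    Coordinates are assumed 0-based, half-open: [start, end).
    """
    depth = [0] * (bait_end - bait_start)
    for read_start, read_end in reads:
        # Clip the read to the bait so partial overlaps count only where they hit.
        lo = max(read_start, bait_start)
        hi = min(read_end, bait_end)
        for pos in range(lo, hi):
            depth[pos - bait_start] += 1
    return depth


# Toy example: a 100 bp bait and three 100 bp reads, two of them partial overlaps.
bait_start, bait_end = 100, 200
reads = [(50, 150), (100, 200), (180, 280)]

depth = per_base_depth(bait_start, bait_end, reads)
mean_depth = sum(depth) / len(depth)
naive = len(reads) * 100 / (bait_end - bait_start)

print(f"mean per-base depth: {mean_depth:.2f}x (min {min(depth)}x)")
print(f"read count * read length / bait length: {naive:.2f}x")
```

Here the count-based formula gives 3x while the per-base average is 1.7x, because two of the three reads only partly overlap the bait.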


I am going to use Picard tools (I am new to exome data and learning about the tools that are used). Thank you :).
