Question

How to report coverage at overlapping regions of paired-end reads?

0

4.3 years ago

francois ▴ 80

I have seen this question discussed briefly in specific contexts, but since there seems to be no consensus among tools on this issue, I think it would be useful to record here what the best practice is.

I have Illumina MiSeq paired-end reads. Say a region is covered by a single pair of reads (one forward, one reverse): should the coverage in the overlapping region be reported as 1x or 2x? What are the arguments for each?

Here is an illustration: https://bioinformatics.stackexchange.com/questions/5427/double-counting-coverage-of-overlapped-read-pairs/5473

illumina paired-end • 1.2k views

ADD COMMENT • link updated 4.3 years ago by h.mon 35k • written 4.3 years ago by francois ▴ 80

1

Two reads from a pair are sequencing/sampling a single unique fragment.

That said, the fragment was sequenced twice to generate the two reads, so if you think of coverage as the number of times a region was sequenced, then that would be 2.

ADD REPLY • link 4.3 years ago by GenoMax 141k

score 2 · Accepted Answer · 2020-01-14

According to the Illumina technote "Sequencing Coverage Calculation Methods for Human Whole-Genome Sequencing", sequencing coverage is "the average coverage of unique reads across the non-N portion of the human genome." Under this definition, overlapping regions are counted 1x rather than 2x, and PCR and optical duplicates are excluded from the calculation as well.

I think this is the most sensible definition, as we use the coverage information to know if we likely have enough information to perform downstream analyses, and sequencing the same molecule over and over again does not add independent information. That said, Illumina itself ignores the above definition, as the footnotes from table 1 indicate:

b. The Isaac WGS App v1.0 excludes overlapping bases from only one read (ie, overlapping bases are counted one time).

c. The BWA WGS App does not exclude overlapping bases from either read (ie, overlapping bases are counted twice).
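To make the difference between the two conventions concrete, here is a minimal sketch in plain Python. It is not how either Illumina app is implemented; the function names and the read coordinates are hypothetical, and reads are modeled as simple half-open intervals rather than real alignments.

```python
from collections import Counter


def per_read_depth(pairs):
    """Count every read separately, so bases where mates overlap count twice."""
    depth = Counter()
    for mates in pairs:
        for start, end in mates:
            for pos in range(start, end):
                depth[pos] += 1
    return depth


def per_fragment_depth(pairs):
    """Count each position at most once per read pair (i.e. per fragment)."""
    depth = Counter()
    for mates in pairs:
        covered = set()
        for start, end in mates:
            covered.update(range(start, end))
        for pos in covered:
            depth[pos] += 1
    return depth


# Hypothetical pair: forward mate covers [100, 250), reverse mate covers [200, 350),
# so the mates overlap on [200, 250).
pairs = [((100, 250), (200, 350))]
by_read = per_read_depth(pairs)
by_fragment = per_fragment_depth(pairs)

print(by_read[220], by_fragment[220])  # inside the overlap: 2 1
print(by_read[120], by_fragment[120])  # outside the overlap: 1 1
```

Per-read counting corresponds to the behaviour described in footnote c, and per-fragment counting to footnote b. With real data you would derive the intervals from the aligned mates in a BAM file instead of hard-coding them.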