I have to calculate the coverage for human WGS from Illumina-sequenced reads. After reading Illumina's technical note
(https://www.illumina.com/content/dam/illumina-marketing/documents/products/technotes/hiseq-x-30x-coverage-technical-note-770-2014-042.pdf), which talks about "the average coverage of unique reads", I have some doubts about how WGS coverage is calculated for human samples.
As far as I know, the formula for calculating sequence coverage for WGS is: Coverage = (total reads * read length * 2) / length of the genome sequenced. Is there any other formula used for WGS coverage calculation? If so, what different strategy does the Illumina platform use to calculate coverage for WGS?
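To make the formula above concrete, here is a minimal Python sketch of how I would compute it. I am assuming that "total reads" means the number of read pairs from a paired-end run (hence the factor of 2). The function name and the numbers are made up for illustration and are not taken from the Illumina note.

```python
def mean_coverage(read_pairs, read_length, genome_length):
    """Expected mean coverage = total sequenced bases / genome length.

    Assumption: read_pairs counts paired-end fragments, so each one
    contributes 2 * read_length bases.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    total_bases = read_pairs * read_length * 2
    return total_bases / genome_length


# Hypothetical run: 450 million read pairs of 2 x 150 bp
# against a genome of roughly 3 billion bases.
cov = mean_coverage(read_pairs=450_000_000,
                    read_length=150,
                    genome_length=3_000_000_000)
print(f"{cov:.1f}x")  # prints 45.0x
```

This counts every sequenced base, whether or not the read is mapped, duplicated, or unique, which is exactly where my doubt about the Illumina definition comes from.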
As I said before, the technical note from Illumina (the PDF linked above) says: [Illumina defines sequencing coverage as “the average coverage of unique reads across the non-N portion of the human genome.”] My understanding of a unique read is "a read that maps only once in the genome with a given number of mismatches" (please correct me if my understanding is wrong or limited). Could anyone explain how coverage is calculated for unique reads? I think that sometimes the adapter region may be counted as part of the unique reads. Is that so?
Do I have to remove duplicates before calculating WGS coverage?
Does anyone have a link or supporting document on how Illumina calculates coverage for human WGS?
What is unique read enrichment? What role do unique reads play in WGS coverage calculation?
Any leads will be highly appreciated.
Thanks in advance.