In my RNA-Seq data, more than 40% of the reads are marked as duplicates and discarded by MuTect2, and another 40% are not primary alignments.
This is how the duplication looks in FastQC: https://ibb.co/hEPoJH
The figure says 43% of sequences will remain if deduplicated.
I made a coverage profile, and it is here; the hg38 genome assembly is covered only by
This is a sample of the distribution whose curve I plotted:
depth  bases
0      2644954237
1      86037238
2      173531033
3      38889212
4      77790438
5      23056365
6      37055662
7      14282143
8      19123921
9      9226812
10     10713451
The first entry means that 2644954237 bases are covered by zero reads! To be clear, I counted how many reads cover each base in the reference. This looks to me like very low coverage and very high duplication. Is this normal for RNA-Seq, or can it be explained?
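To put a number on "very low coverage", here is a minimal sketch of how the fraction of uncovered bases could be computed from the depth histogram above. It assumes the values are (depth, number of bases) pairs and uses only the depths 0 to 10 that I pasted, so the totals are approximate: bases with depth above 10 are not included. The function name `coverage_summary` is just something I made up for illustration.

```python
# Depth histogram pasted above: depth -> number of reference bases at that depth.
# Only depths 0..10 are included, so totals are a lower bound on the genome size.
HIST = {
    0: 2644954237,
    1: 86037238,
    2: 173531033,
    3: 38889212,
    4: 77790438,
    5: 23056365,
    6: 37055662,
    7: 14282143,
    8: 19123921,
    9: 9226812,
    10: 10713451,
}


def coverage_summary(hist):
    """Summarise a depth histogram (hypothetical helper, not from any tool)."""
    total = sum(hist.values())
    zero = hist.get(0, 0)
    covered = total - zero
    return {
        "total_bases": total,
        "zero_bases": zero,
        "covered_bases": covered,
        "zero_fraction": zero / total if total else 0.0,
        "covered_fraction": covered / total if total else 0.0,
    }


if __name__ == "__main__":
    s = coverage_summary(HIST)
    print(f"bases in sampled histogram: {s['total_bases']}")
    print(f"uncovered (depth 0):        {s['zero_fraction']:.1%}")
    print(f"covered (depth 1..10):      {s['covered_fraction']:.1%}")
```

With the sampled rows, this puts the uncovered share of the reference at roughly 84%, which is the number I am asking about.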