Hi, I have downloaded and aligned PacBio HIFI NA12878 sample against GRCh38 genome, using Minimap2.
java -jar $PICARD CollectWgsMetrics I=NA12878_CCS_sorted.bam O=collect_wgs_metrics.txt R=$ref_fasta
Then I use picard collectwgsmetrics to calculate the coverage, I found that the coverage is 0.0. Then I used samtools flagstat and got this result:
$ samtools flagstate NA12878_CCS_sorted.bam 14410787 + 0 in total (QC-passed reads + QC-failed reads) 4878353 + 0 secondary 450400 + 0 supplementary 0 + 0 duplicates 14396335 + 0 mapped (99.90% : N/A) 0 + 0 paired in sequencing 0 + 0 read1 0 + 0 read2 0 + 0 properly paired (N/A : N/A) 0 + 0 with itself and mate mapped 0 + 0 singletons (N/A : N/A) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)
It seems like all of the reads are mapped, but how come they have zero coverage? I also checked the alignment manually, and found many read have 0 flag (this one is very subjective).
How did those 2 tools have quite different result? And what is the correct way to calculate coverage of long read sample?