Entering edit mode
2.4 years ago
godth13teen ▴ 70
Hi, I have downloaded and aligned PacBio HIFI NA12878 sample against GRCh38 genome, using Minimap2.
java -jar $PICARD CollectWgsMetrics I=NA12878_CCS_sorted.bam O=collect_wgs_metrics.txt R=$ref_fasta
Then I use picard collectwgsmetrics to calculate the coverage, I found that the coverage is 0.0. Then I used samtools flagstat and got this result:
$ samtools flagstate NA12878_CCS_sorted.bam 14410787 + 0 in total (QC-passed reads + QC-failed reads) 4878353 + 0 secondary 450400 + 0 supplementary 0 + 0 duplicates 14396335 + 0 mapped (99.90% : N/A) 0 + 0 paired in sequencing 0 + 0 read1 0 + 0 read2 0 + 0 properly paired (N/A : N/A) 0 + 0 with itself and mate mapped 0 + 0 singletons (N/A : N/A) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)
It seems like all of the reads are mapped, but how come they have zero coverage? I also checked the alignment manually, and found many read have 0 flag (this one is very subjective).
How did those 2 tools have quite different result? And what is the correct way to calculate coverage of long read sample?
Please post the command line of picard and the output. Zero (the flag) means mapped as single-end read, not unmapped, or are you referring to MAPQ?
Oh right, so I misinterpreted the flag of those reads. I also updated the command in my post
You could quickly put your BAM into
qualimapto see what this one outputs. It gives a nice graphic output with the number of bp covered at least x-fold.