I have aligned reads to a reference genome with BWA MEM and removed the duplicated reads with SAMBAMBA. I then used samtools to get a coverage distribution of the reads and used a script in R to cluster the coverage. The samtools command was:
samtools depth <aligned_deduplicated.bam> > <file.tsv>
which gives a column with the genomic position and one with the related coverage. By pooling the data with R, I got, for example for the stretch 367,853,177 to 367,853,698, an average of 124,391 reads per base. If I plot this region on IGV though, I get this:
The threshold values for the quality of alignment and mapping are both set to 50, but the figure does not change if I use 0, 1 or 30.
Am I right in assuming that the problem is about the quality of the reads provided by samtools? That is:
samtools depth provides all the read coverage for a given position regardless of the quality (both alignment and mapping) whereas IGV filters this information? Maybe the reads are hard clipped, that is why are not shown -- but then, how to show them?
And how can I then used
samtools depth with an associated quality value? That is, how can I add a third column to the
depth's output that reports the alignment (or mapping or better both) score?