I want to write a program that converts SAM files to genome coverage (so wiggle or bedgraph format). So, my question is related to prrocessing the output of the aligner. My program would work a little bit like the genomeCoverageBed function in bedtools
genomeCoverageBed -bg -d -ibam reads.bam -g genome.csv
Now, it's reasonably straightforward to scan through a SAM file and pick out the strand and location of a tag. The length of the read can be inferred. Of course, one will often know the length of the reads anyway.
My question is how to handle the hard/soft clipping in terms of the length of the tag. Presumably, taking the clipping into account would mean dropping a few bases at the start or the end, thus having a shorter read. This would affect the tag count totals in the output to my function.
In DNA-seq, it seems like it doesn't make much sense to take clipping into account, because the location of the read is what mattered. Any feedback would be appreciated.