Question: If A Read Is Clipped, What Is The Preferred Way To Make Tag Counts?
KCC (Cambridge, MA), 6.2 years ago, wrote:

I want to write a program that converts SAM files to genome coverage (wiggle or bedGraph format), so my question is about processing the aligner's output. My program would work a little like the genomeCoverageBed tool in bedtools:

genomeCoverageBed -bg -d -ibam reads.bam -g genome.csv

However, my program would read SAM directly, so I wouldn't need the extra step of converting SAM to BAM.

Now, it's reasonably straightforward to scan through a SAM file and pick out the strand and location of each tag (read). The length of the read can be inferred from the CIGAR string or the SEQ field, and of course one often knows the read length in advance anyway.
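
As a rough illustration of that scan, here is a minimal Python sketch that pulls the chromosome, start, strand and read length out of one SAM line. It assumes plain tab-separated SAM text and relies only on fields defined by the SAM specification (FLAG bit 0x4 for unmapped, 0x10 for reverse strand, 1-based POS, CIGAR); the function name `parse_sam_line` is hypothetical.

```python
import re

# CIGAR operations that consume query (read) bases, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")


def parse_sam_line(line):
    """Return (chrom, start0, strand, read_length) for one SAM alignment line,
    or None for header lines and unmapped reads."""
    if line.startswith("@"):
        return None  # header line
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # 0x4: segment unmapped
        return None
    chrom = fields[2]
    start0 = int(fields[3]) - 1  # SAM POS is 1-based; convert to 0-based
    strand = "-" if flag & 0x10 else "+"  # 0x10: SEQ is reverse complemented
    # Read length from the CIGAR: sum of the query-consuming operations.
    # Hard-clipped bases (H) are not stored in SEQ, so they are not counted.
    read_length = sum(
        int(n)
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", fields[5])
        if op in QUERY_CONSUMING
    )
    return chrom, start0, strand, read_length
```

Because hard-clipped bases are absent from SEQ, the length returned here matches the SEQ length and leaves out H operations; whether clipped bases should count at all is the question that follows.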

My question is how to handle hard and soft clipping when determining the length of the tag. Presumably, taking the clipping into account would mean dropping a few bases at the start or the end, yielding a shorter read. This would affect the tag-count totals my program outputs.
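
One detail worth knowing: in SAM, POS is the position of the first base that aligns to the reference, so clipped bases are already excluded from the read's placement. Computing the reference footprint from the CIGAR therefore treats a clipped read as shorter without any extra trimming step. A minimal Python sketch (the helper name `reference_span` is hypothetical):

```python
import re

# Per the SAM specification, these CIGAR operations consume reference positions.
REF_CONSUMING = set("MDN=X")


def reference_span(pos1, cigar):
    """Half-open, 0-based [start, end) interval covered on the reference.

    pos1 is the 1-based SAM POS, which already points at the first aligned
    (non-clipped) base, so soft (S) and hard (H) clips need no adjustment:
    they simply do not consume reference positions.
    """
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    start = pos1 - 1
    length = sum(int(n) for n, op in ops if op in REF_CONSUMING)
    return start, start + length
```

For example, `reference_span(100, "5S45M")` gives `(99, 144)`, covering 45 aligned bases, while an unclipped `50M` read at the same POS gives `(99, 149)`. This span includes deletions (D) and skipped regions (N); for spliced RNA-seq reads you may prefer to exclude N regions from coverage.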

In DNA-seq, it seems to make little sense to take clipping into account, because the location of the read is what matters. Any feedback would be appreciated.

Tag: seq (post modified 5.2 years ago by Biostar)

I think of read clipping as something that is done by the aligner. Perhaps you are talking about read trimming (prior to alignment)? Could you clarify?

Comment by Sean Davis, 6.2 years ago
Answer: Istvan Albert (University Park, USA), 6.2 years ago, wrote:

IMO, if a read is clipped, then the clipped section did not cover the genome, so it should not be counted in the coverage or in any other way. I would treat that read as if it were shorter.
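
Following this approach, a bedGraph-style coverage track can be built by counting only the reference bases each read actually aligns to. The sketch below is an illustrative Python implementation, not bedtools' algorithm: it keeps a per-chromosome difference array, gives no coverage to clipped bases, insertions, deletions or skipped (N) regions, and emits 0-based half-open intervals. Excluding deletions is a design choice; counting them as covered would also be defensible. The function name `bedgraph_from_sam` is hypothetical.

```python
import re
from collections import defaultdict


def bedgraph_from_sam(lines):
    """Yield (chrom, start, end, depth) bedGraph records, 0-based half-open.

    Only aligned bases (M, =, X) add coverage; S/H clips add nothing, and
    D/N advance along the reference without adding coverage.
    """
    # chrom -> position -> net change in depth at that position
    delta = defaultdict(lambda: defaultdict(int))
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        if int(f[1]) & 0x4:
            continue  # unmapped read
        chrom, ref = f[2], int(f[3]) - 1  # POS is the first aligned base
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", f[5]):
            n = int(n)
            if op in "M=X":
                delta[chrom][ref] += 1
                delta[chrom][ref + n] -= 1
                ref += n
            elif op in "DN":
                ref += n  # consumes reference, but no coverage counted
    for chrom in sorted(delta):
        running, prev = 0, None
        # Positions with a net change of 0 do not start a new interval.
        for pos in sorted(p for p, d in delta[chrom].items() if d != 0):
            if running > 0:
                yield chrom, prev, pos, running
            running += delta[chrom][pos]
            prev = pos
```

Assuming an uncompressed file named `reads.sam`, one could iterate over `bedgraph_from_sam(open("reads.sam"))` and print each tuple tab-separated. The change points for every chromosome are held in memory, which is fine for a sketch but may need rethinking for large genomes.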


I was thinking that, at least in DNA-seq, the goal is to place the fragment. What mechanisms would cause the edges of a read not to map? If the mechanism is corruption of those bases (e.g. sequencing errors), we could still use their number to work out how far the edge of the fragment extends. If they are extra bases appended to the edge of the read (e.g. adapter sequence), then their number is useless information.
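
For comparison, the "corrupted bases" interpretation would amount to extrapolating each clipped read's 5' end as if the clipped bases continued along the reference (a position sometimes called the unclipped start or end). A minimal Python sketch with the hypothetical helper `unclipped_five_prime`; note that its result is an extrapolation, not a measured alignment position.

```python
import re

# Per the SAM specification, these CIGAR operations consume reference positions.
REF_CONSUMING = set("MDN=X")


def unclipped_five_prime(pos1, cigar, reverse):
    """0-based reference coordinate of the read's 5' end, assuming (as a
    hypothesis, not an observation) that soft- and hard-clipped bases would
    have continued along the reference."""
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]
    if not reverse:
        # Forward strand: the 5' end is on the left; add back leading clips.
        lead = 0
        for n, op in ops:
            if op in "SH":
                lead += n
            else:
                break
        return pos1 - 1 - lead
    # Reverse strand: the 5' end is on the right; add back trailing clips.
    aligned_end = pos1 - 1 + sum(n for n, op in ops if op in REF_CONSUMING)  # exclusive
    trail = 0
    for n, op in reversed(ops):
        if op in "SH":
            trail += n
        else:
            break
    return aligned_end + trail - 1  # last base, inclusive
```

For a forward read with CIGAR `5S45M` at POS 100 this returns 94, five bases upstream of the aligned start (99); for a reverse read with `45M5S` at POS 100 it returns 148, the same 5' end an unclipped `50M` read would have.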

Reply by KCC, 6.2 years ago

Genomic structural variation would be the simplest and most likely explanation.

But even if the clipping were caused by miscalled bases or other errors, you should not extend the read: doing so generates data that you later cannot distinguish from actually measured values.

Reply by Istvan Albert, 6.2 years ago