How to get coverage for specific base(s), and for specific mismatch(es) per genomic range
5 weeks ago


I would like to get coverage per genomic range, with the complication that I need coverage over T, G, A, and C reported separately. My idea is to first add feature names to the mpileup file (with bedtools intersect) and then do something akin to R's dplyr::summarize(). But maybe there is a bash alternative for that?
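For this first step, a bash/awk route might look roughly like the sketch below. It assumes the pileup has already been annotated so that the feature name sits in column 7 (a hypothetical layout, adjust the column number to whatever the intersect step actually produces), and it reads "coverage over T, G, A, and C" as depth summed over positions with that reference base. `cov_by_base` is just an illustrative name.

```shell
# Sum depth per feature and reference base from an annotated pileup on stdin.
# ASSUMED layout (hypothetical): tab-separated, columns 1-6 as in samtools mpileup,
# column 7 = feature name added by the intersect step.
cov_by_base() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         key = $7 OFS toupper($3)   # group by feature name and reference base
         n[key]++                   # number of positions in this group
         depth[key] += $4           # summed read depth (column 4)
       }
       END { for (k in depth) print k, n[k], depth[k] }' | LC_ALL=C sort
}

# Toy input, made up for illustration only
printf 'chr1\t10\tA\t10\t.\tF\tgeneA\nchr1\t11\tT\t5\t.\tF\tgeneA\nchr1\t12\ta\t2\t.\tF\tgeneA\nchr1\t20\tT\t7\t.\tF\tgeneB\n' | cov_by_base
```

The output columns are feature, reference base, number of positions, and summed depth, so mean coverage per base is the last column divided by the one before it.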

The second step of my struggle is to count specific mismatches based on the mpileup read-base codes. I thought of doing this in R, because I know how, but maybe someone could help me get started with awk. A one-liner that counts the occurrences of "g" in column 5 (see below) and prints that number in place of the column's contents would get me started.

Of course, if there is a more efficient way to accomplish the task - let me know (I am sure there is)!

slam_500_spike  390 A   46  .,..,.,.g..,,g,.,,,,.g.,,,,,,,g,,....,,,,,,,.   FFFF:FFFFFFFFFJJFFFJFJJJJFJJJJFFJJJJJJJFFJJJJJ
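For the awk part, a minimal sketch could look like the following. It assumes tab-separated samtools mpileup output and relies on awk's `gsub()` returning the number of substitutions it made. Read-start markers (`^` plus a mapping-quality character, which may itself be a "g") are stripped first; indel strings such as `+2gg` are not handled and would inflate the count. `count_g` is just an illustrative name.

```shell
# Replace column 5 of each mpileup line with the number of "g" characters in it.
# Assumes tab-separated samtools mpileup output; indel strings are NOT handled.
count_g() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         bases = $5
         gsub(/\^./, "", bases)      # drop read-start marker plus its mapping-quality character
         $5 = gsub(/g/, "", bases)   # gsub returns the number of replacements made
         print
       }'
}

# Example using the line from this post (fields joined with tabs)
printf 'slam_500_spike\t390\tA\t46\t.,..,.,.g..,,g,.,,,,.g.,,,,,,,g,,....,,,,,,,.\tFFFF:FFFFFFFFFJJFFFJFJJJJFJJJJFFJJJJJJJFFJJJJJ\n' | count_g
```

On a real file this would be called as `count_g < sample.mpileup`, where the file name is a placeholder. Changing `/g/` to another character or character class (for example `/[gG]/`) counts a different mismatch.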
RNA-Seq • 97 views
