Question: How to do coverage counting for each genomic position in R ?
gravatar for MAPK
4.1 years ago by
United States
MAPK1.4k wrote:

Hi Guys,
I am using pileup in Rsamtool from R package. I need to get  the total "count" for each position. For example, for chr11, position 1643082, there are a total of 1 count of A, 3 counts of C and 18 counts of T. I want them in this order:

chr11:1643082  A:1, C:3, T:18

and so forth for position chr11, 47663948 to be shown in the result. There are thousands of different positions and I need to calculate the counts for each nucleotide.Thanks!


seqnames       pos strand nucleotide count               which_label
1     chr11   1643082      +          A     1     chr11:1643082-1643082
2     chr11   1643082      -          C     3     chr11:1643082-1643082
3     chr11   1643082      +          T    15     chr11:1643082-1643082
4     chr11   1643082      -          T     3     chr11:1643082-1643082
5     chr11  47663948      +          C    16   chr11:47663948-47663948
6     chr11  47663948      -          C    11   chr11:47663948-47663948
7     chr11  47663948      +          T     2   chr11:47663948-47663948
R • 1.4k views
ADD COMMENTlink modified 3.4 years ago by Biostar ♦♦ 20 • written 4.1 years ago by MAPK1.4k
gravatar for PoGibas
4.1 years ago by
PoGibas4.8k wrote:

You can do this easily with data.table package.

Something like this:

df[, sum(count), by=list(seqnames,pos,nucleotide)]
ADD COMMENTlink written 4.1 years ago by PoGibas4.8k

Thank you for your help!

ADD REPLYlink written 4.1 years ago by MAPK1.4k
gravatar for Giovanni M Dall'Olio
4.1 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

The alternative using dplyr:

df %>%
   group_by(seqnames,pos,nucleotide) %>%


Edit: to get the exact output you described, you have to:

> biost %>% 
    group_by(seqnames,pos,nucleotide) %>% 
    mutate(mypos=paste(seqnames, pos, sep=':')) %>%
    group_by(mypos) %>% 
    arrange(nucleotide) %>%   # Make sure nucleotides are always sorted the same way
    summarise(counts=paste(nucleotide, count, sep=':',collapse=', '))

Source: local data frame [2 x 2]

           mypos         counts
1  chr11:1643082 A:1, C:3, T:18
2 chr11:47663948      C:27, T:2
ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by Giovanni M Dall'Olio26k

Thank you, Giovanni!

ADD REPLYlink written 4.1 years ago by MAPK1.4k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 927 users visited in the last hour