How to do coverage counting for each genomic position in R ?
2
3
Entering edit mode
6.5 years ago
MAPK ★ 1.9k

Hi Guys,
I am using pileup in Rsamtool from R package. I need to get  the total "count" for each position. For example, for chr11, position 1643082, there are a total of 1 count of A, 3 counts of C and 18 counts of T. I want them in this order:

chr11:1643082  A:1, C:3, T:18

and so forth for position chr11, 47663948 to be shown in the result. There are thousands of different positions and I need to calculate the counts for each nucleotide.Thanks!

table1 

seqnames       pos strand nucleotide count               which_label
1     chr11   1643082      +          A     1     chr11:1643082-1643082
2     chr11   1643082      -          C     3     chr11:1643082-1643082
3     chr11   1643082      +          T    15     chr11:1643082-1643082
4     chr11   1643082      -          T     3     chr11:1643082-1643082
5     chr11  47663948      +          C    16   chr11:47663948-47663948
6     chr11  47663948      -          C    11   chr11:47663948-47663948
7     chr11  47663948      +          T     2   chr11:47663948-47663948
R • 2.1k views
ADD COMMENT
9
Entering edit mode
6.5 years ago
PoGibas 5.0k

You can do this easily with data.table package.

Something like this:

df[, sum(count), by=list(seqnames,pos,nucleotide)]
ADD COMMENT
0
Entering edit mode

Thank you for your help!

ADD REPLY
7
Entering edit mode
6.5 years ago

The alternative using dplyr:

library(dplyr)
df %>%
   group_by(seqnames,pos,nucleotide) %>%
   summarise(count=sum(count))

 

Edit: to get the exact output you described, you have to:

> biost %>% 
    group_by(seqnames,pos,nucleotide) %>% 
    summarise(count=sum(count))%>% 
    mutate(mypos=paste(seqnames, pos, sep=':')) %>%
    group_by(mypos) %>% 
    arrange(nucleotide) %>%   # Make sure nucleotides are always sorted the same way
    summarise(counts=paste(nucleotide, count, sep=':',collapse=', '))

Source: local data frame [2 x 2]

           mypos         counts
1  chr11:1643082 A:1, C:3, T:18
2 chr11:47663948      C:27, T:2
ADD COMMENT
0
Entering edit mode

Thank you, Giovanni!

ADD REPLY

Login before adding your answer.

Traffic: 2285 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6