How to calculate the proportion of SNPs with DP greater than 5
3.5 years ago
Ana

Hi all, I have extracted the depth of coverage (DP) for some of my populations from the VCF file. Each population has 11 individuals (columns) and 11 million SNPs (rows). I have converted each one into a data.frame and replaced missing values with NA. The first few rows of my data.frame look like this:

> head(pop1)
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1  7  3 NA NA 10 NA NA NA NA  NA  NA
2 14 11  7 NA 12  3  4  5 14   3   6
3 13 11  7 NA 11  4 NA  4 13   3   4
4  3 NA  4  5  4 NA NA  6 17  NA   7
5  3 NA  5  5  4 NA NA  7 20  NA   8
6  6 NA  3  6 NA NA NA  5 16  NA  10
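To try the answers below without the full 11-million-row file, the six rows printed above can be rebuilt as a small stand-in data frame. This is only the `head()` output, not the real data; the name `pop1` is reused so the code from the answers can be pasted in directly.

```r
# Stand-in for pop1: only the six rows shown by head(pop1) above.
# Each column is one individual, each row one SNP, values are DP (NA = missing).
pop1 <- data.frame(
  V1  = c(7, 14, 13, 3, 3, 6),
  V2  = c(3, 11, 11, NA, NA, NA),
  V3  = c(NA, 7, 7, 4, 5, 3),
  V4  = c(NA, NA, NA, 5, 5, 6),
  V5  = c(10, 12, 11, 4, 4, NA),
  V6  = c(NA, 3, 4, NA, NA, NA),
  V7  = c(NA, 4, NA, NA, NA, NA),
  V8  = c(NA, 5, 4, 6, 7, 5),
  V9  = c(NA, 14, 13, 17, 20, 16),
  V10 = c(NA, 3, 3, NA, NA, NA),
  V11 = c(NA, 6, 4, 7, 8, 10)
)
```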


For each column (i.e., each individual), I want to calculate the proportion of SNPs with DP greater than 5. I am a bit confused about how to do this in R. I know there are many R experts here; can someone help me do it in R?

3.5 years ago

Dear Ana,

This code computes the proportion over your entire data frame:

# proportion of all cells with DP > 5; NA cells count in the denominator
sum(pop1 > 5, na.rm = TRUE) / (nrow(pop1) * ncol(pop1))
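With about 11 million rows and 11 columns, `pop1 > 5` builds a logical matrix of roughly 121 million entries (around 0.5 GB, since R stores each logical value in 4 bytes). A minimal sketch that counts one column at a time instead, so only one column's comparison exists in memory at once. `prop_dp_overall` and its `min_dp` argument are hypothetical names, and NAs are kept in the denominator as in the one-liner above.

```r
# Hypothetical helper: overall proportion of cells with DP > min_dp,
# counted column by column to avoid materialising the full df > min_dp matrix.
prop_dp_overall <- function(df, min_dp = 5) {
  # number of cells above min_dp in each column (NAs are not counted)
  hits <- vapply(df, function(x) sum(x > min_dp, na.rm = TRUE), integer(1))
  # divide by all cells, so NA cells count as "not above min_dp"
  sum(hits) / (nrow(df) * ncol(df))
}
```

Usage: `prop_dp_overall(pop1)` should give the same value as the one-liner, just with a smaller peak memory footprint.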


For each individual:

apply(pop1, 2, function(x) sum(x>5, na.rm=TRUE)) / apply(pop1, 2, function(x) length(x))


Note that both versions keep NAs in the denominator, so a SNP with a missing DP is counted as a SNP that does not have DP > 5.
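Because NAs stay in the denominator, an individual with many missing calls gets a lower proportion even if all of its called SNPs are well covered. A sketch that reports both denominators side by side, assuming the same layout as `pop1` (one column per individual); `prop_dp_by_ind`, `min_dp` and the output column names are hypothetical.

```r
# Hypothetical helper: per-individual (per-column) proportion of SNPs with DP > min_dp.
prop_dp_by_ind <- function(df, min_dp = 5) {
  # SNPs with DP above min_dp in each column (NAs not counted)
  above  <- vapply(df, function(x) sum(x > min_dp, na.rm = TRUE), integer(1))
  # SNPs with a non-missing DP in each column
  called <- vapply(df, function(x) sum(!is.na(x)), integer(1))
  data.frame(
    n_above     = above,
    prop_all    = above / nrow(df),  # NAs counted as "not above min_dp"
    prop_called = above / called,    # NAs dropped from the denominator
    row.names   = names(df)
  )
}
```

Usage: `prop_dp_by_ind(pop1)` returns one row per individual. A column that is entirely NA gives `NaN` in `prop_called`, since that is 0/0.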

NB: I edited this a few times. There are undoubtedly other solutions.


Thanks @Kevin Blighe, I used something like this, which worked:

prop_x <- sapply(ind_1, function(x) sum(x > 5, na.rm = TRUE) / length(x))
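Since the question mentions several populations, the same `sapply()` call can be mapped over a list of data frames. A sketch assuming a hypothetical named list of per-population DP data frames; `prop_by_population` is an illustrative name, and NA cells stay in the denominator as in the line above.

```r
# Hypothetical helper: per-individual proportions for every population in a named list.
prop_by_population <- function(pop_list, min_dp = 5) {
  lapply(pop_list, function(pop) {
    # same expression as above, applied to each column of one population
    sapply(pop, function(x) sum(x > min_dp, na.rm = TRUE) / length(x))
  })
}
```

Usage, with hypothetical populations: `props <- prop_by_population(list(pop1 = pop1, pop2 = pop2))`, after which `props$pop1` holds the 11 per-individual proportions for `pop1`. `lapply()` is used rather than `sapply()` so the result is always a list, even if populations have different numbers of individuals.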