Ordering admixture stacked barplot based on multiple values
4.1 years ago
msul • 0

I am constructing a stacked barplot in R from a dataset, and I want to know how to arrange the bars so that "similar" individuals cluster together. My dataset Q holds admixture proportions as a d-by-n matrix. In this toy example, stored in the data frame a, there are d = 10 ancestral populations (rows) and n = 5 individuals (columns):

> a
            V1          V2          V3           V4           V5
1  0.534410243 0.009358740 0.011295181 0.2141751740 0.0030129254
2  0.026653603 0.372426720 0.447847534 0.0179177507 0.4072904477
3  0.193317915 0.003605024 0.003186611 0.4832114736 0.0007095471
4  0.111881585 0.000000000 0.000000000 0.2296213741 0.0119233461
5  0.089696570 0.591163629 0.509774416 0.0032542030 0.5535847030
6  0.007543558 0.000000000 0.000000000 0.0364907757 0.0013148362
7  0.004862942 0.000000000 0.002123909 0.0146682272 0.0004053690
8  0.009276195 0.011710457 0.014367894 0.0000000000 0.0000000000
9  0.006903171 0.004314528 0.011404455 0.0000000000 0.0126889937
10 0.015454219 0.007420903 0.000000000 0.0006610215 0.0090698319

I create a stacked barplot like so:

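library(dplyr)    # %>%, mutate, group_by, arrange
library(tidyr)    # gather
library(ggplot2)  # ggplot, geom_bar

# add the population label (row name) as a column, then reshape to long format
a <- a %>% mutate(pop = rownames(a))
a_long <- gather(a, key, value, -pop)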

# trying to create a similarity index
a_long <- a_long %>% group_by(key) %>% 
  mutate(mean = mean(value)) %>%
  arrange(desc(mean))

# looking at some of the expanded dataset
> a_long[1:20,]
# A tibble: 20 x 4
# Groups:   key [2]
   pop   key      value  mean
   <chr> <chr>    <dbl> <dbl>
 1 1     V2    0.00936    0.1
 2 2     V2    0.372      0.1
 3 3     V2    0.00361    0.1
 4 4     V2    0          0.1
 5 5     V2    0.591      0.1
 6 6     V2    0          0.1
 7 7     V2    0          0.1
 8 8     V2    0.0117     0.1
 9 9     V2    0.00431    0.1
10 10    V2    0.00742    0.1
11 1     V4    0.214      0.1
12 2     V4    0.0179     0.1
13 3     V4    0.483      0.1
14 4     V4    0.230      0.1
15 5     V4    0.00325    0.1
16 6     V4    0.0365     0.1
17 7     V4    0.0147     0.1
18 8     V4    0          0.1
19 9     V4    0          0.1
20 10    V4    0.000661   0.1

# colors
v_colors <- c("#440154FF", "#443B84FF", "#34618DFF", "#404588FF", "#1FA088FF", "#40BC72FF",
              "#67CC5CFF", "#A9DB33FF", "#DDE318FF", "#FDE725FF")

p <- ggplot(a_long, aes(x = key, y = value, fill = pop))
p + geom_bar(position = "stack", stat = "identity") + scale_fill_manual(values = v_colors)

The output looks like this:

[Image: barplot]

How can I make the output look neater, e.g. with the individuals that have a higher proportion of population 5 ancestry placed next to each other on the x-axis? So far, I have tried computing the mean of value for each individual, but that did not work: the proportions of each individual sum to 1, so the mean is 0.1 for everyone and carries no information. How can I create a similarity index that tells me how similar individual 1 is to individual 2, and how do I then order the individuals on the x-axis so that they look well-clustered (e.g. like the barplots in this figure)?

Thanks!

In case you want to recreate the data frame a in the example above:

v1 = c(0.534410243, 0.026653603, 0.193317915, 0.111881585, 0.089696570, 0.007543558, 0.004862942, 0.009276195, 0.006903171, 0.015454219)
v2 = c(0.009358740, 0.372426720, 0.003605024, 0.000000000, 0.591163629, 0.000000000, 0.000000000, 0.011710457, 0.004314528, 0.007420903)
v3 = c(0.011295181, 0.447847534, 0.003186611, 0.000000000, 0.509774416, 0.000000000, 0.002123909, 0.014367894, 0.011404455, 0.000000000) 
v4 = c(0.2141751740, 0.0179177507, 0.4832114736, 0.2296213741, 0.0032542030, 0.0364907757, 0.0146682272, 0.0000000000, 0.0000000000, 0.0006610215)
v5 = c(0.0030129254, 0.4072904477, 0.0007095471, 0.0119233461, 0.5535847030, 0.0013148362, 0.0004053690, 0.0000000000, 0.0126889937, 0.0090698319)
a = data.frame(V1 = v1, V2 = v2, V3 = v3, V4 = v4, V5 = v5)
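
One possible way to get the ordering asked about above is sketched here. It is only an illustration, not a definitive answer: the choice of Euclidean distance and average linkage is an assumption, and the names d, hc, ind_order and pop5_order are made up for this example. The idea is to treat each individual (column) as a vector of ancestry proportions, compute pairwise distances as the "similarity index", cluster hierarchically, and use the dendrogram leaf order as the factor levels of the x-axis variable. A simpler alternative, sorting by the proportion of a single population, is included as well.

```r
# Toy admixture data from the question: rows = ancestral populations, columns = individuals
a <- data.frame(
  V1 = c(0.534410243, 0.026653603, 0.193317915, 0.111881585, 0.089696570, 0.007543558, 0.004862942, 0.009276195, 0.006903171, 0.015454219),
  V2 = c(0.009358740, 0.372426720, 0.003605024, 0.000000000, 0.591163629, 0.000000000, 0.000000000, 0.011710457, 0.004314528, 0.007420903),
  V3 = c(0.011295181, 0.447847534, 0.003186611, 0.000000000, 0.509774416, 0.000000000, 0.002123909, 0.014367894, 0.011404455, 0.000000000),
  V4 = c(0.2141751740, 0.0179177507, 0.4832114736, 0.2296213741, 0.0032542030, 0.0364907757, 0.0146682272, 0.0000000000, 0.0000000000, 0.0006610215),
  V5 = c(0.0030129254, 0.4072904477, 0.0007095471, 0.0119233461, 0.5535847030, 0.0013148362, 0.0004053690, 0.0000000000, 0.0126889937, 0.0090698319)
)

# Option 1: hierarchical clustering of individuals.
# dist() compares rows, so transpose to get one row per individual.
# Euclidean distance and average linkage are assumptions; other choices may suit real data better.
d  <- dist(t(as.matrix(a)), method = "euclidean")
hc <- hclust(d, method = "average")
ind_order <- hc$labels[hc$order]   # individuals in dendrogram leaf order

# Option 2: sort individuals by their proportion of one population (here population 5).
pop5_order <- colnames(a)[order(unlist(a[5, ]), decreasing = TRUE)]

# Long format built in base R; the x-axis order is controlled by the factor levels of key.
a_long <- data.frame(
  pop   = factor(rep(seq_len(nrow(a)), times = ncol(a))),
  key   = factor(rep(colnames(a), each = nrow(a)), levels = ind_order),
  value = unlist(a, use.names = FALSE)
)

# Plot only if ggplot2 is installed; call print(p) to draw the figure.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(a_long, aes(x = key, y = value, fill = pop)) +
    geom_bar(position = "stack", stat = "identity")
}
```

To use the single-population ordering instead, pass levels = pop5_order when building key. With many individuals, the clustering order tends to place individuals with similar overall ancestry profiles side by side, whereas the single-population sort only guarantees a monotone trend for that one component.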
R barplot stacked-barplot ordering genome • 1.9k views