Question: Ordering admixture stacked barplot based on multiple values
3 months ago by
msul0
msul0 wrote:

I have a dataset from which I am constructing a stacked barplot in R, and I want to know how to arrange the bars so that "similar" individuals cluster together. The dataset is an admixture proportions matrix Q with d rows and n columns. In this toy example there are d = 10 ancestral populations (rows) and n = 5 individuals (columns):

> a
            V1          V2          V3           V4           V5
1  0.534410243 0.009358740 0.011295181 0.2141751740 0.0030129254
2  0.026653603 0.372426720 0.447847534 0.0179177507 0.4072904477
3  0.193317915 0.003605024 0.003186611 0.4832114736 0.0007095471
4  0.111881585 0.000000000 0.000000000 0.2296213741 0.0119233461
5  0.089696570 0.591163629 0.509774416 0.0032542030 0.5535847030
6  0.007543558 0.000000000 0.000000000 0.0364907757 0.0013148362
7  0.004862942 0.000000000 0.002123909 0.0146682272 0.0004053690
8  0.009276195 0.011710457 0.014367894 0.0000000000 0.0000000000
9  0.006903171 0.004314528 0.011404455 0.0000000000 0.0126889937
10 0.015454219 0.007420903 0.000000000 0.0006610215 0.0090698319

I create a stacked barplot like so:

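library(dplyr)    # mutate, group_by, arrange, %>%
library(tidyr)    # gather
library(ggplot2)  # ggplot, geom_bar
# keep the population labels (row names) as a column, then reshape to long format
a <- a %>% mutate(pop = rownames(a))
a_long <- gather(a, key, value, -pop)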

# trying to create a similarity index
a_long <- a_long %>% group_by(key) %>% 
  mutate(mean = mean(value)) %>%
  arrange(desc(mean))

# looking at some of the expanded dataset
> a_long[1:20,]
# A tibble: 20 x 4
# Groups:   key [2]
   pop   key      value  mean
   <chr> <chr>    <dbl> <dbl>
 1 1     V2    0.00936    0.1
 2 2     V2    0.372      0.1
 3 3     V2    0.00361    0.1
 4 4     V2    0          0.1
 5 5     V2    0.591      0.1
 6 6     V2    0          0.1
 7 7     V2    0          0.1
 8 8     V2    0.0117     0.1
 9 9     V2    0.00431    0.1
10 10    V2    0.00742    0.1
11 1     V4    0.214      0.1
12 2     V4    0.0179     0.1
13 3     V4    0.483      0.1
14 4     V4    0.230      0.1
15 5     V4    0.00325    0.1
16 6     V4    0.0365     0.1
17 7     V4    0.0147     0.1
18 8     V4    0          0.1
19 9     V4    0          0.1
20 10    V4    0.000661   0.1

# colors
v_colors <- c("#440154FF", "#443B84FF", "#34618DFF", "#404588FF", "#1FA088FF", "#40BC72FF",
              "#67CC5CFF", "#A9DB33FF", "#DDE318FF", "#FDE725FF")

plot <- ggplot(a_long, aes(x = key, y = value, fill = pop))
plot + geom_bar(position = "stack", stat = "identity") + scale_fill_manual(values = v_colors)

The output looks like this:

[Figure: stacked barplot]

How can I make the output look neater, e.g. so that individuals with a higher proportion of population 5 ancestry sit next to each other on the x-axis? So far I have tried computing the mean of value for each individual, but that does not work: each individual's proportions sum to 1, so the mean over the 10 populations is 0.1 for everyone and cannot distinguish individuals. How can I create a similarity index that tells me how similar individual 1 is to individual 2, and how do I then order the individuals on the x-axis so that they look well clustered (e.g. like the barplots in this figure)?

Thanks!

In case you want to recreate the data frame a in the example above:

v1 = c(0.534410243, 0.026653603, 0.193317915, 0.111881585, 0.089696570, 0.007543558, 0.004862942, 0.009276195, 0.006903171, 0.015454219)
v2 = c(0.009358740, 0.372426720, 0.003605024, 0.000000000, 0.591163629, 0.000000000, 0.000000000, 0.011710457, 0.004314528, 0.007420903)
v3 = c(0.011295181, 0.447847534, 0.003186611, 0.000000000, 0.509774416, 0.000000000, 0.002123909, 0.014367894, 0.011404455, 0.000000000) 
v4 = c(0.2141751740, 0.0179177507, 0.4832114736, 0.2296213741, 0.0032542030, 0.0364907757, 0.0146682272, 0.0000000000, 0.0000000000, 0.0006610215)
v5 = c(0.0030129254, 0.4072904477, 0.0007095471, 0.0119233461, 0.5535847030, 0.0013148362, 0.0004053690, 0.0000000000, 0.0126889937, 0.0090698319)
a = data.frame(V1 = v1, V2 = v2, V3 = v3, V4 = v4, V5 = v5)
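
For reference, here is a minimal sketch of two possible ways to derive an x-axis order from this data frame, using base R only. It is not a definitive answer: the choice of Euclidean distance and average linkage is an assumption made for illustration, and the names order_by_pop5 and order_by_cluster are hypothetical. The first option sorts individuals by their proportion of a single population (row 5); the second treats the distance between two individuals' proportion vectors as the (dis)similarity index and takes the leaf order of a hierarchical clustering.

```r
# Toy admixture matrix from the question: rows = populations, columns = individuals.
a <- data.frame(
  V1 = c(0.534410243, 0.026653603, 0.193317915, 0.111881585, 0.089696570,
         0.007543558, 0.004862942, 0.009276195, 0.006903171, 0.015454219),
  V2 = c(0.009358740, 0.372426720, 0.003605024, 0.000000000, 0.591163629,
         0.000000000, 0.000000000, 0.011710457, 0.004314528, 0.007420903),
  V3 = c(0.011295181, 0.447847534, 0.003186611, 0.000000000, 0.509774416,
         0.000000000, 0.002123909, 0.014367894, 0.011404455, 0.000000000),
  V4 = c(0.2141751740, 0.0179177507, 0.4832114736, 0.2296213741, 0.0032542030,
         0.0364907757, 0.0146682272, 0.0000000000, 0.0000000000, 0.0006610215),
  V5 = c(0.0030129254, 0.4072904477, 0.0007095471, 0.0119233461, 0.5535847030,
         0.0013148362, 0.0004053690, 0.0000000000, 0.0126889937, 0.0090698319)
)

# Option 1: sort individuals by their proportion of one population (here row 5),
# largest first.
order_by_pop5 <- colnames(a)[order(unlist(a[5, ]), decreasing = TRUE)]

# Option 2: hierarchical clustering of individuals.
# dist() compares rows, so transpose to put individuals in rows.
# Euclidean distance and average linkage are assumptions, not the only choice.
hc <- hclust(dist(t(a)), method = "average")

# hc$order is the left-to-right leaf order of the dendrogram.
order_by_cluster <- colnames(a)[hc$order]
```

Either vector could then be used as the factor levels of the x-axis variable before plotting, for example a_long %>% ungroup() %>% mutate(key = factor(key, levels = order_by_cluster)), since ggplot2 draws a discrete axis in factor-level order.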