Question: ADMIXTURE and R: meaning of the colors on a barplot when studying population ancestry (K value)
1
dirranrak20 wrote:

Hi all, I moved on to ADMIXTURE after using PLINK and ran it with K=5, then plotted the data in R using col=5. I get a barplot with each bar representing an individual, but how can I tell what each color stands for (red, for example) and which population it corresponds to? The *.5.Q file only contains the ancestry proportions, with one row per individual and one column per cluster. Thank you.

R gene • 9.2k views
modified 3 months ago by nataliagru150 • written 5.1 years ago by dirranrak20
1

The key to generating the canonical admixture plots in R is to use a stacked barplot with the coancestry coefficients. Is this what you're doing? It's not clear.

This is the plot from myfile.5.Q using R. I am trying to find out what each color means (an ancestral population, probably? And if so, which population?).

Here is the way I did it:

```r
tbl <- read.table("trialmergedexcludedrs3926405,rs365066AfricaKhoisanexcludedsnpwith0phenotypegeno0.05hwe0.001Batwa_Kiga.913651pos.230samples.PNAS2014_trial1_flip.5.Q")
barplot(t(as.matrix(tbl)), col = rainbow(5),
        xlab = "Individual", ylab = "Ancestry", border = NA)
```

Hi, what I am trying to do with PLINK, ADMIXTURE, and R is to find the ancestry (or ancestries) of some populations in my genomic data.

I got the proportions for each individual with ADMIXTURE and plotted these files in R for K=2 to K=5. The plots I get use a different number of colors depending on the K value, so the first one has 2 colors and the last one has 5. The question now is: how can I know the meaning of each color in each individual's proportions?

Thank you so much.
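For readers with the same question: in a stacked barplot of a .Q file, color i simply belongs to column i of the file, and ADMIXTURE numbers its clusters arbitrarily, so a color has no population name by itself. One common way to interpret the colors is to attach the population labels you already know (the rows of the .Q file are in the same order as the input .fam file) and see which populations carry most of each cluster. The following is a minimal sketch using synthetic data; the population labels `PopA` to `PopC` and the matrix `q` are hypothetical stand-ins for your own .Q and .fam files.

```r
# Synthetic stand-ins for a real myfile.K.Q and its .fam file (hypothetical data).
set.seed(1)
K <- 3
pops <- rep(c("PopA", "PopB", "PopC"), each = 4)  # assumed labels, in .fam order
q <- matrix(runif(length(pops) * K), ncol = K)
q <- q / rowSums(q)                               # each row sums to 1, like a .Q file
colnames(q) <- paste0("Cluster", seq_len(K))

# Mean ancestry per population for every cluster (one row per population).
pop_means <- aggregate(as.data.frame(q), by = list(Population = pops), FUN = mean)
print(pop_means)

# One fixed color per cluster, individuals grouped by population, plus a legend.
cols <- rainbow(K)
ord <- order(pops)
barplot(t(q[ord, ]), col = cols, border = NA, names.arg = pops[ord], las = 2,
        xlab = "Individual", ylab = "Ancestry")
legend("topright", legend = colnames(q), fill = cols, bg = "white")
```

A cluster whose mean is high in one known population and low elsewhere can then be described as the component associated with that population. Note that cluster numbering can differ between runs and between K values, so colors are not guaranteed to match across plots.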

Dear Dirrank,

Did you find the answer to your question? I'm in the same situation.

Thank you.

Dear edison.vazquez, you can try a new R package called BITE. We have implemented two different functions to plot ADMIXTURE results.

Hello, did you find something?

Thank you.

2
Zev.Kronenberg11k wrote:

I just finished writing a set of functions for admixture plotting:

After sourcing the file, point the R function at the directory containing the *.Q and *.fam files:

```r
plots <- plot.admixture("/Users/zev/Documents/projects/human_diversity/admixture/")
```

To access subplot 5:

```r
plots$`5`
```

@Zev.Kronenberg

I tried using the function but ran into the following error (the last line). Am I doing something wrong?

```
plots<-plot.admixture("/my/directory/admixture_linux-1.3.0/")
Find out what's changed in ggplot2 at
Error in file(file, "rt") : invalid 'description' argument
```

It looks like the function is having trouble finding the files. What are the file extensions?

The standard ADMIXTURE output files plus a .fam file (the folder contains File.1.Q to File.10.Q and File.fam). The folder also holds the results of an ADMIXTURE analysis of a different dataset. Could that be a problem? Is there a way to specify a particular input file?

Right now, no. It should work if you separate the runs into different folders. Feel free to change the R code to specify an input.
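If you would rather change the code than move files, one possible approach is to select the .Q files by run name before reading them. This is only a sketch in base R and not part of the posted script; the prefix `File` and the demo directory are hypothetical, and the sketch assumes the prefix contains no regular-expression special characters.

```r
# Create a throwaway directory holding two hypothetical runs, to mimic a mixed folder.
run_dir <- file.path(tempdir(), "admixture_demo")
dir.create(run_dir, showWarnings = FALSE)
file.create(file.path(run_dir, c("File.2.Q", "File.10.Q", "File.fam",
                                 "Other.2.Q", "Other.fam")))

# Keep only the .Q files whose name starts with the chosen prefix.
prefix <- "File"                                  # hypothetical run name
q_files <- list.files(run_dir,
                      pattern = paste0("^", prefix, "\\.[0-9]+\\.Q$"),
                      full.names = TRUE)

# Sort by K numerically so that K=10 comes after K=2.
k_values <- as.integer(sub(".*\\.([0-9]+)\\.Q$", "\\1", basename(q_files)))
q_files <- q_files[order(k_values)]
print(basename(q_files))
```

The resulting `q_files` vector could then be looped over in place of a listing of the whole directory.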

Thanks Zev!

I moved the results to a different directory and now I get this error:

In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, : duplicated levels in factors are deprecated

How can I resolve this?

Can you send me a test dataset?

@Zev, here is the test data.

@Zev.Kronenberg The code worked once but gave an error when I ran it a second time, and I do not understand why. I was plotting the same files, and it ran fine the first time.

```
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 0, 190
Error: with piece 1:
```

Hi Zev,

Thank you for the code. I was busy learning to code other things. But now, when I try to run your code in R, I get an error right at the beginning, on the line below. And if I remove the quotes when specifying the directory, the rest of the code is treated as a comment.

```
plot.admixture<-function("~/Documents/admixture_macosx-1.23")
```

Do you have any idea why this happens?
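A likely cause, judging from the line shown: the path was written into the function definition. In R, `function(...)` expects parameter names, so a quoted string there is a syntax error. The definition should stay as it is in the script, and the path is passed when the function is called, as in the `plots <- plot.admixture("...")` example earlier in the thread. A small self-contained illustration of the difference, using a hypothetical helper `count_q_files` and a temporary directory rather than the real script:

```r
# Definition: the argument is a name (here `dir`, a hypothetical parameter name).
count_q_files <- function(dir) {
  length(list.files(dir, pattern = "\\.Q$"))
}

# Call: the quoted path is supplied here, not in the definition.
demo_dir <- file.path(tempdir(), "definition_vs_call")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("File.2.Q", "File.3.Q")))
n_files <- count_q_files(demo_dir)
print(n_files)
```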

Dear Zev, thank you. I managed to get the admixture plot using your script. However, the x-axis shows the individual sample names. Would it be possible to show the population group as well? The .fam file looks like this, where the last column gives the group:

```
AC065 AC065 0 0 0 -9 G1
AM236B1 AM236B1 0 0 0 -9 G1
BB011 BB011 0 0 0 -9 G2
BC1021 BC1021 0 0 0 -9 G3
BC1026 BC1026 0 0 0 -9 G3
```

The current script puts the sample names from the second column on the x-axis. Could it also include the group names, so that both the sample name and the group it belongs to are visible?
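One way this could be approached, shown here with base R `barplot` rather than the posted script, is to paste the sample name and the group into a single axis label and order the bars by group. The .fam lines are the ones quoted above; the ancestry matrix `q` is hypothetical data for K = 2, with one row per .fam line.

```r
# The .fam lines quoted above, with the group in the last column.
fam_text <- "AC065 AC065 0 0 0 -9 G1
AM236B1 AM236B1 0 0 0 -9 G1
BB011 BB011 0 0 0 -9 G2
BC1021 BC1021 0 0 0 -9 G3
BC1026 BC1026 0 0 0 -9 G3"
fam <- read.table(text = fam_text, stringsAsFactors = FALSE)

# Hypothetical ancestry proportions for K = 2, one row per .fam line.
q <- matrix(c(0.9, 0.1,
              0.8, 0.2,
              0.5, 0.5,
              0.2, 0.8,
              0.1, 0.9), ncol = 2, byrow = TRUE)

# Combine sample name (column 2) and group (last column) into one axis label.
group <- fam[[ncol(fam)]]
labels <- paste0(fam[[2]], " (", group, ")")

# Order bars by group so members of the same group sit next to each other.
ord <- order(group)
barplot(t(q[ord, ]), col = rainbow(ncol(q)), border = NA,
        names.arg = labels[ord], las = 2, cex.names = 0.7, ylab = "Ancestry")
```

The same combined label could be used as the x-axis variable in a ggplot2-based script, provided the labels are unique.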

Hi Zev,

Thank you for your code. The function plot.admixture is not being saved as a function in my R environment after I run the loop.

```
plot.admixture<-function("/Users/grubent/Documents/Admixture/"){
```

Therefore, when I attempt to plot or save the results as you describe here:

```r
results <- plot.admixture("directory/")
```

the function plot.admixture is not found in my environment. This may partly be due to the fact that I modified the code here:

```r
# original
factor(datframe$Name, levels = datframe$Name, ordered = TRUE)
# modified
factor(datframe$Name, levels = unique(datframe$Name), ordered = TRUE)
```

because I was getting this error in the loop:

```
Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level  is duplicated
Called from: factor(datframe$Name, levels = datframe$Name, ordered = TRUE)
```

Would you be able to help or provide guidance as to what the issue is?
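On the duplicated-level message: `factor()` requires its `levels` to be distinct, so passing the raw name column fails (or, in older R versions, warns) as soon as two samples share a name. Wrapping the levels in `unique()` is a reasonable fix and should not by itself stop the function from being defined. A minimal sketch with hypothetical sample names:

```r
sample_names <- c("AC065", "BB011", "AC065")  # hypothetical names with one duplicate

# Passing the raw vector as levels fails (or warns, in older R) because
# "AC065" appears twice; the condition message is captured instead of stopping.
res <- tryCatch(
  factor(sample_names, levels = sample_names, ordered = TRUE),
  warning = function(w) conditionMessage(w),
  error = function(e) conditionMessage(e)
)
print(res)

# unique() keeps the first-appearance order and removes the repeats.
fixed <- factor(sample_names, levels = unique(sample_names), ordered = TRUE)
print(levels(fixed))
```

If the duplicated names really are different individuals, `make.unique(sample_names)` is an alternative that keeps every sample as its own level. The function not appearing in the environment is more likely related to the definition line, where a quoted path stands in place of a parameter name.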