Question

Can't figure out order of output rows in ADMIXTURE

7.4 years ago

jenn.drummond ▴ 80

Hi, all. I'm trying to do basic population structure analysis with ADMIXTURE because it's faster than STRUCTURE, but I can't figure out how to get the populations to cluster together. More generally, I can't figure out for sure what order my outputs are in, within the P and Q files. There's a similar unanswered question from about a year ago.

The only references to the output format in the otherwise helpful manual are these: "ADMIXTURE's...output is simple space-delimited files containing the parameter estimates." ... "There is an output file for each parameter set: Q (the ancestry fractions), and P (the allele frequencies of the inferred ancestral populations)." ... "[If you use bootstrapping] The 'se' file is in the same unadorned file format as the point estimates."

Well, it's unadorned all right! I can't tell from the Q file which individuals have which fractions, and therefore I can't see whether they're grouping into the expected populations.

A natural assumption would be that the output is in the same order as the input file, but I'm not sure this is the case. I reversed the order of my input file, and very little changed for my outputs.

The example HapMap data in the ADMIXTURE documentation does order and group predictably. I can reproduce the plot on page 6 using the commands on page 5. If I use plink to convert the .bed to .ped, move the Yoruba individuals to the top of the file, reconvert to .bed, rerun ADMIXTURE, and re-plot (whew), the YRI block moves to the front of the figure, as I would expect.

But my own data doesn't behave like that. Here's a sample of the .ped file:

# Stacks v1.41;  PLINK v1.07; November 10, 2016
2  NKL3_001  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2  NKL3_018  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2  NKL3_028  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2  NKL3_029  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5  MTD_001   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5  MTD_007   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5  MTD_010   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5  MTD_029   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6  ORC_001   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6  ORC_002   0  0  0  0  0  0  0  0  G  G  A  A  A  A  T  T  C  C  T  T
6  ORC_019   0  0  0  0  0  0  0  0  G  G  A  A  A  A  T  T  C  C  T  T
6  ORC_020   0  0  0  0  0  0  0  0  A  A  A  A  A  A  T  T  C  C  T  T

From the first column and from the sample names in the second column, you can see that there are three underlying/assumed populations. (Each population has about 30 individuals, but I edited for brevity.)

However, this is my output, at K=2 and K=3:

[Figure: original ordering]

Absolutely no clustering whatsoever.

Now, at this point, you may be saying "Well, maybe your individuals just aren't grouped into populations." Aside from the fact that we know they are, I ran a test. I reversed the order of the input file, expecting to see a mirror image of the first plot. But the plot stayed exactly the same:

[Figure: reversed order]

In case you're thinking maybe I glitched and used the same data or the same plot twice (it's okay, I think those things about myself too), the K=3 plot does show a few differences -- but not a reversal difference. Does anyone know what's going on here?

Updated 7.3 years ago by beausoleilmo ▴ 580 • written 7.4 years ago by jenn.drummond ▴ 80

I have the same question and would like to know the order of the individuals in the output. In the meantime, you can sort the rows of your file like this: tbl = read.table("~/Desktop/file_out_of_admixture.Q") and ord = tbl[order(tbl$V1,tbl$V2,tbl$V3),].
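One thing worth knowing about that sort: R keeps the automatic row names when you subset a data frame, so each sorted row still remembers where it sat in the unsorted .Q file. A minimal sketch on made-up numbers (the six rows below are invented for illustration, not real ADMIXTURE output):

```r
# Synthetic stand-in for a K=3 .Q file: six individuals, each row sums to 1.
q_text <- "0.90 0.05 0.05
0.10 0.80 0.10
0.85 0.10 0.05
0.05 0.15 0.80
0.20 0.70 0.10
0.10 0.10 0.80"
tbl <- read.table(text = q_text)

# Sort by V1, then V2, then V3, as in the comment above.
ord <- tbl[order(tbl$V1, tbl$V2, tbl$V3), ]

# The row names are the original row positions, so the sort can be traced back.
original_row <- as.integer(rownames(ord))
print(original_row)
```

That means the sorted table can still be matched to sample labels afterwards, as long as you know which individual each original row belongs to.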

7.3 years ago by beausoleilmo ▴ 580

score 1 · Answer 1 · 2017-01-15

I found this website explaining how to do the full analysis. Basically, you have to look in the .fam file. You can import this file in R like this:

fam = read.table("~/path-to-the-file/my_analysis.fam")

After that, I reused the first column to rename the rows of the .Q file:

tbl = read.table("~/path-to-the-file/my_analysis.Q")

Use this dataset to print your barplot!
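
For anyone who wants the whole thing end to end, here is a rough sketch of the labelling and plotting steps. It relies on the rows of the .Q file being in the same order as the individuals in the .fam file that sits next to the input .bed. The file contents below are small made-up stand-ins written to temporary files, and the sample IDs are borrowed from the question purely for illustration. One deliberate difference from the description above: I label rows with the second .fam column (individual ID), on the assumption that the first column (family ID) repeats within a population, as it does in the question's data, and R does not allow duplicate row names in a data frame.

```r
# Hypothetical stand-ins for my_analysis.fam and my_analysis.Q (K=3).
fam_file <- tempfile(fileext = ".fam")
q_file <- tempfile(fileext = ".Q")
writeLines(c("2 NKL3_001 0 0 0 -9",
             "2 NKL3_018 0 0 0 -9",
             "5 MTD_001 0 0 0 -9",
             "6 ORC_001 0 0 0 -9"), fam_file)
writeLines(c("0.90 0.05 0.05",
             "0.80 0.10 0.10",
             "0.10 0.85 0.05",
             "0.05 0.05 0.90"), q_file)

fam <- read.table(fam_file, stringsAsFactors = FALSE)
tbl <- read.table(q_file)

# The .Q file carries no labels, so the pairing is purely positional.
stopifnot(nrow(fam) == nrow(tbl))

# Column 2 of .fam is the individual ID; column 1 (family ID) repeats here.
rownames(tbl) <- fam$V2

# Write the plot to a temporary PDF; drop pdf()/dev.off() in an interactive session.
pdf(file.path(tempdir(), "admixture_barplot.pdf"))
# One stacked bar per individual; barplot() expects clusters in rows.
barplot(t(as.matrix(tbl)), col = rainbow(ncol(tbl)), las = 2,
        names.arg = rownames(tbl), ylab = "Ancestry fraction")
dev.off()
```

If the bars still look scrambled after labelling, sorting tbl by the population column of the .fam file (fam$V1 in this sketch) before plotting should put individuals from the same population next to each other.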