Question: Method to get all possible combinations of genotypes for a group of SNPs
Asked 2.2 years ago by Volka:

Hi all,

I have a group of about 20 SNPs for which I would like to generate all possible genotype combinations. As an example, let's start with three SNPs and their alleles.

SNP      A1         A2
SNP1      A          T
SNP2      C          G
SNP3      T          A

I want to start off by generating a list of all possible genotype permutations/combinations of these three SNPs, for example:

SNP1 SNP2 SNP3
  AA   CC   TT
  AA   CC   TA
  AA   CC   AA
  AA   CG   TT
  AA   CG   TA
  AA   CG   AA
  AA   GG   TT
  AA   GG   TA
  AA   GG   AA
  ...

And so on, for what I expect to be 3^3 = 27 possible combinations.

From here, I hope to scale this up to my full group of ~20 SNPs. What is a good way of doing this, in Python or even in R?

Answer by thomaskuilman (2.2 years ago):

Here is an R-based solution using expand.grid():

> SNP1
[1] "A" "T"
> SNP2
[1] "C" "G"
> SNP3
[1] "T" "A"
> expand.grid(SNP1_alleleA = SNP1, SNP1_alleleB = SNP1, SNP2_alleleA = SNP2, SNP2_alleleB = SNP2,
              SNP3_alleleA = SNP3, SNP3_alleleB = SNP3)
   SNP1_alleleA SNP1_alleleB SNP2_alleleA SNP2_alleleB SNP3_alleleA SNP3_alleleB
1             A            A            C            C            T            T
2             T            A            C            C            T            T
3             A            T            C            C            T            T
4             T            T            C            C            T            T
...
61            A            A            G            G            A            A
62            T            A            G            G            A            A
63            A            T            G            G            A            A
64            T            T            G            G            A            A

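SNP1, SNP2 and SNP3 are character vectors holding the two possible alleles of each SNP. Because every SNP contributes two allele columns, heterozygotes appear in both orders (A/T and T/A), so three SNPs give 2^6 = 64 rows rather than the 27 unordered genotype combinations mentioned in the question.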
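Since the question also asked about Python, here is a minimal sketch of the same enumeration with itertools.product, shown next to the unordered variant that yields the 27 combinations listed in the question. The dictionary name snps and the variable names are only illustrative; the alleles are copied from the example table. Note that itertools.product varies the last position fastest, so the row order differs from expand.grid(), which varies the first column fastest.

```python
from itertools import product

# Alleles per SNP, copied from the example table in the question.
snps = {"SNP1": ("A", "T"), "SNP2": ("C", "G"), "SNP3": ("T", "A")}

# Ordered enumeration: two allele slots per SNP, analogous to the
# expand.grid() call above (A/T and T/A are counted separately).
slots = [alleles for alleles in snps.values() for _ in range(2)]
ordered = list(product(*slots))

# Unordered genotypes: three per biallelic SNP
# (homozygous A1, heterozygous, homozygous A2).
genotypes = [(a1 + a1, a1 + a2, a2 + a2) for a1, a2 in snps.values()]
unordered = list(product(*genotypes))

print(len(ordered), len(unordered))  # 64 27
print(unordered[:3])  # [('AA', 'CC', 'TT'), ('AA', 'CC', 'TA'), ('AA', 'CC', 'AA')]
```

Which of the two is appropriate depends on whether the order of the two alleles matters for the downstream analysis.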

If you need to scale this up, it might be handy to use something along the lines of

> SNPs
[1] "SNP1" "SNP2" "SNP3"
> eval(parse(text = paste0("expand.grid(",
                           paste0(rep(SNPs, each = 2), c("_alleleA", "_alleleB"),
                                  " = ", rep(SNPs, each = 2), collapse = ", "),
                           ")")))

which gives you exactly the same output, and where you can vary the number of SNPs included in the analysis through the SNPs variable. This requires that a character vector of alleles exists in the workspace under each name listed in SNPs. FYI, eval(parse(text = SOME_CHARACTER_STRING)) parses and evaluates the expression denoted by SOME_CHARACTER_STRING.
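One caveat when scaling to ~20 SNPs: the table grows exponentially. With three unordered genotypes per SNP there are 3^20 = 3,486,784,401 combinations, and with two ordered allele slots per SNP there are 2^40 (about 1.1 x 10^12) rows, so building the full table in memory is unlikely to be practical. The Python sketch below iterates lazily instead of materialising everything. The function name genotype_combinations and the 20-SNP panel of A/G markers are hypothetical, made up purely for illustration.

```python
from itertools import islice, product

def genotype_combinations(snps):
    """Lazily yield one tuple of unordered genotypes per combination.

    `snps` maps a SNP name to its two alleles (A1, A2).
    """
    per_snp = [(a1 + a1, a1 + a2, a2 + a2) for a1, a2 in snps.values()]
    return product(*per_snp)

# Hypothetical panel: 20 biallelic SNPs, all A/G, for illustration only.
panel = {f"SNP{i}": ("A", "G") for i in range(1, 21)}

# Total number of combinations, computed without enumerating them.
total = 3 ** len(panel)
print(total)  # 3486784401

# Pull only the first few combinations from the lazy iterator.
first_rows = list(islice(genotype_combinations(panel), 3))
for row in first_rows:
    print(" ".join(row))
```

Processing the combinations one at a time (or in chunks written to disk) keeps memory use flat regardless of the number of SNPs, although the running time still scales with the total count.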


Comment by Michael Dondrup (2.0 years ago):

I have marked this as the accepted answer; it looks like the OP is already using this script. @Volka, please attend to your previous posts: upvote, accept the answer, or leave a comment. Opening a chain of new posts for the same problem is discouraged.