Question

For Loop to make new dataframe

0

Entering edit mode

3.7 years ago

selplat21 ▴ 20

I have a dataframe with pedigree information like so:

FamilyID MomID DadIDv Nummales_Env1 Numfemales_Env1 Nummales_Env2 Numfemales_Env2

I want to generate a new data frame that now has the information for each individual like so:

FID-Num MomID DadID Sex Env

For example if the following is my original data frame with some example values for each column:

FamilyID MomID DadIDv Nummales_Env1 Numfemales_Env1 Nummales_Env2 Numfemales_Env2

AB, A, B, 2, 3, 4, 2

I would then have 11 rows in my new dataframe

FID-Num MomID DadID Sex Env

AB-1, A, B, M, 1
AB-2, A, B, F, 1
AB-3, A, B, M, 2

...

Is there a for loop I can write for this?

R Excel • 1.4k views

ADD COMMENT • link updated 3.7 years ago by rpolicastro 13k • written 3.7 years ago by selplat21 ▴ 20

0

Entering edit mode

It might make it easier to answer the question if you had a few example lines from the data.frame you are starting with, and an example of what you want the output to look like.

ADD REPLY • link 3.7 years ago by rpolicastro 13k

0

Entering edit mode

Good suggestion! The top is what I have and the bottom is an example of what I'm looking for

Family ID Env1_Females Env1_Males Env2_Females Env2_Males P1 P2
AKCO    2   4   8   2   AK4 CO1
AKDC    3   3   3   7   AK6 DC2
ALZC    0   0   5   4   AL2 CZ4
ARCT    1   5   4   6   AR3 CT4
COCZ    1   2   1   2   CO5 CZ2

Example for just the first row from first data frame into second dataframe.

ID  P1  P2  Sex Env
AKCO-1  AK4 CO1 F   1
AKCO-2  AK4 CO1 F   1
AKCO-3  AK4 CO1 F   2
AKCO-4  AK4 CO1 F   2
AKCO-5  AK4 CO1 F   2
AKCO-6  AK4 CO1 F   2
AKCO-7  AK4 CO1 F   2
AKCO-8  AK4 CO1 F   2
AKCO-9  AK4 CO1 F   2
AKCO-10 AK4 CO1 F   2
AKCO-11 AK4 CO1 M   1
AKCO-12 AK4 CO1 M   1
AKCO-13 AK4 CO1 M   1
AKCO-14 AK4 CO1 M   1
AKCO-15 AK4 CO1 M   2
AKCO-16 AK4 CO1 M   2

ADD REPLY • link 3.7 years ago by selplat21 ▴ 20

score 3 · Answer 1 · 2020-07-31

3

Entering edit mode

3.7 years ago

rpolicastro 13k

Here's a tidyverse solution.

Example data.

df <- structure(list(Family = c("AKCO", "AKDC", "ALZC", "ARCT", "COCZ"), 
    Env1_Females = c(2L, 3L, 0L, 1L, 1L), Env1_Males = c(4L, 
    3L, 0L, 5L, 2L), Env2_Females = c(8L, 3L, 5L, 4L, 1L), Env2_Males = c(2L, 
    7L, 4L, 6L, 2L), P1 = c("AK4", "AK6", "AL2", "AR3", "CO5"
    ), P2 = c("CO1", "DC2", "CZ4", "CT4", "CZ2")), class = "data.frame", row.names = c(NA, 
-5L))

Here's the code to generate the output.

library("tidyverse")

df <- df %>%
  pivot_longer(starts_with("Env"), names_sep="_", values_to="count", names_to=c("Env", "Sex")) %>%
  mutate(Sex=str_sub(Sex, end=1), Env=str_extract(Env, "\\d+$")) %>%
  uncount(count) %>%
  group_by(Family) %>%
  mutate(ID=str_c(Family, "-", seq_len(n()))) %>%
  ungroup %>%
  select(-Family)

And the desired output.

>df
# A tibble: 63 x 5
   P1    P2    Env   Sex   ID     
   <chr> <chr> <chr> <chr> <chr>  
 1 AK4   CO1   1     F     AKCO-1 
 2 AK4   CO1   1     F     AKCO-2 
 3 AK4   CO1   1     M     AKCO-3 
 4 AK4   CO1   1     M     AKCO-4 
 5 AK4   CO1   1     M     AKCO-5 
 6 AK4   CO1   1     M     AKCO-6 
 7 AK4   CO1   2     F     AKCO-7 
 8 AK4   CO1   2     F     AKCO-8 
 9 AK4   CO1   2     F     AKCO-9 
10 AK4   CO1   2     F     AKCO-10
# ... with 53 more rows

ADD COMMENT • link 3.7 years ago by rpolicastro 13k

1

Entering edit mode

Good solution, one can combine

pivot_longer(starts_with("Env"), names_to="Env_Sex", values_to="count") %>%
  separate("Env_Sex", into=c("Env", "Sex"), sep="_") %>%

to

pivot_longer(starts_with("Env"), names_sep="_",values_to="count", names_to=c("Env","sex"))

ADD REPLY • link 3.7 years ago by cpad0112 21k

1

Entering edit mode

Thanks for the tip! I edited the original answer to incorporate this.

ADD REPLY • link 3.7 years ago by rpolicastro 13k

0

Entering edit mode

This is really helpful, you are really saving me hours manual input!! It's also helping me understand the structure of tidyverse commands.

Okay, one more question. I have modified the above script to add Adult/Juvenile Information.

For example, here is the head of the original data:

Family  Env1_females    Env1_males  Env1_Juvenile_females   Env1_Juvenile_males Env1_Adult_females  Env1_Adult_males    Env2_females    Env2_males  Env2_Juvenile_females   Env2_Juvenile_males Env2_Adult_females  Env2_Adult_males    F1_mother   F1_father
AKCO    2   4   1   2   1   2   8   2   4   1   4   1   AK4 CO1
AKDC    3   3   1   1   2   2   3   7   1   3   2   4   AK6 DC2
ALZC    0   0   0   0   0   0   5   4   2   2   3   2   AL2 CZ4
ARCT    1   5   0   2   1   3   4   6   2   3   2   3   AR3 CT4
COCZ    1   2   0   1   1   1   1   2   0   1   1   1   CO5 CZ2

And I am trying to get it to look like this:

# A tibble: 63 x 5
   P1    P2    Env   Sex   Age ID     
   <chr> <chr> <chr> <chr> <chr><chr>  
 1 AK4   CO1   1     F   Adult  AKCO-1 
 2 AK4   CO1   1     F   Adult  AKCO-2 
 3 AK4   CO1   1     M  Juvenile  AKCO-3 
 ...

I guess in this case the tidyverse commands will only need to use the Adult or Juvenile Columns for each sex rather than the total for each sex. This just means that within each sex, there is another layer of adult or juvenile. The following is what I have so far.

df <- structure(list(Family = Data$Family, 
  Env1_Juv_F = Data$Juv_CF,
  Env1_Juv_M = Data$Juv_CM,
  Env2_Juv_F = Data$Juv_IF,
  Env2_Juv_M = Data$Juv_IM,
  Env1_Ad_F = Data$Adult_CF,
  Env1_Ad_M = Data$Adult_CM,
  Env2_Ad_F = Data$Adult_IF,
  Env2_Ad_M =  Data$Adult_IM, 
  P1 = Data$F1_mother, 
  P2 = Data$F1_father), class = "data.frame", row.names = c(NA, -56L))

  df <- df %>%
  pivot_longer(starts_with("Env"), names_sep="_", values_to="count", names_to=c("Env", "Age", "Sex")) %>%
  mutate(Sex=str_sub(Sex, end=1), Age=str_sub(Age, end=3), Env=str_extract(Env, "\\d+$")) %>%
  uncount(count) %>%
  group_by(Family) %>%
  mutate(ID=str_c(Family, "-", seq_len(n()))) %>%
  ungroup %>%
  select(-Family)

EDIT: OMG IT WORKS, THANK YOU FOR YOUR HELP.

ADD REPLY • link 3.7 years ago by selplat21 ▴ 20

0

Entering edit mode

Glad it worked! The tidyverse definitly makes data analysis much easier.

ADD REPLY • link 3.7 years ago by rpolicastro 13k