Question: For Loop to make new dataframe
0
gravatar for selplat21
11 days ago by
selplat2110
selplat2110 wrote:

I have a dataframe with pedigree information like so:

FamilyID MomID DadIDv Nummales_Env1 Numfemales_Env1 Nummales_Env2 Numfemales_Env2

I want to generate a new data frame that now has the information for each individual like so:

FID-Num MomID DadID Sex Env

For example if the following is my original data frame with some example values for each column:

FamilyID MomID DadIDv Nummales_Env1 Numfemales_Env1 Nummales_Env2 Numfemales_Env2

AB, A, B, 2, 3, 4, 2

I would then have 11 rows in my new dataframe

FID-Num MomID DadID Sex Env

AB-1, A, B, M, 1
AB-2, A, B, F, 1
AB-3, A, B, M, 2

...

Is there a for loop I can write for this?

excel R • 145 views
ADD COMMENTlink modified 11 days ago by rpolicastro720 • written 11 days ago by selplat2110

It might make it easier to answer the question if you had a few example lines from the data.frame you are starting with, and an example of what you want the output to look like.

ADD REPLYlink written 11 days ago by rpolicastro720

Good suggestion! The top is what I have and the bottom is an example of what I'm looking for

Family ID Env1_Females Env1_Males Env2_Females Env2_Males P1 P2
AKCO    2   4   8   2   AK4 CO1
AKDC    3   3   3   7   AK6 DC2
ALZC    0   0   5   4   AL2 CZ4
ARCT    1   5   4   6   AR3 CT4
COCZ    1   2   1   2   CO5 CZ2

Example for just the first row from first data frame into second dataframe.

ID  P1  P2  Sex Env
AKCO-1  AK4 CO1 F   1
AKCO-2  AK4 CO1 F   1
AKCO-3  AK4 CO1 F   2
AKCO-4  AK4 CO1 F   2
AKCO-5  AK4 CO1 F   2
AKCO-6  AK4 CO1 F   2
AKCO-7  AK4 CO1 F   2
AKCO-8  AK4 CO1 F   2
AKCO-9  AK4 CO1 F   2
AKCO-10 AK4 CO1 F   2
AKCO-11 AK4 CO1 M   1
AKCO-12 AK4 CO1 M   1
AKCO-13 AK4 CO1 M   1
AKCO-14 AK4 CO1 M   1
AKCO-15 AK4 CO1 M   2
AKCO-16 AK4 CO1 M   2
ADD REPLYlink modified 11 days ago • written 11 days ago by selplat2110
3
gravatar for rpolicastro
11 days ago by
rpolicastro720
rpolicastro720 wrote:

Here's a tidyverse solution.

Example data.

df <- structure(list(Family = c("AKCO", "AKDC", "ALZC", "ARCT", "COCZ"), 
    Env1_Females = c(2L, 3L, 0L, 1L, 1L), Env1_Males = c(4L, 
    3L, 0L, 5L, 2L), Env2_Females = c(8L, 3L, 5L, 4L, 1L), Env2_Males = c(2L, 
    7L, 4L, 6L, 2L), P1 = c("AK4", "AK6", "AL2", "AR3", "CO5"
    ), P2 = c("CO1", "DC2", "CZ4", "CT4", "CZ2")), class = "data.frame", row.names = c(NA, 
-5L))

Here's the code to generate the output.

library("tidyverse")

df <- df %>%
  pivot_longer(starts_with("Env"), names_sep="_", values_to="count", names_to=c("Env", "Sex")) %>%
  mutate(Sex=str_sub(Sex, end=1), Env=str_extract(Env, "\\d+$")) %>%
  uncount(count) %>%
  group_by(Family) %>%
  mutate(ID=str_c(Family, "-", seq_len(n()))) %>%
  ungroup %>%
  select(-Family)

And the desired output.

>df
# A tibble: 63 x 5
   P1    P2    Env   Sex   ID     
   <chr> <chr> <chr> <chr> <chr>  
 1 AK4   CO1   1     F     AKCO-1 
 2 AK4   CO1   1     F     AKCO-2 
 3 AK4   CO1   1     M     AKCO-3 
 4 AK4   CO1   1     M     AKCO-4 
 5 AK4   CO1   1     M     AKCO-5 
 6 AK4   CO1   1     M     AKCO-6 
 7 AK4   CO1   2     F     AKCO-7 
 8 AK4   CO1   2     F     AKCO-8 
 9 AK4   CO1   2     F     AKCO-9 
10 AK4   CO1   2     F     AKCO-10
# ... with 53 more rows
ADD COMMENTlink modified 11 days ago • written 11 days ago by rpolicastro720
1

Good solution, one can combine

pivot_longer(starts_with("Env"), names_to="Env_Sex", values_to="count") %>%
  separate("Env_Sex", into=c("Env", "Sex"), sep="_") %>%

to

pivot_longer(starts_with("Env"), names_sep="_",values_to="count", names_to=c("Env","sex"))
ADD REPLYlink modified 11 days ago • written 11 days ago by cpad011213k
1

Thanks for the tip! I edited the original answer to incorporate this.

ADD REPLYlink written 11 days ago by rpolicastro720

This is really helpful, you are really saving me hours manual input!! It's also helping me understand the structure of tidyverse commands.

Okay, one more question. I have modified the above script to add Adult/Juvenile Information.

For example, here is the head of the original data:

Family  Env1_females    Env1_males  Env1_Juvenile_females   Env1_Juvenile_males Env1_Adult_females  Env1_Adult_males    Env2_females    Env2_males  Env2_Juvenile_females   Env2_Juvenile_males Env2_Adult_females  Env2_Adult_males    F1_mother   F1_father
AKCO    2   4   1   2   1   2   8   2   4   1   4   1   AK4 CO1
AKDC    3   3   1   1   2   2   3   7   1   3   2   4   AK6 DC2
ALZC    0   0   0   0   0   0   5   4   2   2   3   2   AL2 CZ4
ARCT    1   5   0   2   1   3   4   6   2   3   2   3   AR3 CT4
COCZ    1   2   0   1   1   1   1   2   0   1   1   1   CO5 CZ2

And I am trying to get it to look like this:

# A tibble: 63 x 5
   P1    P2    Env   Sex   Age ID     
   <chr> <chr> <chr> <chr> <chr><chr>  
 1 AK4   CO1   1     F   Adult  AKCO-1 
 2 AK4   CO1   1     F   Adult  AKCO-2 
 3 AK4   CO1   1     M  Juvenile  AKCO-3 
 ...

I guess in this case the tidyverse commands will only need to use the Adult or Juvenile Columns for each sex rather than the total for each sex. This just means that within each sex, there is another layer of adult or juvenile. The following is what I have so far.

df <- structure(list(Family = Data$Family, 
  Env1_Juv_F = Data$Juv_CF,
  Env1_Juv_M = Data$Juv_CM,
  Env2_Juv_F = Data$Juv_IF,
  Env2_Juv_M = Data$Juv_IM,
  Env1_Ad_F = Data$Adult_CF,
  Env1_Ad_M = Data$Adult_CM,
  Env2_Ad_F = Data$Adult_IF,
  Env2_Ad_M =  Data$Adult_IM, 
  P1 = Data$F1_mother, 
  P2 = Data$F1_father), class = "data.frame", row.names = c(NA, -56L))

  df <- df %>%
  pivot_longer(starts_with("Env"), names_sep="_", values_to="count", names_to=c("Env", "Age", "Sex")) %>%
  mutate(Sex=str_sub(Sex, end=1), Age=str_sub(Age, end=3), Env=str_extract(Env, "\\d+$")) %>%
  uncount(count) %>%
  group_by(Family) %>%
  mutate(ID=str_c(Family, "-", seq_len(n()))) %>%
  ungroup %>%
  select(-Family)

EDIT: OMG IT WORKS, THANK YOU FOR YOUR HELP.

ADD REPLYlink modified 9 days ago • written 9 days ago by selplat2110

Glad it worked! The tidyverse definitly makes data analysis much easier.

ADD REPLYlink written 9 days ago by rpolicastro720
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1552 users visited in the last hour