Just to show that there are always several ways to do the same thing, here's a solution that feels a little more like typical R code to me. Nothing wrong with Christian's answer, though; consider my example educational.
You will find that the table layouts R works best with differ from what you find useful in Excel. Generally speaking, tables will be long rather than wide.
So my first lines of code mostly deal with formatting; you can easily do that part in Excel if you prefer.
library(reshape2)
library(magrittr) # this is a useful package that introduces %>% for stringing commands together
# I saved your table above in a file called "test.txt"
test <- read.table("~/Downloads/test.txt", header = TRUE, stringsAsFactors = FALSE)
test
Ind M1 M2 M3 M4 M5
1 P1 A/A Unused G/A T/T T/T
2 P2 T/T A/A A/A A/A G/G
3 1 T/A A/A G/A T/T G/G
4 2 T/T A/A G/A T/T T/G
5 3 T/T A/A G/A T/T T/G
6 4 T/T A/A G/A A/T G/G
7 5 T/A A/A G/A A/T T/G
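If you would rather not save the table to a file first, here is a minimal sketch that reads the same data from an inline string. `text` and `na.strings` are regular `read.table` arguments; using `na.strings = "Unused"` to get `NA` at import time is my own suggestion, and `test_inline` is just an illustrative name.

```r
# Same table as above, read from a string instead of "~/Downloads/test.txt".
# `test_inline` is an illustrative name, not part of the original code.
test_inline <- read.table(
  text = "Ind M1 M2 M3 M4 M5
P1 A/A Unused G/A T/T T/T
P2 T/T A/A A/A A/A G/G
1 T/A A/A G/A T/T G/G
2 T/T A/A G/A T/T T/G
3 T/T A/A G/A T/T T/G
4 T/T A/A G/A A/T G/G
5 T/A A/A G/A A/T T/G",
  header = TRUE,
  stringsAsFactors = FALSE,
  na.strings = "Unused"   # turn "Unused" into NA while reading
)

is.na(test_inline$M2[1])  # TRUE: P1 has no data for M2
```

With the `NA` already in place, the two `gsub("Unused", NA, ...)` calls further down simply find nothing to replace.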
I'll make a separate table just for the parents.
parents <- as.data.frame(t(test[1:2,-1]))
names(parents) = c("P1","P2")
# replacing "Unused" with NA because NA is R's native marker for missing data
parents$P1 <- gsub("Unused", NA, parents$P1)
parents$P2 <- gsub("Unused", NA, parents$P2)
# now, I'm adding separate columns for the individual alleles of P1 and P2
# gsub has the syntax: gsub(pattern to be replaced, replacement, string to operate on)
parents$P1.all1 <- gsub("\\/.","", parents$P1) # this replaces the / and the letter after it with nothing
parents$P1.all2 <- gsub(".\\/","", parents$P1) # this replaces the / and the letter BEFORE it with nothing
parents$P2.all1 <- gsub("\\/.","", parents$P2)
parents$P2.all2 <- gsub(".\\/","", parents$P2)
parents
P1 P2 P1.all1 P1.all2 P2.all1 P2.all2
M1 A/A T/T A A T T
M2 <NA> A/A <NA> <NA> A A
M3 G/A A/A G A A A
M4 T/T A/A T T A A
M5 T/T G/G T T G G
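To see what the two patterns do in isolation, here is a small sketch on a made-up vector (`genotypes` and `split_alleles` are illustrative names). It also shows a `strsplit`-based variant, which I would assume is the safer choice if an allele is ever longer than one character.

```r
genotypes <- c("G/A", "T/T", NA)

# The patterns from above: "." matches exactly one character
first_allele  <- gsub("\\/.", "", genotypes)  # drops "/" and the character after it
second_allele <- gsub(".\\/", "", genotypes)  # drops the character before "/" and the "/"

# Variant that splits on "/" and does not depend on allele length
split_alleles <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  data.frame(
    all1 = vapply(parts, function(p) p[1], character(1)),
    all2 = vapply(parts, function(p) p[2], character(1)),
    stringsAsFactors = FALSE
  )
}

alleles <- split_alleles(genotypes)
alleles$all1  # "G" "T" NA
alleles$all2  # "A" "T" NA
```

Both approaches pass missing genotypes through as `NA`, which is what the filtering step later relies on.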
I also make a separate table for the offspring:
offspring <- test[-c(1:2),] # removing the lines corresponding to the parents
offspring <- reshape2::melt(offspring, id.vars = "Ind", variable.name = "type") # changing the format using function melt of the library reshape2
head(offspring)
Ind type value
1 1 M1 T/A
2 2 M1 T/T
3 3 M1 T/T
4 4 M1 T/T
5 5 M1 T/A
6 1 M2 A/A
# like for the parents, I add separate columns for allele 1 and 2
offspring$all.1 <- gsub("\\/.","", offspring$value)
offspring$all.2 <- gsub(".\\/","", offspring$value)
Now, let's get to work.
First, identify the markers (M1 to M5) that are relevant based on your first criterion: P1 and P2 should each be homozygous, and P1 should differ from P2.
relevant_offspring <- subset(parents, ((P1.all1 == P1.all2) & (P2.all1 == P2.all2)) & P1 != P2 ) %>% rownames
# use those names to filter the offspring table
offspring <- subset(offspring, type %in% relevant_offspring)
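It is worth spelling out why M2 and M3 drop out at this point. The sketch below rebuilds the parents table by hand (values copied from the output above) and runs the same filter; `parents_demo` and `keep` are illustrative names. `subset` treats a condition that evaluates to `NA` as `FALSE`, so the marker with missing P1 data disappears without any special handling.

```r
parents_demo <- data.frame(
  P1 = c("A/A", NA,    "G/A", "T/T", "T/T"),
  P2 = c("T/T", "A/A", "A/A", "A/A", "G/G"),
  row.names = c("M1", "M2", "M3", "M4", "M5"),
  stringsAsFactors = FALSE
)
parents_demo$P1.all1 <- gsub("\\/.", "", parents_demo$P1)
parents_demo$P1.all2 <- gsub(".\\/", "", parents_demo$P1)
parents_demo$P2.all1 <- gsub("\\/.", "", parents_demo$P2)
parents_demo$P2.all2 <- gsub(".\\/", "", parents_demo$P2)

# Both parents homozygous AND different from each other
keep <- rownames(subset(
  parents_demo,
  P1.all1 == P1.all2 & P2.all1 == P2.all2 & P1 != P2
))
keep  # "M1" "M4" "M5"
```

M2 is dropped because P1 is `NA` there, and M3 because P1 is heterozygous (G/A).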
Now, we need to combine the parent and offspring info again.
off_par <- merge(offspring, parents, by.x = "type", by.y = "row.names")
head(off_par)
type Ind value all.1 all.2 P1 P2 P1.all1 P1.all2 P2.all1 P2.all2
1 M1 1 T/A T A A/A T/T A A T T
2 M1 2 T/T T T A/A T/T A A T T
3 M1 3 T/T T T A/A T/T A A T T
4 M1 4 T/T T T A/A T/T A A T T
5 M1 5 T/A T A A/A T/T A A T T
6 M4 1 T/T T T T/T A/A T T A A
Let's add a new column with your code for match, no match, heterozygous.
I'm going to use the ifelse() function, which has the following syntax: ifelse(condition, value if the condition is TRUE, value if the condition is FALSE).
off_par$code <- with(off_par,
  ifelse(value == P2, "1",              # same genotype as P2 (homozygous match)
    ifelse(all.1 == all.2, "0",         # not a match, but still homozygous
      ifelse((all.1 == P1.all1 | all.1 == P2.all1) &
             (all.2 == P1.all1 | all.2 == P2.all1),
             "H",                       # heterozygous, both alleles seen in the parents
             NA)                        # anything else cannot be explained by the parents
    )
  )
)
The last condition looks a bit complicated, but I want to make sure we only assign an "H" to a proper heterozygote, i.e. one whose two alleles both occur in the parents. The result is NA if the individual has a genotype that the parents cannot explain.
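Here is the same decision logic pulled out into a small function, so it can be tried on a few genotypes directly. `code_genotype` and its arguments are names I made up for this sketch; it assumes both parents are homozygous (which the filter above guarantees) and takes each parent's allele from the part before the slash.

```r
# Hypothetical helper: codes offspring genotypes against two homozygous parents
code_genotype <- function(value, p1, p2) {
  a1 <- sub("/.*$", "", value)         # first allele of the offspring
  a2 <- sub("^.*/", "", value)         # second allele of the offspring
  p1_allele <- sub("/.*$", "", p1)     # parents are homozygous, so one allele each
  p2_allele <- sub("/.*$", "", p2)
  from_parent <- function(a) a == p1_allele | a == p2_allele

  ifelse(value == p2, "1",
    ifelse(a1 == a2, "0",
      ifelse(from_parent(a1) & from_parent(a2), "H", NA_character_)))
}

# Marker M1 from the example: P1 = A/A, P2 = T/T
code_genotype(c("T/A", "T/T", "A/A", "C/G"), p1 = "A/A", p2 = "T/T")
# "H" "1" "0" NA
```

The last genotype, C/G, is invented to show the `NA` branch; nothing like it occurs in the example table.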
The result looks similar to what Christian had once we change the format again:
off_par[c("type", "Ind","code")] %>% reshape2::dcast(., Ind~type, value.var = "code")
Ind M1 M4 M5
1 1 H 0 1
2 2 1 0 H
3 3 1 0 H
4 4 1 H 1
5 5 H H H
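If the codes are going to be counted per marker afterwards, `table()` on the long format is probably the shortest route. This is a sketch only: `coded` is an illustrative data frame typed in from the result above, standing in for the `type` and `code` columns of `off_par`.

```r
# Long-format codes, copied from the result table above
coded <- data.frame(
  type = rep(c("M1", "M4", "M5"), each = 5),
  Ind  = rep(1:5, times = 3),
  code = c("H", "1", "1", "1", "H",   # M1
           "0", "0", "0", "H", "H",   # M4
           "1", "H", "H", "1", "H"),  # M5
  stringsAsFactors = FALSE
)

# One row per marker, one column per code
counts <- table(coded$type, coded$code)
counts["M1", "H"]  # 2
counts["M4", "0"]  # 3
counts["M5", "H"]  # 3
```

On the real data, `table(off_par$type, off_par$code)` should give the same kind of summary; note that `table()` leaves out `NA` codes unless you pass `useNA = "ifany"`.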
Tell me if your first output file is what you want and then I can extend my answer.
Step 2:
C.
Dear Cristian, thanks a lot for your help. I got this error while executing the first part
but I solved it like this
Now I successfully got M1, i.e. a single column, but I am getting this error while running the 2nd part, saying unexpected [ and { symbols. These are the errors
Once again, thanks a lot for your help. Regards
I edited my answer in my first comment. Check it out and tell me where you got. There were mistakes in my first code. For example, when you fetch a column:
the i-th value of that column:
There are brackets missing in your blue code.
Why are columns M2 and M3 discarded? P1 and P2 are different for both.
Sorry, can you explain the problem more clearly? Do you want to output columns M2 and M3 in the first step? Also, I still haven't sorted out the heterozygote bit. I am on it.
How do you define 'between them' when the parents are not homozygous or the data are not available (M3 and M2, respectively)?
What does this mean: ' i want to check whether P1 and P2 match for M1 or not' ? Do you only want to export columns where both parents are homozygous?
Dear Cristian, thanks a lot for your help in solving this problem. If either parent, P1 or P2, is not homozygous (e.g. A/T) or the data are not available, I do not want that marker in the output. I will treat such markers as monomorphic; that is why M2 and M3 are not in the output, and I will not consider them in my further calculations.
Is it possible to give H (heterozygous status) for the segregating population, for example individuals 1 and 5 in the M1 column? I will count them and use the counts in the next calculations.
I ran your code and it generates an individual file for each marker (M1.csv, M2.csv, etc.). Is it possible to get them all in one file? I am also getting the following warning: Warning messages: 1: In if (dfOutput[i, ] == dfOutput[2, ]) { : the condition has length > 1 and only the first element will be used. I am getting 5 of these, I think one per marker. Once again, a big thanks for your help. Regards
The code above is now exporting the 3 columns into a single file.
It's also marking the heterozygotes. I think table 2 is as you wish.
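On the "condition has length > 1" warning: `if` expects a single TRUE/FALSE, but comparing two data frame rows gives one value per column. Below is a minimal sketch of the difference, using a made-up `dfOutput` (I am only guessing at its shape from the warning text). Note that from R 4.2.0 on this situation is an error rather than a warning.

```r
# Made-up stand-in for dfOutput; the real one may look different
dfOutput <- data.frame(
  M1 = c("A/A", "T/T", "T/T"),
  M4 = c("T/T", "A/A", "T/T"),
  stringsAsFactors = FALSE
)

cmp <- dfOutput[3, ] == dfOutput[2, ]  # one logical per column, not a single value
length(cmp)                            # 2

# Option 1: collapse to a single TRUE/FALSE before using if()
if (all(cmp)) {
  status <- "all markers match"
} else {
  status <- "at least one marker differs"
}

# Option 2: stay vectorised and code every column at once
per_marker <- ifelse(cmp, "1", "0")
```

Which option fits depends on whether the comparison is meant per marker or across the whole row.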