I would like to compute an averaged (midparental) genotype value for each pair of parents.
Example input (three unrelated couples, 1, 2 and 3):
Individual SNP1 SNP2 SNP3 SNP4 ...
Father_1 1 0 1 2 ...
Mother_1 0 0 2 1 ...
Father_2 1 0 1 1 ...
Mother_2 2 1 0 1 ...
Father_3 1 2 0 1 ...
Mother_3 1 . 0 2 ...
Example output (the midparental genotypes for couples 1, 2 and 3):
Individual SNP1 SNP2 SNP3 SNP4 ...
Midparent_1 0.5 0 1.5 1.5 ...
Midparent_2 1.5 0.5 0.5 1 ...
Midparent_3 1 . 0 1.5 ...
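As a minimal sketch of the arithmetic (in Python, not the R approach discussed in the comments), each midparent value is the mean of the two parents' allele dosages at that SNP. It assumes the convention visible in the example output: if either parent is missing (".") at a SNP, the midparent value is also missing. The function name midparent is hypothetical.

```python
MISSING = "."


def midparent(father, mother, missing=MISSING):
    """Average two parents' dosages SNP by SNP.

    Dosages are strings as they appear in the file ("0", "1", "2" or the
    missing code). If either parent is missing at a SNP, the result is missing.
    """
    out = []
    for f, m in zip(father, mother):
        if f == missing or m == missing:
            out.append(missing)
        else:
            mean = (float(f) + float(m)) / 2
            # Print whole numbers without a trailing ".0" to match the example.
            out.append(str(int(mean)) if mean.is_integer() else str(mean))
    return out


# The three couples from the example input (SNP1..SNP4 only).
pairs = {
    "1": (["1", "0", "1", "2"], ["0", "0", "2", "1"]),
    "2": (["1", "0", "1", "1"], ["2", "1", "0", "1"]),
    "3": (["1", "2", "0", "1"], ["1", ".", "0", "2"]),
}
for couple, (father, mother) in pairs.items():
    print("Midparent_" + couple, *midparent(father, mother))
```

Run as-is, this prints the three Midparent rows shown in the example output. Using the non-missing parent's value instead would be another possible convention, but it is not what the example shows.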
I can think of a way to do this, e.g. by exporting a PLINK .raw file and manipulating it in R, but I am concerned that this would become memory- and storage-heavy with hundreds of thousands of SNPs and thousands of pairs.
Does anyone know of an existing tool that can do this?
Thanks
Can you provide more example rows and add more parent pairs? Pretty sure it is easily done in R. Something like: split by ped, lapply over the pairs, then use colMeans.
Thanks - have added. My concern is less the method for doing this in R (if we end up doing that, I'll add the code as an answer) and more whether there's a less computationally intensive or more sophisticated method that I'm missing.
Do we assume rows 1 and 2 are one parent pair, rows 3 and 4 are another parent pair, etc.?
Yes, that can be assumed: it would be straightforward to ensure the file is formatted that way.
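If consecutive rows always form a pair, the file can be processed as a stream, holding only one pair of rows in memory at a time, so memory use grows with the number of SNPs rather than with the number of individuals (the output file is still about as large as a text export). The sketch below assumes exactly that layout plus a single header line; stream_midparents, format_dosage and their parameters are hypothetical names. For a PLINK .raw file (from --recode A), the first six columns are FID, IID, PAT, MAT, SEX and PHENOTYPE and missing genotypes are written as NA, so n_id_cols=6 and missing="NA" would be the relevant settings.

```python
import io
import sys


def format_dosage(x):
    """Print whole numbers without a trailing '.0' (1.0 -> '1', 0.5 -> '0.5')."""
    return str(int(x)) if x.is_integer() else str(x)


def stream_midparents(lines, out, n_id_cols=1, missing=".", prefix="Midparent_"):
    """Write one midparent row for each pair of consecutive data rows.

    Assumes the first non-blank line is a header and that data rows 1-2, 3-4, ...
    are the two parents of one couple. Only the current pair is held in memory.
    """
    rows = (line for line in lines if line.strip())  # skip blank lines
    header = next(rows).split()
    out.write(" ".join(["Individual"] + header[n_id_cols:]) + "\n")
    pair_index = 0
    for first_line in rows:
        second_line = next(rows, None)  # advances the same iterator
        if second_line is None:
            raise ValueError("odd number of individuals: last row has no partner")
        first = first_line.split()[n_id_cols:]
        second = second_line.split()[n_id_cols:]
        if len(first) != len(second):
            raise ValueError("the two parents have different numbers of SNP columns")
        pair_index += 1
        # Missing if either parent is missing, otherwise the mean dosage.
        values = [
            missing if a == missing or b == missing
            else format_dosage((float(a) + float(b)) / 2)
            for a, b in zip(first, second)
        ]
        out.write(" ".join([prefix + str(pair_index)] + values) + "\n")


if __name__ == "__main__":
    example = """Individual SNP1 SNP2 SNP3 SNP4
Father_1 1 0 1 2
Mother_1 0 0 2 1
Father_2 1 0 1 1
Mother_2 2 1 0 1
Father_3 1 2 0 1
Mother_3 1 . 0 2
"""
    stream_midparents(io.StringIO(example), sys.stdout)
    # For a real file (hypothetical file names), e.g. a PLINK .raw export:
    # with open("parents.raw") as fh, open("midparents.txt", "w") as out:
    #     stream_midparents(fh, out, n_id_cols=6, missing="NA")
```

Because the function only reads lines from an iterable and writes to a file-like object, it works the same on an open file, a StringIO buffer or sys.stdin.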
Hello,
you've also posted this question on Stack Exchange, but with slight (but important) differences in your input and output. You should tell us when you cross-post your questions.
Thanks.
fin swimmer