I am in a bit of a lost situation: I get one thing right, but then two things go wrong (sorry if some of these comments repeat things from previous threads). I need some help or advice on how to proceed.
Just for context, this dataset is meant for two things (for now): 1) to see how these samples relate to various global populations using analyses such as PCA and STRUCTURE (or similar), and 2) to estimate archaic introgression (i.e., from Neanderthals and Denisovans).
I have 92 samples that were genotyped on the Affymetrix Human Origins array at about 570,000 SNPs. I also have comparative data for 934 HGDP samples that I downloaded (ftp://ftp.cephb.fr/hgdp_supp10/Harvard_HGDP-CEPH/all_snp.map.gz ftp://ftp.cephb.fr/hgdp_supp10/Harvard_HGDP-CEPH/all_snp.ped.gz), but there is so much of it that I need to cull it down.
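One way to cull the HGDP data is to keep only the SNPs that are also on the Human Origins array. The sketch below assumes both datasets use the same SNP identifiers (e.g. rs numbers) and that `all_snp.map` follows the standard PLINK .map layout (chromosome, SNP ID, genetic distance, base-pair position); `shared_snps()` and the toy IDs are hypothetical. Matching on IDs does not check strand or allele coding, which still has to be verified before the datasets are merged.

```r
# Hypothetical helper: SNP IDs present both in the array data and in a PLINK
# .map table (columns: chromosome, SNP ID, genetic distance, bp position).
shared_snps <- function(own_ids, map) {
  colnames(map) <- c("chr", "snp", "cM", "bp")
  map$snp[map$snp %in% own_ids]   # keeps the order of the .map file
}

# Toy stand-ins for the real files (IDs are made up for illustration).
map <- data.frame(V1 = c(1, 1, 2, 3),
                  V2 = c("rs1", "rs2", "rs3", "rs4"),
                  V3 = 0,
                  V4 = c(1000, 2000, 3000, 4000),
                  stringsAsFactors = FALSE)
own_ids <- c("rs2", "rs4", "rs9")  # would come from rownames() of the genotype table

shared <- shared_snps(own_ids, map)
print(shared)  # [1] "rs2" "rs4"
```

On the real file, `read.table("all_snp.map.gz", stringsAsFactors = FALSE)` should work, since R's file connections decompress gzip files transparently when reading. The resulting ID list could be written out with `writeLines()` and passed to PLINK's `--extract` option to subset the .ped/.map pair outside R.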
For the moment, I've just been trying to run a PCA on my own samples (n = 92), but even this isn't working.
I coded my genotypes as 0/1/2 using JMP Genomics and then ran the following in R on UF's computing cluster:
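The 0/1/2 coding can also be done directly in R, which keeps the genotypes numeric (integer) from the start. This sketch assumes the calls are exported as `AA`/`AB`/`BB` strings, with anything else meaning no call; `recode_calls()` is a hypothetical helper, and the value it returns is the number of copies of the B allele.

```r
# Hypothetical recoding: AA -> 0, AB -> 1, BB -> 2 (copies of the B allele).
recode_calls <- function(calls) {
  lookup <- c(AA = 0L, AB = 1L, BB = 2L)
  unname(lookup[calls])   # names not in 'lookup' (e.g. "NoCall") give NA
}

# Toy SNPs x samples matrix of genotype calls (made-up values).
calls <- matrix(c("AA", "AB", "BB", "NoCall", "AB", "AA"), nrow = 2,
                dimnames = list(c("snp1", "snp2"), c("s1", "s2", "s3")))

# Recode the values and restore the matrix shape and row/column names.
geno <- matrix(recode_calls(as.vector(calls)), nrow = nrow(calls),
               dimnames = dimnames(calls))
print(geno)
```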
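    # genotype table: rows = loci, columns = samples, missing calls written as "."
    read.table('92simpleb_recgeno.txt', sep='\t', header=TRUE, row.names=1) -> table
    pcol = c(rep("green", 3), rep("blue", 89))   # plot colours for the 92 samples
    help(prcomp)
    table[table == "."] = NA
    t(table) -> trans   # transpose so that samples are rows
    pca <- prcomp(~ ., data=trans, na.action=na.omit)
    save(pca, file="1.RData")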
and got this back:
    Error: cannot allocate vector of size 1284.9 Gb
    Execution halted
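The 1284.9 Gb request most likely comes from the formula interface rather than from the genotypes themselves. `prcomp(~ ., data = trans)` expands `.` into one term per SNP column, and R's `terms` object stores a "factors" matrix of variables × terms, so with about 570,000 columns that matrix alone needs on the order of 570,000 × 570,000 × 4 bytes ≈ 1.3 × 10^12 bytes. Two further problems: because missing calls are `"."`, `read.table()` reads those columns as text, so `t(table)` yields a character matrix; and `na.action = na.omit` would drop every sample with even one missing call. A numeric 92 × 570,000 matrix takes only about 92 × 570,000 × 8 bytes ≈ 420 MB, so passing a matrix directly to `prcomp()` should be feasible. The sketch below assumes the file layout described in the post (SNPs as rows, samples as columns, first column = SNP ID, `"."` = missing); `pca_genotypes()` is a hypothetical helper, and mean-imputing missing genotypes and scaling each SNP to unit variance are choices made here, not the only options.

```r
# Real data (assumed layout): na.strings = "." makes the columns numeric.
# geno <- as.matrix(read.table("92simpleb_recgeno.txt", sep = "\t", header = TRUE,
#                              row.names = 1, na.strings = "."))

# Hypothetical helper: PCA on a SNPs x samples 0/1/2 matrix (NA = missing).
pca_genotypes <- function(geno_snp_by_sample) {
  x <- t(as.matrix(geno_snp_by_sample))   # samples x SNPs
  storage.mode(x) <- "double"
  # Mean-impute missing genotypes per SNP instead of dropping whole samples.
  mu <- colMeans(x, na.rm = TRUE)
  idx <- which(is.na(x), arr.ind = TRUE)
  x[idx] <- mu[idx[, 2]]
  # Drop SNPs with zero (or undefined) variance; they cannot be scaled.
  keep <- apply(x, 2, var) > 0
  keep[is.na(keep)] <- FALSE
  # Matrix (default) method: no formula, so no variables x terms matrix.
  prcomp(x[, keep, drop = FALSE], center = TRUE, scale. = TRUE)
}

# Toy example: 20 SNPs x 6 samples of random genotypes (made-up data).
set.seed(1)
toy <- matrix(sample(0:2, 20 * 6, replace = TRUE), nrow = 20,
              dimnames = list(paste0("snp", 1:20), paste0("s", 1:6)))
toy[3, 2] <- NA    # one missing call
toy[5, ] <- 1L     # monomorphic SNP, will be dropped
pca <- pca_genotypes(toy)
print(dim(pca$x))
```

The sample scores are in `pca$x`, so something like `plot(pca$x[, 1], pca$x[, 2], col = pcol)` would reuse the colour vector from the original script, provided the samples are in the same order.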
Basically, I am lost (and a bit frustrated), and I need suggestions on how to run a PCA on this much data without metaphorically blowing something up. Currently the data are formatted with samples as columns and loci as rows, but I can re-export them from Affy Genotyping Console in a different format. Any suggestions on what I should do?
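Because there are far fewer samples than SNPs, another option is to accumulate the n × n sample relationship matrix over chunks of SNPs and take its eigenvectors; with the same per-SNP standardization and no missing data, these match the principal-component scores of the standardized matrix up to sign and a per-component scale. This also stays small for the merged data with the 934 HGDP samples, where the matrix would be about 1,026 × 1,026. The sketch assumes the current layout (SNPs as rows, samples as columns, 0/1/2 with NA for missing); `pca_by_grm()`, the chunk size and the standardization by sqrt(2p(1 - p)) are illustrative choices. It works on an in-memory matrix for simplicity, but the same loop could read successive blocks of rows from the file instead. If the data are re-exported in PLINK format, dedicated tools such as PLINK's `--pca` or EIGENSOFT's smartpca perform this kind of analysis directly.

```r
# Hypothetical helper: PCA via the samples x samples relationship matrix,
# built chunk by chunk over SNPs. 'geno' is SNPs x samples, 0/1/2, NA = missing.
pca_by_grm <- function(geno, chunk_size = 10000L, k = 10L) {
  n <- ncol(geno)
  grm <- matrix(0, n, n)
  m_used <- 0L
  for (s in seq(1L, nrow(geno), by = chunk_size)) {
    rows <- s:min(s + chunk_size - 1L, nrow(geno))
    g <- geno[rows, , drop = FALSE]
    storage.mode(g) <- "double"
    p <- rowMeans(g, na.rm = TRUE) / 2        # frequency of the counted allele
    ok <- !is.na(p) & p > 0 & p < 1            # drop all-missing / fixed SNPs
    if (!any(ok)) next
    g <- g[ok, , drop = FALSE]
    p <- p[ok]
    z <- (g - 2 * p) / sqrt(2 * p * (1 - p))   # centre and scale each SNP (row)
    z[is.na(z)] <- 0                           # missing call -> SNP mean after centring
    grm <- grm + crossprod(z)                  # adds t(z) %*% z, an n x n matrix
    m_used <- m_used + nrow(z)
  }
  grm <- grm / m_used
  e <- eigen(grm, symmetric = TRUE)
  k <- min(k, n)
  pcs <- e$vectors[, seq_len(k), drop = FALSE]
  rownames(pcs) <- colnames(geno)
  list(pcs = pcs, values = e$values[seq_len(k)], n_snps = m_used)
}

# Toy example: 300 SNPs x 6 samples of random genotypes (made-up data).
set.seed(2)
g <- matrix(sample(0:2, 300 * 6, replace = TRUE), nrow = 300,
            dimnames = list(paste0("snp", 1:300), paste0("s", 1:6)))
res <- pca_by_grm(g, chunk_size = 50L, k = 2L)
print(res$values)
```

Only one chunk of standardized SNPs plus the n × n matrix needs to be held at a time, which is why the chunk size controls peak memory.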