How to handle genotyping data for 30,000 SNPs on 500 individuals in R?
5.7 years ago

Dear all, good morning.

I have genotyping data for 30,000 SNPs on 500 individuals. This is quite a large dataset for me (approx. 80 MB), and I only have 4 GB of RAM and a 1 TB HDD. My final objective is a diversity analysis and grouping these individuals based on their similarity. I tried to calculate genetic distance in R using the poppr package, but I get an insufficient-memory error. I also tried to convert my data to a genlight object, without success. Is there any R package that can handle data of this size? I would be grateful for your expertise and suggestions on how to solve this.

This is my example data file: https://www.dropbox.com/s/phq1szmqlttsdja/test.txt?dl=0

Thanks in advance. Regards

R SNP • 1.5k views

Yes, the resulting distance matrix can be enormous, depending on which dimension the distances are computed over. What exact commands are you running, and what exact error messages do you receive? There are ways of increasing the memory available to R, but there is always going to be a limit somewhere along the line. I will pick this up in the morning (it is night here).
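For a rough sense of scale, the arithmetic can be sketched in base R. This assumes dense matrices stored as 8-byte doubles (or 4-byte integers for the genotypes); the helper `matrix_gb` is a hypothetical name introduced only for this illustration. How large the distance matrix gets depends on what is being compared: individuals against individuals, or SNPs against SNPs.

```r
# Hypothetical helper: approximate size in GB of a dense matrix,
# assuming a fixed number of bytes per cell (8 for doubles, 4 for integers).
matrix_gb <- function(n_row, n_col, bytes_per_cell = 8) {
  n_row * n_col * bytes_per_cell / 1024^3
}

n_ind <- 500     # individuals, from the question
n_snp <- 30000   # SNPs, from the question

geno_double_gb <- matrix_gb(n_ind, n_snp)      # genotypes as doubles, about 0.11 GB
geno_int_gb    <- matrix_gb(n_ind, n_snp, 4)   # genotypes as integers, about 0.06 GB
ind_dist_gb    <- matrix_gb(n_ind, n_ind)      # individual x individual, about 0.002 GB
snp_dist_gb    <- matrix_gb(n_snp, n_snp)      # SNP x SNP, about 6.7 GB

print(c(geno_double = geno_double_gb, geno_int = geno_int_gb,
        ind_dist = ind_dist_gb, snp_dist = snp_dist_gb))
```

These figures cover only the final objects; intermediate copies made by a package during the calculation can need considerably more.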


Dear Kevin, good morning, and thank you very much for your reply. I am using the poppr package to calculate genetic distance. I do not see an error in the code itself, but I get an error like "can not allocate 24.5GB data". Thank you.


Then, it is trying to allocate 24.5GB of RAM. Most likely, your computer does not have that. Do you have access to a super-compute cluster?


Dear Kevin, unfortunately no. Thanks for your reply.


Please paste the exact command that you're using. Perhaps I can modify the code such that it could work.


Dear Kevin, good afternoon. Sorry for my late reply. I am using the following code to calculate genetic distance (GD); my short script is here: https://www.dropbox.com/s/6wlftpd253fw2x3/code.txt?dl=0 and my example data is here: https://www.dropbox.com/s/jw509iylda59xfv/example%20data.csv?dl=0 With this memory limitation I cannot even run DAPC (discriminant analysis of principal components) to identify groups, that is, to see how these individuals cluster based on their relatedness. Thank you very much for your help. Regards
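One memory-light route to the distance and grouping steps described above can be sketched in base R alone. This is only an illustration under stated assumptions: the genotypes are simulated here (the linked files are not read), they are assumed to be coded as 0/1/2 allele counts with individuals in rows, and the Manhattan distance is a simple stand-in that is not necessarily the measure poppr computes. PCA followed by k-means is likewise a rough analogue of the clustering step that usually precedes DAPC, not DAPC itself. The group count `k = 3` and the choice of 10 principal components are arbitrary.

```r
set.seed(1)

# Small simulated stand-in for the real data (assumption: 0/1/2 allele counts,
# one row per individual, one column per SNP).
n_ind <- 50
n_snp <- 200
geno <- matrix(sample(0:2, n_ind * n_snp, replace = TRUE),
               nrow = n_ind, ncol = n_snp,
               dimnames = list(paste0("ind", seq_len(n_ind)),
                               paste0("snp", seq_len(n_snp))))

# Pairwise distance between individuals (rows): summed allele-count differences.
# The result has n_ind * (n_ind - 1) / 2 entries, independent of the SNP count.
d <- dist(geno, method = "manhattan")

# Hierarchical clustering on that distance, cut into k groups (k is arbitrary here).
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 3)

# PCA on the centred genotype matrix, then k-means on the leading components.
pca <- prcomp(geno, center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:10]
km <- kmeans(pcs, centers = 3, nstart = 10)

# Cross-tabulate the two groupings to see how far they agree.
print(table(hclust = groups, kmeans = km$cluster))
```

With real data, `geno` would instead be built from the genotype file, and missing genotypes would need handling before `prcomp()`, which does not accept `NA` values in a matrix.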
