Hello,
I am trying to filter a text file for missing and redundant markers in R, but the R session gets aborted. I think this is because of the large file size, since the same code works on other, smaller files. This file has SNP markers for 50 genotypes and is 25 GB in size. So I want to do the same filtering for missing and redundant SNPs on the cluster and then import the filtered file into R for further analysis, but I am not sure how to do that on the cluster.
Additionally, I think there is a problem with saving the text file to an RData file as well. I appreciate your time and effort for any help. Thank you!
Have you tried this using fread and data.table? I think it might work, because data.table is a lot more performant with tables in the millions-of-rows range of records than data.frame ever can be. If I recall correctly, fread even has streaming support so it won't need to store the entire file in memory.
Yes, I imported the file using data.table and fread. And then I used the above command to filter markers. But the R session still gets aborted.
Can you edit your question and add the exact code you're using please? You might also want to check Stack Overflow on how to enhance the performance of fread.
I have edited the question and I have added an additional problem in saving the text file to an RData file, which I think could be the reason.
Side note: It's not good practice to use keywords as variable names, like you're doing with na.
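On the original question of filtering on the cluster before importing into R: one option is to avoid loading the table at all and make a single line-by-line pass over the file, writing only the markers you want to keep. Below is a minimal Python sketch of that idea, not a drop-in solution. It assumes a tab-separated file with one header line, the marker ID in the first column, the 50 genotype calls in the remaining columns, and missing calls coded as NA. It also assumes "redundant" means a marker whose genotype pattern is identical to one already seen. The function name filter_markers and the max_missing threshold are made up for illustration, so adjust them to whatever your R filter actually does.

```python
import hashlib
import io


def filter_markers(src, dst, max_missing=0.2, missing="NA", sep="\t"):
    """Copy rows from src to dst one line at a time, dropping markers that
    have too many missing calls or a genotype pattern already seen.

    Assumed layout (hypothetical): header line, marker ID in column 1,
    genotype calls in the remaining columns.
    """
    seen = set()   # 16-byte digests of the genotype patterns kept so far
    kept = 0
    dst.write(src.readline())  # copy the header line unchanged
    for line in src:
        fields = line.rstrip("\n").split(sep)
        calls = fields[1:]
        if not calls:
            continue  # skip blank or malformed lines
        n_missing = sum(1 for c in calls if c == missing)
        if n_missing / len(calls) > max_missing:
            continue  # too many missing calls for this marker
        digest = hashlib.blake2b(sep.join(calls).encode(), digest_size=16).digest()
        if digest in seen:
            continue  # same genotype pattern as an earlier marker
        seen.add(digest)
        dst.write(sep.join(fields) + "\n")
        kept += 1
    return kept


# Tiny in-memory demo with made-up data
demo = io.StringIO(
    "marker\tg1\tg2\tg3\tg4\n"
    "snp1\tA\tA\tB\tB\n"
    "snp2\tA\tNA\tNA\tB\n"  # half the calls are missing
    "snp3\tA\tA\tB\tB\n"    # same pattern as snp1
    "snp4\tB\tA\tB\tA\n"
)
out = io.StringIO()
kept = filter_markers(demo, out)
```

For the real file you would pass two handles from open() (one for reading, one for writing) instead of the StringIO objects, run the script as a cluster job, and then read the much smaller filtered file into R. Only one line is held in memory at a time, but the set of digests grows with the number of distinct markers kept, so with a very large number of markers it can still need a few GB of RAM.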