Problem importing a multi-sample VCF with readData from the PopGenome R package
18 months ago

Hi all, I am trying to import a multi-sample VCF file (created with GATK, ~80 individuals, 4.3 GB) into R using the PopGenome package and its readData function. Unfortunately, the import always aborts with the error message "R encountered a fatal error, session terminated". With a smaller data set it works fine. I also tried compressed VCFs (bgzip), which did not work either. Am I missing something? Does my PC not have enough computing resources? Has anyone had similar experiences, or does anyone know how to solve this problem? I would be very grateful for any advice.

Kind regards

Pavlo

My code:

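gff3_out <- character(0)
# `chromosomes` and `gff_path` are assumed to be defined earlier in the script
for (chr in chromosomes) {
  my_filter <- list(seqid = chr)   # seqid filter for the current chromosome
  gff3_out <- c(gff3_out, file.path(gff_path, paste0(chr, ".gff")))   # one GFF path per chromosome
}
# Write one VCF per scaffold/chromosome into the folder "scaffoldVCFs2"
PopGenome::VCF_split_into_scaffolds("my_multiVCF_from_GATK.vcf", "scaffoldVCFs2")
# readData() reads every VCF in the given folder (presumably the split output above)
allgenomes <- PopGenome::readData("path/to/data/with_VCFs", format = "VCF",
                                  gffpath = "path/to/data/gff_data", big.data = TRUE)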


My PC: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz 1.99 GHz

RAM: 32.0 GB (31.9 GB usable)

System type: 64-bit operating system, x64-based processor


import a multiVCF file [...] (4.3Gb) into R

With a smaller data set it works fine

Does my PC not have enough computing resources?

Most likely, yes. Try monitoring memory usage while the data are loading; that may help with the diagnosis. Anecdotally, I recently crashed my computer with 32 GB of memory when attempting a plotting operation on a smaller file.
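One way to put a number on R's own memory use is to reset and then read the peak statistics reported by base R's `gc()`. This is a minimal sketch: the helper name `peak_mem_mb` is hypothetical, and it only counts memory managed by R's garbage collector, not memory allocated directly by compiled code (such as a package's C routines), so an OS monitor like Task Manager still gives the more complete picture.

```r
# Hypothetical helper: peak R-heap memory (in Mb) used while evaluating `expr`.
# Only memory managed by R's garbage collector is counted.
peak_mem_mb <- function(expr) {
  gc(reset = TRUE)   # reset the "max used" statistics to current usage
  force(expr)        # evaluate the lazily passed expression
  g <- gc()
  # the "(Mb)" column directly follows the "max used" column
  mb_col <- which(colnames(g) == "max used") + 1L
  sum(g[, mb_col])   # Ncells + Vcells peak, in Mb
}

# Usage on a small allocation (1e7 doubles, roughly 76 Mb)
peak <- peak_mem_mb(numeric(1e7))
print(peak)
```

The same wrapper could go around a `readData()` call, keeping in mind that a fatal crash ends the session before any value is returned.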


Hi Carambakaracho, thank you for your help.
Yes, you're right; it's probably because my 32 GB of RAM is not enough. It's funny, because the paper describing the package (Pfeifer et al., Mol. Biol. Evol. 31(7):1929-1936, doi:10.1093/molbev/msu136) says that even large data sets can be loaded without problems on ordinary PCs (8 GB RAM). I have now tried the same task on a Linux server and it works great. Who knows where exactly the problem is.


I have now checked memory usage under Windows while readData was reading the data set, and it gave no indication that a lack of RAM is causing the crash.


Unfortunately, the crash message is pretty generic, and it mostly points towards a memory issue.

Another approach would be to hypothesize that one or more lines in your data set contain an offending value. You could split the set in half and see whether you get the error with one half or the other, then split the failing half again, and so forth.

That said, I wouldn't expect faulty data to produce the error message you got.
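The halving approach can be scripted in base R without loading the whole file into memory. This is a minimal sketch under stated assumptions: the function name `split_vcf_in_half` and the output file names are hypothetical, the input is an uncompressed VCF whose header lines (starting with `#`) all precede the data lines, and the file is streamed in chunks, so it is read twice (once to count records, once to write them).

```r
# Hypothetical helper: write the VCF header plus the first half of the records
# to <out_prefix>_part1.vcf, and the header plus the remaining records to
# <out_prefix>_part2.vcf. The input is streamed in chunks of `chunk` lines.
split_vcf_in_half <- function(vcf_file, out_prefix, chunk = 100000L) {
  # Pass 1: count data (non-header) lines
  con <- file(vcf_file, open = "r")
  n_data <- 0
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break
    n_data <- n_data + sum(!startsWith(lines, "#"))
  }
  close(con)
  half <- n_data %/% 2

  # Pass 2: copy header lines to both outputs, split the records at `half`
  out_files <- paste0(out_prefix, c("_part1.vcf", "_part2.vcf"))
  con <- file(vcf_file, open = "r")
  out1 <- file(out_files[1], open = "w")
  out2 <- file(out_files[2], open = "w")
  on.exit({ close(con); close(out1); close(out2) })
  seen <- 0
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break
    is_header <- startsWith(lines, "#")
    writeLines(lines[is_header], out1)
    writeLines(lines[is_header], out2)
    records <- lines[!is_header]
    idx <- seen + seq_along(records)   # running record number
    writeLines(records[idx <= half], out1)
    writeLines(records[idx > half], out2)
    seen <- seen + length(records)
  }
  out_files
}
```

Each half keeps the full header, so either part can be tried on its own; running the function again on the failing part continues the bisection.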