Entering edit mode
3 months ago
Eliza ▴ 20
Hi, I downloaded the chromosome 21 genome data and sent a job to my university cluster to read the VCF file, but I keep getting this error:
```
Scanning file to determine attributes.
File attributes:
  meta lines: 598
  header_line: 599
  variant count: 3483000
  column count: 8
Meta line 598 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
Character matrix gt rows: 3483000
Character matrix gt cols: 8
skip: 0
nrows: 3483000
row_num: 0
Processed variant 1928000
/var/spool/slurmd/job4788377/slurm_script: line 8: 87699 Killed    Rscript /xxxx/xxxx/xxx/filter_snp_chr_21.R
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=4788377.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
```
This is my R script:

```r
library("R.utils")
library("vcfR")
library("stringr")
library("tidyverse")
library("dplyr")

vc <- read.vcfR("gnomad.genomes.r2.1.1.sites.21.vcf")
df <- vc@fix
data <- as.data.frame(df)
# filtering for SNPs: keep rows where REF and ALT are single bases
data_snp <- data %>% filter(str_length(ALT) == 1 & str_length(REF) == 1)
write.csv(data_snp, "snp_genome_21.csv")
```
And this is the job script:

```shell
#!/bin/bash
#SBATCH --time=05:00:00
#SBATCH --ntasks=1
#SBATCH --mem=40G

module load tensorflow/2.5.0
Rscript /xxxx/xxx/filter_snp_chr_21.R
```
Does that mean there was not enough memory? I gave the job 25G of memory.
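For what it's worth, Slurm itself can confirm whether memory was the problem: `sacct` reports a finished job's peak resident set size (MaxRSS) alongside what was requested. A diagnostic one-liner, using the job ID from the error message above (the exact `State` string shown for an OOM kill can vary by Slurm version):

```shell
# Show the job's final state and peak memory use. An OOM-killed step
# typically shows a State of OUT_OF_MEMORY, with MaxRSS at or near ReqMem.
sacct -j 4788377 --format=JobID,State,ReqMem,MaxRSS
```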
And what's the size of the uncompressed VCF? :-D
@Pierre Lindenbaum 15G is the size of the uncompressed .vcf file
Do you mean chromosome 21 from gnomAD?
Your script doesn't seem to handle the resources very well. Try doing your operation line by line (streaming) instead of reading the whole file into memory.
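The line-by-line approach can be done entirely in the shell with `awk`, which never holds more than one record in memory. A minimal sketch; the tiny input written here is fabricated for illustration, so point `awk` at the real `gnomad.genomes.r2.1.1.sites.21.vcf` instead (the `demo*` filenames are placeholders):

```shell
# Build a tiny demo VCF (stand-in for the 15G gnomAD file).
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
printf '21\t100\t.\tA\tG\t.\tPASS\t.\n'   >> demo.vcf  # SNP: kept
printf '21\t200\t.\tAT\tG\t.\tPASS\t.\n'  >> demo.vcf  # indel: dropped
printf '21\t300\t.\tC\tT,G\t.\tPASS\t.\n' >> demo.vcf  # multi-allelic: dropped

# Keep header lines plus rows where REF (column 4) and ALT (column 5)
# are each exactly one base -- the same condition as the R filter.
awk -F'\t' '/^#/ || (length($4) == 1 && length($5) == 1)' demo.vcf > demo_snps.vcf

grep -vc '^#' demo_snps.vcf  # number of variants kept -> 1
```

Note that multi-allelic records like `ALT=T,G` fail the length test too, which matches what the `str_length(ALT) == 1` filter in the R script does.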
@barslmn it worked for a 10G file with no problem