Question: Annotating with dbNSFP via SnpSift on a cluster
kanika.151 wrote (4 weeks ago, Italy):

Hello all, I know that the command works, since it ran on the cluster for smaller chromosomes such as 21 and Y (3 nodes, 24 threads). Should I increase the number of nodes or threads? -Xmx64g is the maximum memory I can give. Or should I do something else, since the other chromosomes are not getting processed? dbNSFP version: 4.0a.

SnpSift -Xmx64g dbnsfp \
    -f ref,alt,aaref,aaalt,rs_dbSNP151,aapos,genename,Ensembl_geneid,Ensembl_transcriptid,Ensembl_proteinid,Uniprot_acc,Uniprot_entry,HGVSc_ANNOVAR,HGVSp_ANNOVAR,HGVSc_snpEff,HGVSp_snpEff,HGVSc_VEP,HGVSp_VEP,GENCODE_basic,VEP_canonical,cds_strand,refcodon,codonpos,codon_degeneracy,Ancestral_allele,AltaiNeandertal,Denisova,VindijiaNeandertal,SIFT_score,SIFT_converted_rankscore,SIFT_pred,SIFT4G_score,SIFT4G_converted_rankscore,SIFT4G_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,LRT_score,LRT_converted_rankscore,LRT_pred,LRT_Omega,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,MutationTaster_model,MutationTaster_AAE,MutationAssessor_score,MutationAssessor_rankscore,MutationAssessor_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,VEST4_score,VEST4_rankscore,MetaSVM_score,MetaSVM_rankscore,MetaSVM_pred,MetaLR_score,MetaLR_rankscore,MetaLR_pred,MutPred_score,MutPred_rankscore,MutPred_protID,MutPred_AAchange,MutPred_Top5features,Aloft_Fraction_transcripts_affected,Aloft_prob_Tolerant,Aloft_prob_Recessive,Aloft_prob_Dominant,Aloft_pred,Aloft_Confidence,CADD_raw,CADD_raw_rankscore,CADD_phred,GERP++_NR,GERP++_RS,GERP++_RS_rankscore \
    -db dbNSFP4.0a_{chr}_hg19.txt.gz \
    input.{chr}.vcf > output.{chr}.vcf

What error are you getting? What scheduler/workload manager are you submitting to?

— Brice Sarver, 4 weeks ago

Error: java.lang.OutOfMemoryError: Java heap space. Scheduler: PBS.

— kanika.151, 4 weeks ago
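For reference, the PBS request has to cover the Java heap plus JVM overhead. A minimal sketch of a submission script; resource names, file paths, and the java -jar invocation are assumptions, not taken from the thread, and will vary by site:

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=24
#PBS -l mem=70gb          # request more than -Xmx64g: leave headroom for JVM overhead
#PBS -l walltime=24:00:00

cd "$PBS_O_WORKDIR"

# Placeholder paths; SnpSift is commonly run as java -jar SnpSift.jar.
java -Xmx64g -jar SnpSift.jar dbnsfp \
    -db dbNSFP4.0a_1_hg19.txt.gz \
    input.1.vcf > output.1.vcf
```

If the scheduler's mem request is smaller than (or equal to) -Xmx, the job can be killed or the JVM can fail before the heap limit is even reached.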
Brice Sarver wrote (4 weeks ago, United States):

If your only issue is that you're running out of memory, the maximum memory per node is already reserved, and your -l mem=SIZE request is already as large as it can be, there's not a whole lot more you can give the job. Instead, try using something like split to break your VCFs into more manageable files, annotate each piece, and combine them back into a single VCF at the end.

— Brice Sarver, 4 weeks ago
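A minimal sketch of that split-annotate-merge idea using only grep, split, and cat. The file names and the tiny VCF built at the top are placeholders; real gnomAD VCFs would be bgzip-compressed and need zcat, and the chunk size of 2 is for illustration only (try something like 500000 and shrink until the SnpSift heap fits):

```shell
# Toy input standing in for one per-chromosome VCF (placeholder content).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n' > input.chr1.vcf
printf '1\t100\t.\tA\tG\n1\t200\t.\tC\tT\n1\t300\t.\tG\tA\n' >> input.chr1.vcf

vcf=input.chr1.vcf
grep '^#'  "$vcf" > header.txt   # VCF header lines
grep -v '^#' "$vcf" > body.txt   # variant records only

# Fixed-size chunks of the body; each part gets its own header copy
# so it is a valid standalone VCF that SnpSift can annotate.
split -l 2 body.txt chunk_

for c in chunk_*; do
    cat header.txt "$c" > "part_${c}.vcf"
done

# ... annotate each part_chunk_*.vcf with SnpSift dbnsfp here ...

# Recombine: one header, then the (annotated) bodies in order.
cat header.txt > combined.vcf
for p in part_chunk_*.vcf; do
    grep -v '^#' "$p" >> combined.vcf
done
```

Because split's default alphabetic suffixes (chunk_aa, chunk_ab, ...) sort in creation order, the shell glob re-reads the parts in the original record order.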

I have already set it to the maximum memory size per node. And the VCF files are already split by chromosome. Do you mean I should split them further? :|

— kanika.151, 4 weeks ago

Yes. If your data are from gnomAD, as you mentioned below, you're dealing with a ton of sites per chromosome for just the WES dataset, and far more for the WGS dataset. I would test one or two partitions of, say, chromosome 1 and see what the memory footprint is. It may be helpful to bring up an interactive session on your cluster, if your configuration and administrator allow it.

— Brice Sarver, 4 weeks ago
kanika.151 wrote (6 days ago, Italy):

Done!

I forgot to index each database file, as Pablo said: tabix -s 1 -b 2 -e 2 "$file"

Thanks.

— kanika.151, 6 days ago
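For anyone landing here later, that indexing step can be looped over every per-chromosome database file. The sketch below only prints each command through a run() wrapper so it is safe to paste anywhere; drop run to index for real. The touch lines create placeholder files matching the naming scheme from the question (tabix itself requires the real files to be bgzip-compressed):

```shell
# Sketch: index each per-chromosome dbNSFP table so SnpSift can seek into it.
# -s 1: chromosome is column 1; -b 2 -e 2: position is column 2.
run() { echo "would run: $*"; }   # dry-run wrapper; remove to actually index

# Placeholder empty files standing in for the real databases.
touch dbNSFP4.0a_21_hg19.txt.gz dbNSFP4.0a_Y_hg19.txt.gz

for file in dbNSFP4.0a_*_hg19.txt.gz; do
    run tabix -s 1 -b 2 -e 2 "$file"
done
```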
Powered by Biostar version 2.3.0