Annotating with dbNSFP using SnpSift on a cluster
5.1 years ago
kanika.151 ▴ 160

Hello all, I know the command works, since it ran for the smaller chromosomes such as 21 or Y on the cluster (3 nodes, 24 threads). Should I increase the number of nodes or threads? -Xmx64g is the maximum memory I can give, and the other chromosomes are not getting processed. Or should I be doing something else? dbNSFP version 4.0a.

SnpSift -Xmx64g dbnsfp -f ref,alt,aaref,aaalt,rs_dbSNP151,aapos,genename,Ensembl_geneid,Ensembl_transcriptid,Ensembl_proteinid,Uniprot_acc,Uniprot_entry,HGVSc_ANNOVAR,HGVSp_ANNOVAR,HGVSc_snpEff,HGVSp_snpEff,HGVSc_VEP,HGVSp_VEP,GENCODE_basic,VEP_canonical,cds_strand,refcodon,codonpos,codon_degeneracy,Ancestral_allele,AltaiNeandertal,Denisova,VindijiaNeandertal,SIFT_score,SIFT_converted_rankscore,SIFT_pred,SIFT4G_score,SIFT4G_converted_rankscore,SIFT4G_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,LRT_score,LRT_converted_rankscore,LRT_pred,LRT_Omega,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,MutationTaster_model,MutationTaster_AAE,MutationAssessor_score,MutationAssessor_rankscore,MutationAssessor_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,VEST4_score,VEST4_rankscore,MetaSVM_score,MetaSVM_rankscore,MetaSVM_pred,MetaLR_score,MetaLR_rankscore,MetaLR_pred,MutPred_score,MutPred_rankscore,MutPred_protID,MutPred_AAchange,MutPred_Top5features,Aloft_Fraction_transcripts_affected,Aloft_prob_Tolerant,Aloft_prob_Recessive,Aloft_prob_Dominant,Aloft_pred,Aloft_Confidence,CADD_raw,CADD_raw_rankscore,CADD_phred,GERP++_NR,GERP++_RS,GERP++_RS_rankscore -db dbNSFP4.0a_{chr}_hg19.txt.gz input.{chr}.vcf > output.{chr}.vcf
Tags: next-gen • dna-seq • annotation • dbnsfp • snpsift

What error are you getting? What scheduler/workload manager are you submitting to?


Error: java.lang.OutOfMemoryError: Java heap space. Scheduler: PBS.
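For context, a minimal PBS submission sketch (script header values, file names, and the java -jar form of the SnpSift call are illustrative; the long -f field list from the question is omitted):

#!/bin/bash
#PBS -N snpsift_chr1
#PBS -l nodes=1:ppn=4
#PBS -l mem=64gb
#PBS -l walltime=24:00:00
cd "$PBS_O_WORKDIR"
# Keep the Java heap a few GB below the scheduler request so the rest of the
# JVM (stack, metaspace, native buffers) still fits inside the allocation.
java -Xmx60g -jar SnpSift.jar dbnsfp -db dbNSFP4.0a_1_hg19.txt.gz input.1.vcf > output.1.vcf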

5.1 years ago
Brice Sarver ★ 3.8k

If your only issue is that you're running out of memory, the maximum memory per node has already been reserved, and your -l mem=SIZE request is already as large as it can be, there's not a whole lot you can do on the Java side. Instead, try something like split to break your VCFs into more manageable files, annotate each one, and combine them back into a single VCF at the end.
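A minimal sketch of that approach for one chromosome, assuming plain (uncompressed) per-chromosome VCFs and the java -jar form of SnpSift; chunk size, file names, and heap size are illustrative, and the -f field list is again omitted:

# Split the records (not the header) into ~1M-line chunks.
grep '^#' input.1.vcf > header.txt
grep -v '^#' input.1.vcf | split -l 1000000 - part_1_
# Re-attach the header to each chunk and annotate it.
for part in part_1_*; do
    cat header.txt "$part" > "$part.vcf"
    java -Xmx60g -jar SnpSift.jar dbnsfp -db dbNSFP4.0a_1_hg19.txt.gz "$part.vcf" > "$part.ann.vcf"
done
# Reassemble: take the annotated header once, then append the records of every chunk in order.
first=$(ls part_1_*.ann.vcf | head -n 1)
grep '^#' "$first" > output.1.vcf
for part in part_1_*.ann.vcf; do
    grep -v '^#' "$part" >> output.1.vcf
done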


I have already set it to the maximum memory size per node, and the VCF files are already split by chromosome. Do you mean I should split them further? :|


Yes - if your data are from gnomAD, as you mentioned below, you're dealing with a ton of sites per chromosome for just the WES dataset, and far more for the WGS dataset. I would test one or two partitions of, say, chromosome 1 and see what the memory footprint is. It may also help to bring up an interactive session on your cluster, if your configuration and administrators allow it.
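An interactive PBS session for that kind of testing might be requested along these lines (resource values are illustrative and site policies differ):

qsub -I -l nodes=1:ppn=4 -l mem=64gb -l walltime=04:00:00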

5.1 years ago
kanika.151 ▴ 160

Done!

I had forgotten to index each dbNSFP file, as Pablo said: tabix -s 1 -b 2 -e 2 "$file"
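A sketch of indexing all the per-chromosome dbNSFP tables in one pass, assuming they are bgzip-compressed and follow the file-name pattern used in the question (chromosome in column 1, position in column 2):

for file in dbNSFP4.0a_*_hg19.txt.gz; do
    tabix -s 1 -b 2 -e 2 "$file"
done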

Thanks.
