Filtering VCFs, at least by LOF variants
6 months ago
storm1907

Hello,

I have a VCF file with the following INFO field for each variant. I would like to filter by MAF > 30%, by protein prediction, and also for LOF variants.

I tried

java -jar /mnt/home//tools/snpEff/snpEff.jar  -lof 

but it is not clear what argument I should put after -lof.

chr1    935954  .       G       T       52.6    PASS    CSQ=|FAIL|0.00|0.00|0.00|0.00|14|23|43|47|||MODIFIER|SAMD11|ENSG00000187634|ENST00000342066|protein_coding||5/13|||||,|FAIL|0.00|0.00|0.00|0.00|14|23|43|47|||MODIFIER|SAMD11|ENSG00000187634|ENST00000616016|protein_coding||5/12|||||,|FAIL|0.00|0.00|0.00|0.00|14|23|43|47|||MODIFIER|SAMD11|ENSG00000187634|ENST00000616125|protein_coding||5/11|||||,|FAIL|0.00|0.00|0.00|0.00|14|23|43|47|||MODIFIER|SAMD11|ENSG00000187634|ENST00000617307|protein_coding||5/12|||||,|FAIL|0.00|0.00|0.00|0.00|14|23|43|47|||MODIFIER|SAMD11|ENSG00000187634|ENST00000618181|protein_coding||4/10|||||,|FAIL|0.00|0.00|0.00|0.00|14|23|43|47|||MODIFIER|SAMD11|ENSG00000187634|ENST00000618323|protein_coding||5/11|||||,|FAIL|0.00|0.00|0.00|0.00|14|23|43|47|||MODIFIER|SAMD11|ENSG00000187634|ENST00000618779|protein_coding||5/12|||||,|FAIL|0.00|0.00|0.00|0.00|14|23|43|47|||MODIFIER|SAMD11|ENSG00000187634|ENST00000620200|protein_coding||4/8|||||,|FAIL|0.00|0.00|0.00|0.00|14|23|43|47|||MODIFIER|SAMD11|ENSG00000187634|ENST00000622503|protein_coding||5/13|||||        GT:GQ:DP:AD:VAF:PL      0/1:53:22:9,13:0.590909:52,0,67 

Can somebody explain that?

Thank you!
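For the MAF > 30% part: the FORMAT column of the record above carries a per-sample VAF (0.590909 for this variant). Assuming that VAF is what "MAF" means here (it could instead refer to a population allele frequency stored elsewhere), a minimal Python sketch of the filter could look like this. It assumes a tab-delimited, single-sample VCF, and the function names are hypothetical:

```python
from typing import Iterable, Iterator, Optional

# Minimal sketch (hypothetical helper names): filter a single-sample,
# tab-delimited VCF on the per-sample VAF value stored in the FORMAT column.


def sample_vaf(vcf_line: str) -> Optional[float]:
    """Return the first sample's VAF, or None if the key is absent or '.'."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")       # FORMAT, e.g. GT:GQ:DP:AD:VAF:PL
    values = fields[9].split(":")     # first sample column
    vaf = dict(zip(keys, values)).get("VAF", ".")
    return None if vaf == "." else float(vaf)


def filter_by_vaf(lines: Iterable[str], min_vaf: float = 0.3) -> Iterator[str]:
    """Yield header lines unchanged and records whose VAF exceeds min_vaf."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        vaf = sample_vaf(line)
        if vaf is not None and vaf > min_vaf:
            yield line
```

The lines can come from `open("input.vcf")`, or from `gzip.open(path, "rt")` for a bgzipped file.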


SnpSift, not SnpEff, is the tool used for filtering: https://pcingola.github.io/SnpEff/ss_introduction/

SnpSift doesn't use the INFO/CSQ tag (typically written by VEP) but the INFO/ANN tag produced by SnpEff: https://pcingola.github.io/SnpEff/se_inputoutput/#ann-field-vcf-output-files
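To see what the ANN tag holds: each comma-separated entry is one annotation, with pipe-separated sub-fields as listed in the SnpEff ANN specification linked above. A minimal Python sketch that splits an ANN value into named fields and keeps HIGH-impact annotations (the impact class SnpEff gives to putative LOF effects such as stop_gained or frameshift_variant) might look like this. The function names are hypothetical and the code assumes the 16-field ANN layout:

```python
from typing import Dict, List

# Sub-field names of one ANN entry, in order, following the SnpEff
# ANN field specification (16 pipe-separated fields).
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank", "HGVS.c",
    "HGVS.p", "cDNA.pos / cDNA.length", "CDS.pos / CDS.length",
    "AA.pos / AA.length", "Distance", "ERRORS / WARNINGS / INFO",
]


def parse_ann(ann_value: str) -> List[Dict[str, str]]:
    """Split an INFO/ANN value (text after 'ANN=') into one dict per annotation.

    Note: 'Annotation' may hold several effects joined by '&'.
    """
    return [dict(zip(ANN_FIELDS, entry.split("|")))
            for entry in ann_value.split(",") if entry]


def high_impact(annotations: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only annotations whose putative impact is HIGH."""
    return [a for a in annotations if a.get("Annotation_Impact") == "HIGH"]
```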

6 months ago
atorreso

The -lof flag takes no argument; it adds the LOF tag to the INFO field of the VCF. The database name is passed as a separate argument, so the command should look something like this for the GRCm38.99 database (adjust it to your reference genome):

java -Xmx4g -jar snpEff.jar -lof GRCm38.99 yourFile.vcf.gz > yourFile.eff.vcf

After that, you can filter with BCFtools or SnpSift, depending on your needs. Examples are in this thread: how to extract gene name for LOF (loss of function) variants
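As a rough illustration of what such a filter checks: according to the SnpEff documentation, each LOF entry has the form (Gene_Name|Gene_ID|Number_of_transcripts_in_gene|Percent_of_transcripts_affected), several entries are comma-separated, and the percentage appears to be written as a fraction (e.g. 0.25). The Python sketch below assumes that format and keeps records carrying LOF, optionally requiring a minimum fraction of affected transcripts; the function names are hypothetical. On the command line, SnpSift's filter command (e.g. an expression using `exists LOF`) does the same job.

```python
import re
from typing import List, NamedTuple


class LofEntry(NamedTuple):
    gene_name: str
    gene_id: str
    n_transcripts: int
    frac_affected: float   # assumed to be a 0-1 fraction, e.g. 0.25


# One "(Gene_Name|Gene_ID|N|fraction)" group inside the LOF value.
LOF_GROUP = re.compile(r"\(([^)]*)\)")


def parse_lof(info: str) -> List[LofEntry]:
    """Return the LOF entries of an INFO string, or [] if there is no LOF tag."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key != "LOF":
            continue
        entries = []
        for group in LOF_GROUP.findall(value):
            parts = group.split("|")
            if len(parts) != 4:      # skip malformed groups
                continue
            name, gene_id, n_tx, frac = parts
            entries.append(LofEntry(name, gene_id, int(n_tx), float(frac)))
        return entries
    return []


def is_lof(info: str, min_frac: float = 0.0) -> bool:
    """True if any LOF entry affects at least min_frac of the gene's transcripts."""
    return any(e.frac_affected >= min_frac for e in parse_lof(info))
```

A caller would pass column 8 (INFO) of each non-header line, e.g. `line.split("\t")[7]`.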

OK, I tried putting hg38.fasta after -lof, but got:
java.lang.RuntimeException: Property: 'hg38.fasta.genome' not found
        at org.snpeff.interval.Genome.<init>(Genome.java:104)
        at org.snpeff.snpEffect.Config.readGenomeConfig(Config.java:693)
        at org.snpeff.snpEffect.Config.readConfig(Config.java:661)
        at org.snpeff.snpEffect.Config.init(Config.java:487)
        at org.snpeff.snpEffect.Config.<init>(Config.java:121)
        at org.snpeff.SnpEff.loadConfig(SnpEff.java:449)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:939)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:923)
        at org.snpeff.SnpEff.run(SnpEff.java:1188)

SnpEff requires a database name (such as GRCh38.99), not a FASTA file. I recommend reading the manual first: https://pcingola.github.io/SnpEff/se_introduction/
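The error above suggests SnpEff treats its genome argument as a database name and looks up a `<name>.genome` entry in snpEff.config; `hg38.fasta.genome` does not exist there. The sketch below (hypothetical helper, assuming config entries of the form `<name>.genome : <description>`) lists the names that would be accepted. On the command line, `java -jar snpEff.jar databases` lists the available databases and `java -jar snpEff.jar download GRCh38.99` fetches one.

```python
import re
from typing import Dict

# An assumed genome entry in snpEff.config of the form
#   <name>.genome : <description>
# Lines starting with '#' are comments and are ignored.
GENOME_ENTRY = re.compile(r"^\s*([^#\s]\S*)\.genome\s*:\s*(.*?)\s*$")


def configured_genomes(config_text: str) -> Dict[str, str]:
    """Map each database name declared in the config text to its description."""
    genomes = {}
    for line in config_text.splitlines():
        match = GENOME_ENTRY.match(line)
        if match:
            genomes[match.group(1)] = match.group(2)
    return genomes
```

For example, `"hg38.fasta" in configured_genomes(Path("snpEff.config").read_text())` would be False, matching the exception.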


This is the output after running:

java -jar /mnt/home/tools/snpEff/snpEff.jar -lof GRCh38.99 $file > $outfile

It does not look like any tag was added.

 chrX    155491696       .       G       C       52.8    PASS    CSQ=|FAIL|0.00|0.03|0.00|0.00|42|-30|-30|-25|||MODIFIER|TMLHE|ENSG00000185973|ENST00000334398|protein_coding||7/7|||||,|FAIL|0.00|0.03|0.00|0.00|42|-30|-30|-25|||MODIFIER|TMLHE-AS1|ENSG00000224533|ENST00000433624|lncRNA||3/3|||||;ANN=C|downstream_gene_variant|MODIFIER|BX571846.1|ENSG00000225393|transcript|ENST00000447347.1|pseudogene||n.*4650G>C|||||4650|,C|downstream_gene_variant|MODIFIER|TMLHE|ENSG00000185973|transcript|ENST00000369439.4|protein_coding||c.*869C>G|||||839|,C|intron_variant|MODIFIER|TMLHE-AS1|ENSG00000224533|transcript|ENST00000433624.1|pseudogene|3/3|n.472-1184G>C||||||,C|intron_variant|MODIFIER|TMLHE|ENSG00000185973|transcript|ENST00000334398.8|protein_coding|7/7|c.1135-30C>G||||||,C|intron_variant|MODIFIER|TMLHE|ENSG00000185973|transcript|ENST00000449645.2|processed_transcript|1/1|n.209-30C>G||||||,C|intron_variant|MODIFIER|TMLHE-AS1|ENSG00000224533|transcript|ENST00000452506.1|pseudogene|1/1|n.67+2307G>C||||||    GT:GQ:DP:AD:VAF:PL      1/1:27:12:0,12:1:52,26,0 

Are there any options other than snpEff?
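One way to check the record above: it now carries an INFO/ANN field next to CSQ, so SnpEff did annotate it. My reading of the SnpEff documentation is that the LOF and NMD tags are only written for variants predicted to cause loss of function, and every ANN entry on this record is MODIFIER (downstream or intronic), so a missing LOF tag would be expected here. A minimal Python sketch (hypothetical function names, tab-delimited VCF assumed) that lists the INFO keys and ANN impacts of a record:

```python
from typing import Dict, List

# Minimal sketch: inspect the INFO keys and the ANN impact classes of one
# tab-delimited VCF record (hypothetical helper names).


def info_dict(vcf_line: str) -> Dict[str, str]:
    """Map INFO keys to raw values; flag-type keys map to ''."""
    info = vcf_line.rstrip("\n").split("\t")[7]
    parsed = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def ann_impacts(vcf_line: str) -> List[str]:
    """Return the Annotation_Impact (3rd sub-field) of every ANN entry."""
    ann = info_dict(vcf_line).get("ANN", "")
    return [entry.split("|")[2] for entry in ann.split(",")
            if entry.count("|") >= 2]
```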
