How to extract gene names for LOF (loss of function) variants
2
0
3.2 years ago
reza ▴ 270

Hi everyone,

I annotated my VCF file using snpEff, and now I want the gene names for the LOF (loss of function) variants. How can I do this? For example, for the following variant, the gene name TRAPPC8 should be extracted.

KN271071.1  10335196    .   G   A   217.0   PASS    DP=35;VDB=0.8177;SGB=-0.693136;MQSB=0.206851;MQ0F=0;AC=2;AN=2;DP4=0,0,13,22;MQ=49;SF=0,1,2;ANN=A|stop_gained|HIGH|TRAPPC8|gene1363|transcript|rna1581|pseudogene|7/29|n.943C>T|p.Arg315*|943/5052|943/-1|315/-1||WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS&WARNING_REF_DOES_NOT_MATCH_GENOME,A|stop_gained|HIGH|TRAPPC8|gene1363|transcript|rna1580|pseudogene|7/29|n.940C>T|p.Arg314*|940/5049|940/-1|314/-1||WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS&WARNING_REF_DOES_NOT_MATCH_GENOME;LOF=(TRAPPC8|gene1363|2|1.00);NMD=(TRAPPC8|gene1363|2|1.00)

snpEff LOF variants • 1.4k views
0

Hello reza!

We believe that this post does not fit the main topic of this site.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

0

@Ram parsing SNPEFF is not so easy. Especially because there is more than one prediction per variant.

0

Apologies, reza and Pierre Lindenbaum - I misunderstood the question. Reopening it now.

2
3.2 years ago

I think you can use SnpSift for this. The appropriate command (modified from the SnpSift manual page) is:

java -jar SnpSift.jar filter "(exists LOF[*].PERC)" input.snpeff.vcf


See the SnpSift manual for details. Since every LOF entry has a percentage calculated for it, every LOF entry also has a PERC subfield. The filter can therefore be shortened to:

java -jar SnpSift.jar filter "(exists LOF)" input.snpeff.vcf


To filter on the LOF percentage, SnpSift is the better choice. If you only need to keep the records that carry a LOF tag, sed is enough (header lines are printed once, then any data line containing LOF=):

sed -n -e '/^#/{p;d;}' -e '/LOF=/p' input.snpeff.vcf
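Both commands above keep the whole LOF-tagged record, not just the gene name. As a rough sketch of how to pull the gene names out, the Python below reads the LOF INFO field directly. It assumes the `LOF=(GENE|GENE_ID|NUM_TRANSCRIPTS|PERCENT)` layout seen in the example record and tab-separated VCF columns. The function name `lof_genes` and the abbreviated demo record are made up for illustration.

```python
import re

# Assumed layout of one LOF entry: (GENE|GENE_ID|NUM_TRANSCRIPTS|PERCENT)
LOF_ENTRY = re.compile(r"\(([^|()]*)\|([^|()]*)\|([^|()]*)\|([^|()]*)\)")


def lof_genes(lines):
    """Yield (chrom, pos, ref, alt, gene) for every LOF entry in VCF lines."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt = fields[:5]
        for item in fields[7].split(";"):
            if item.startswith("LOF="):
                # A record may carry several LOF entries, one per gene
                for gene, _gene_id, _n_tr, _perc in LOF_ENTRY.findall(item[4:]):
                    yield chrom, pos, ref, alt, gene


# Abbreviated version of the record from the question (INFO shortened)
record = "\t".join([
    "KN271071.1", "10335196", ".", "G", "A", "217.0", "PASS",
    "DP=35;ANN=A|stop_gained|HIGH|TRAPPC8|gene1363;"
    "LOF=(TRAPPC8|gene1363|2|1.00);NMD=(TRAPPC8|gene1363|2|1.00)",
])

for row in lof_genes(["##fileformat=VCFv4.2", record]):
    print("\t".join(row))
```

For a real file, pass an open file handle instead of the list, for example `lof_genes(open("input.snpeff.vcf"))`. The demo prints one row, naming TRAPPC8.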

0

thanks, your suggested way worked fine.

0

I've moved cpad's comment to an answer - you can now accept it to show that it works. Thank you!

3
3.2 years ago

Using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

java -jar dist/bioalcidaejdk.jar -e 'stream().flatMap(V->tools.getAnnPredictions(V).stream().map(P->V.getContig()+" "+V.getStart()+" "+V.getAlleles()+" "+P.getGeneName())).forEach(S->println(S));' in.vcf

• stream(). gets a stream of variants.
• flatMap(V->tools.getAnnPredictions(V).stream() uses an internal snpEff parser to convert each variant into a stream of predictions.
• .map(P->V.getContig()+" "+V.getStart()+" "+V.getAlleles()+" "+P.getGeneName())). converts each prediction to CHROM/POS/ALLELES/GENE.
• forEach(S->println(S)); prints each resulting row.
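Note that this pipeline emits one row per ANN prediction, so the output can have more rows than the file has variants. The Python sketch below mirrors the same steps using only the standard library. It is not part of jvarkit. It assumes tab-separated VCF columns, comma-separated ANN entries, and the gene name in the fourth pipe-separated ANN field, as in the example record. The `only_lof` switch and the second demo record (`GENE_X`) are hypothetical additions for illustration.

```python
def ann_gene_rows(lines, only_lof=False):
    """Yield (chrom, pos, allele, gene), one row per ANN prediction.

    With only_lof=True, records without a LOF= tag in INFO are skipped.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # header lines carry no predictions
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        info = fields[7].split(";")
        if only_lof and not any(item.startswith("LOF=") for item in info):
            continue
        for item in info:
            if item.startswith("ANN="):
                for prediction in item[4:].split(","):
                    parts = prediction.split("|")
                    if len(parts) > 3:
                        # parts[0] is the allele, parts[3] the gene name
                        yield fields[0], fields[1], parts[0], parts[3]


# Abbreviated record from the question, plus a made-up record without LOF
with_lof = "\t".join([
    "KN271071.1", "10335196", ".", "G", "A", "217.0", "PASS",
    "ANN=A|stop_gained|HIGH|TRAPPC8|gene1363|transcript|rna1581,"
    "A|stop_gained|HIGH|TRAPPC8|gene1363|transcript|rna1580;"
    "LOF=(TRAPPC8|gene1363|2|1.00)",
])
without_lof = "\t".join([
    "KN271071.1", "20", ".", "C", "T", "50.0", "PASS",
    "ANN=T|missense_variant|MODERATE|GENE_X|geneX|transcript|rnaX",
])

for row in ann_gene_rows([with_lof, without_lof], only_lof=True):
    print(" ".join(row))
```

Because a variant with two transcripts yields two rows for the same gene, collecting the gene column into a `set` is one way to get a list of distinct gene names.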
0

Thanks Pierre, your program worked, but I only want the variants that have a LOF tag (in my file, 466 of 994779 variants). Your program gives me a file with 1526324 rows!

1

You should mention this when you create the question, reza. Your question says you need gene names that have LOF variants, not variant records that are annotated as LOF causing.