how to extract gene name for LOF (loss of function) variants
5.7 years ago
reza

Hi everyone,

I annotated my VCF file using snpEff, and now I want the gene names for LOF (loss of function) variants. How can I do it? Can anyone help me? For example, in the following variant, the name of the TRAPPC8 gene should be extracted.

KN271071.1  10335196    .   G   A   217.0   PASS    DP=35;VDB=0.8177;SGB=-0.693136;MQSB=0.206851;MQ0F=0;AC=2;AN=2;DP4=0,0,13,22;MQ=49;SF=0,1,2;ANN=A|stop_gained|HIGH|TRAPPC8|gene1363|transcript|rna1581|pseudogene|7/29|n.943C>T|p.Arg315*|943/5052|943/-1|315/-1||WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS&WARNING_REF_DOES_NOT_MATCH_GENOME,A|stop_gained|HIGH|TRAPPC8|gene1363|transcript|rna1580|pseudogene|7/29|n.940C>T|p.Arg314*|940/5049|940/-1|314/-1||WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS&WARNING_REF_DOES_NOT_MATCH_GENOME;LOF=(TRAPPC8|gene1363|2|1.00);NMD=(TRAPPC8|gene1363|2|1.00)
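
A minimal Python sketch of pulling TRAPPC8 out of a record like this one. It is not part of snpEff, and the function names parse_lof and lof_genes are hypothetical. It assumes a tab-separated VCF with INFO in the eighth column and snpEff's LOF layout (Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected), where the last value is written as a fraction such as 1.00.

```python
import re


def parse_lof(info):
    """Parse snpEff's LOF INFO value into (gene_name, gene_id, n_transcripts, frac_affected) tuples."""
    for field in info.split(";"):
        if field.startswith("LOF="):
            # One parenthesised group per affected gene, e.g. LOF=(TRAPPC8|gene1363|2|1.00)
            groups = re.findall(r"\(([^)]*)\)", field[len("LOF="):])
            result = []
            for group in groups:
                name, gene_id, n_tr, frac = group.split("|")
                result.append((name, gene_id, int(n_tr), float(frac)))
            return result
    return []  # the record has no LOF tag


def lof_genes(vcf_line):
    """Gene names from the LOF tag of one tab-separated VCF record (INFO is column 8)."""
    info = vcf_line.rstrip("\n").split("\t")[7]
    return [name for name, _gene_id, _n_tr, _frac in parse_lof(info)]
```

Applied to the record above (with its columns tab-separated), lof_genes should return a list containing only TRAPPC8. The NMD tag is ignored because only the LOF= key is matched.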

Hello reza!

We believe that this post does not fit the main topic of this site.

Purely text processing. grep/awk should help you here.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

@Ram, parsing snpEff output is not that easy, especially because there is more than one prediction per variant.
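
To illustrate the point: in the example record from the question, ANN holds two predictions (transcripts rna1581 and rna1580) for the same gene, so a naive one-row-per-prediction parse reports TRAPPC8 twice. This is a hedged Python sketch with hypothetical helper names. It keeps only the first eight ANN sub-fields, and filtering on HIGH impact is only a rough stand-in for snpEff's own LOF tag.

```python
# First eight ANN sub-fields, in the order used by snpEff's ANN format
ANN_KEYS = ["allele", "effect", "impact", "gene_name",
            "gene_id", "feature_type", "feature_id", "biotype"]


def ann_predictions(info):
    """Split the ANN value of an INFO string into one dict per prediction."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            # Predictions are comma-separated; sub-fields inside one prediction are '|'-separated
            return [dict(zip(ANN_KEYS, ann.split("|")))
                    for ann in field[len("ANN="):].split(",")]
    return []


def genes_by_impact(info, impact="HIGH"):
    """Distinct gene names (first-seen order) among predictions with the given impact."""
    genes = []
    for pred in ann_predictions(info):
        name = pred.get("gene_name")
        if pred.get("impact") == impact and name and name not in genes:
            genes.append(name)
    return genes
```

Deduplicating per record is what turns two TRAPPC8 predictions into a single gene name.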

Apologies, reza and Pierre Lindenbaum - I misunderstood the question. Reopening it now.

5.7 years ago

I guess you can use SnpSift for this. The appropriate command (modified from the SnpSift manual page) is:

java -jar SnpSift.jar filter "(exists LOF[*].PERC)" input.snpeff.vcf

SnpSift manual here. Because snpEff calculates a percentage (PERC, the fraction of the gene's transcripts that are affected) for every LOF entry, every record with an LOF tag also has LOF[*].PERC, so the filter can be shortened to:

java -jar SnpSift.jar filter "(exists LOF)" input.snpeff.vcf

If you only want to filter by the presence of an LOF tag, you can also try sed (to filter by LOF percentage, SnpSift is the better choice):

sed -ne '/^#/{p;d;}' -e '/LOF=/p' input.snpeff.vcf
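
For comparison, here is a rough Python equivalent of filtering on "exists LOF" that can also apply a percentage threshold, in the spirit of filtering on LOF[*].PERC with SnpSift. This is a sketch, not SnpSift's implementation. The names filter_lof and lof_max_fraction are hypothetical, a record passes if any of its LOF groups reaches the threshold, and each header line is written exactly once.

```python
import re

# LOF=... as a whole INFO key (start of INFO or after ';'), captured up to the next ';'
LOF_RE = re.compile(r"(?:^|;)LOF=([^;]*)")


def lof_max_fraction(info):
    """Largest fraction of affected transcripts over all LOF groups, or None if there is no LOF tag."""
    m = LOF_RE.search(info)
    if m is None:
        return None
    fracs = [float(group.split("|")[3])
             for group in re.findall(r"\(([^)]*)\)", m.group(1))]
    return max(fracs) if fracs else None


def filter_lof(lines, min_frac=0.0):
    """Yield every header line once, plus records whose LOF fraction is >= min_frac."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        info = line.rstrip("\n").split("\t")[7]
        frac = lof_max_fraction(info)
        if frac is not None and frac >= min_frac:
            yield line


# Example use (file names are placeholders):
# with open("input.snpeff.vcf") as vcf, open("lof.vcf", "w") as out:
#     out.writelines(filter_lof(vcf, 0.5))
```

With min_frac left at 0.0 this keeps every record that has an LOF tag, which is what the sed command does.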

Thanks, your suggested approach worked fine.

I've moved cpad's comment to an answer - you can now accept it to show that it works. Thank you!

5.7 years ago

Using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

java -jar dist/bioalcidaejdk.jar -e 'stream().flatMap(V->tools.getAnnPredictions(V).stream().map(P->V.getContig()+" "+V.getStart()+" "+V.getAlleles()+" "+P.getGeneName())).forEach(S->println(S));' in.vcf
  • stream(): get a stream of variants
  • flatMap(V->tools.getAnnPredictions(V).stream(): use the internal snpEff parser to convert each variant into a stream of predictions (one per ANN entry)
  • .map(P->V.getContig()+" "+V.getStart()+" "+V.getAlleles()+" "+P.getGeneName())): convert each prediction to CHROM/POS/ALLELES/GENE
  • forEach(S->println(S)): print each line
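
The one-liner above prints one row per ANN prediction, which is why it can produce more rows than there are variants. If only records carrying an LOF tag are wanted, a minimal Python sketch (the function lof_gene_rows is hypothetical and not part of jvarkit) can skip every record without LOF and print one CHROM POS REF ALT GENES row per remaining record, taking gene names from the LOF tag rather than from ANN. It assumes a tab-separated VCF with INFO in the eighth column.

```python
import re


def lof_gene_rows(lines):
    """Yield one 'CHROM POS REF ALT GENES' string per record that carries a snpEff LOF tag."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
        m = re.search(r"(?:^|;)LOF=([^;]*)", info)
        if m is None:
            continue  # no LOF tag: skip instead of printing every ANN prediction
        # Keep each gene once, even if several LOF groups name it
        genes = []
        for group in re.findall(r"\(([^)]*)\)", m.group(1)):
            name = group.split("|")[0]
            if name not in genes:
                genes.append(name)
        yield " ".join([chrom, pos, ref, alt, ",".join(genes)])
```

The number of rows then equals the number of LOF-tagged records. A unique gene list can be built by collecting the last column of each row into a set.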

Thanks Pierre, your program worked, but I only want the variants that have an LOF tag (466 of the 994779 variants in my file). Your program gave me a file with 1526324 rows!

You should have mentioned this when you created the question, reza. Your question says you need the names of genes that have LOF variants, not the variant records that are annotated as LOF-causing.
