Switch the position of REF/ALT alleles based on AA (ancestral allele) annotations in VCF
1
1
Entering edit mode
6.7 years ago
SOHAIL ▴ 400

Hi!

I have a VCF file with AA annotations from EPO. I want to switch the REF/ALT position of the REF/ALT alleles along with the respective genotypes of individuals accordingly, For instance, I have one VCF record such as

1       936210  rs3121569       C       A       124552  PASS    AC=236;AF=0.944;AN=250;DP=3971;set=Intersection;AA=A    0|0 0|1 1|1

i want to convert it into

1       936210  rs3121569       A      C       124552  PASS    AC=236;AF=0.944;AN=250;DP=3971;set=Intersection;AA=A 1|1 1|0 0|0

Moreover, I have to explain AA annotations are of five types:

        ACTG : high-confidence call, ancestral state supproted by the other two sequences
        actg : low-confindence call, ancestral state supported by one sequence only
        N    : failure, the ancestral state is not supported by any other sequence
        -    : the extant species contains an insertion at this postion
        .   : no coverage in the alignment

So, the REF/ALT will only be changed in above two cases only, others remained unchanged.

Is there any shared script/software can do that, Any suggestions??

Thank you!

-sohail

vcf ngs • 5.8k views
ADD COMMENT
0
Entering edit mode

please, post a http://gist.github.com/ with a full VCF (header+ a few variants to be changed or not )

ADD REPLY
0
Entering edit mode

Hi @Pierre I have shared the Sample VCF file (header + few variants with possible conditions) Please have a look (two files). Sample.vcf at gist

ADD REPLY
0
Entering edit mode

Thanks Pierre for these tools! I was looking for this kind of solution a while back ago. I have some problems though with the code. Here's what I'm getting :

java -jar jvarkit/dist/vcffilterjs.jar -f vcfRefAlt_to_AncDer.js test.10000.vcf
[INFO][Launcher]getting javascript manager
Warning: Nashorn engine is planned to be removed from a future JDK release
[INFO][Launcher]Compiling vcfRefAlt_to_AncDer.js
[SEVERE][VCFFilterJS]<eval>:1:60 Invalid return statement
if(variant.getNAlleles()!=2 || !variant.hasAttribute("AA")) return true;
                                                            ^ in <eval> at line number 1 at column number 60
javax.script.ScriptException: <eval>:1:60 Invalid return statement
if(variant.getNAlleles()!=2 || !variant.hasAttribute("AA")) return true;
                                                            ^ in <eval> at line number 1 at column number 60
    at jdk.scripting.nashorn/jdk.nashorn.api.scripting.NashornScriptEngine.throwAsScriptException(NashornScriptEngine.java:477)
    at jdk.scripting.nashorn/jdk.nashorn.api.scripting.NashornScriptEngine.asCompiledScript(NashornScriptEngine.java:503)
    at jdk.scripting.nashorn/jdk.nashorn.api.scripting.NashornScriptEngine.compile(NashornScriptEngine.java:184)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.compileJavascript(Launcher.java:700)
    at com.github.lindenb.jvarkit.tools.vcffilterjs.VCFFilterJS.doWork(VCFFilterJS.java:542)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:756)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:919)
    at com.github.lindenb.jvarkit.tools.vcffilterjs.VCFFilterJS.main(VCFFilterJS.java:560)
Caused by: jdk.nashorn.internal.runtime.ParserException: <eval>:1:60 Invalid return statement
if(variant.getNAlleles()!=2 || !variant.hasAttribute("AA")) return true;
                                                            ^
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.AbstractParser.error(AbstractParser.java:297)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.AbstractParser.error(AbstractParser.java:282)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.returnStatement(Parser.java:2318)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.statement(Parser.java:1066)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.getStatement(Parser.java:642)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.getStatement(Parser.java:632)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.ifStatement(Parser.java:1885)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.statement(Parser.java:1048)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.sourceElements(Parser.java:909)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.program(Parser.java:844)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.parse(Parser.java:325)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.parse(Parser.java:285)
    at jdk.scripting.nashorn/jdk.nashorn.internal.runtime.Context.compile(Context.java:1500)
    at jdk.scripting.nashorn/jdk.nashorn.internal.runtime.Context.compileScript(Context.java:774)
    at jdk.scripting.nashorn/jdk.nashorn.api.scripting.NashornScriptEngine.asCompiledScript(NashornScriptEngine.java:500)
    ... 6 more
[INFO][Launcher]vcffilterjs Exited with failure (-1)

Do you have any idea of what's happening here? And once it's working, will it work with the vcf coming from the 1000G project where the AA field has multiple values separated with "|" character?

16      334532  rs201129086     G       A       100     PASS    AC=1;AF=0.000199681;AN=5008;NS=2504;DP=16727;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=g|||;VT=SNP;EX_TARGET

Thanks a lot!

JC

ADD REPLY
0
Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

ADD REPLY
2
Entering edit mode
6.7 years ago

The following script for http://lindenb.github.io/jvarkit/VcfFilterJdk.html seems to work:

if(variant.getNAlleles()!=2 || !variant.hasAttribute("AA")) return true;
final String aa = variant.getAttributeAsString("AA","");
if(!variant.getAlleles().get(1).getDisplayString().equalsIgnoreCase(aa)) return true;
VariantContextBuilder vb=new VariantContextBuilder(variant);

Allele oldalt =  variant.getAlleles().get(1);
Allele oldref =  variant.getAlleles().get(0);
Allele ref= Allele.create(oldalt.getDisplayString(),true);
Allele alt= Allele.create(oldref.getDisplayString(),false);

vb.alleles(Arrays.asList(ref,alt));

List<Genotype> genotypes= new ArrayList<>();
for(Genotype g: variant.getGenotypes())
    {
    if(!g.isCalled()) { genotypes.add(g); continue;}
    GenotypeBuilder gb = new GenotypeBuilder(g);
    List<Allele> alleles = new ArrayList<>();
    for(Allele a:g.getAlleles())
        {
        if(a.equals(oldalt)) { a=ref;}
        else if(a.equals(oldref)) { a=alt;}
        alleles.add(a);
        }
    genotypes.add(gb.alleles(alleles).make());
    }
vb.genotypes(genotypes);
return vb.make();

usage:

 java -jar dist/vcffilterjdk.jar -f script.js input.vcf



1  10575   .            C  G  67.27    PASS  AA=.;AC=0;AF=0.00;AN=18;DP=85;set=Intersection     GT:AD:DP:GQ:PL          0/0:1,0:1:3:0,3,28             0/0:11,0:11:30:0,30,450                 0/0:7,0:7:21:0,21,226                 0/0:9,0:9:24:0,24,360          0/0:6,0:6:18:0,18,168         0/0:26,0:26:60:0,60,900                  0/0:7,0:7:21:0,21,218                  0/0:10,0:10:27:0,27,405              0/0:8,0:8:24:0,24,246
1  12882   .            C  G  660.23   PASS  AA=c;AC=0;AF=0.00;AN=18;DP=143;set=Intersection    GT:AD:DP:GQ:PL          0/0:7,0:7:21:0,21,253          0/0:11,0:11:33:0,33,377                 0/0:12,0:12:36:0,36,417               0/0:23,0:23:66:0,66,990        0/0:16,0:16:13:0,13,470       0/0:29,0:29:81:0,81,839                  0/0:24,0:24:63:0,63,945                0/0:15,0:15:45:0,45,481              0/0:6,0:6:18:0,18,193
1  13079   .            C  G  1594.47  PASS  AA=c;AC=3;AF=0.167;AN=18;DP=340;set=Intersection   GT:AD:DP:GQ:PL          0/1:10,4:14:78:78,0,226        0/1:32,4:36:9:9,0,744                   0/0:29,3:32:4:0,4,710                 0/0:37,0:37:99:0,109,1206      0/0:56,4:60:61:0,61,1501      0/0:33,0:33:99:0,99,1155                 0/1:51,8:59:41:41,0,1276               0/0:55,0:55:0:0,0,1269               0/0:14,0:14:42:0,42,425
1  17730   .            C  A  9050.45  PASS  AA=-;AC=2;AF=0.111;AN=18;DP=230;set=Intersection   GT:AD:DP:GQ:PGT:PID:PL  0/0:28,0:28:48:.:.:0,48,696    0/0:31,0:31:93:.:.:0,93,959             0/0:28,2:30:1:0|1:17722_A_G:0,1,1171  0/0:46,0:46:99:.:.:0,113,1434  0/0:35,0:35:69:.:.:0,69,1054  0/0:27,0:27:46:.:.:0,46,884              0/1:9,4:13:99:0|1:17722_A_G:141,0,617  0/1:7,2:9:63:0|1:17722_A_G:63,0,320  0/0:11,0:11:14:.:.:0,14,274
1  49272   rs370116346  G  A  397.31   PASS  AA=N;AC=4;AF=0.286;AN=14;DP=180;set=Intersection   GT:AD:DP:GQ:PL          1/1:0,3:3:9:95,9,0             0/0:25,2:27:28:0,28,742                 0/0:31,0:31:90:0,90,1350              1/1:0,4:4:12:125,12,0          0/0:36,4:40:1:0,1,939         0/0:36,0:36:81:0,81,1215                 ./.:0,0:0:.:0,0,0                      0/0:39,0:39:83:0,83,1138             ./.:0,0:0:.:0,0,0
1  936210  rs3121569    A  C  124552   PASS  AA=A;AC=17;AF=0.944;AN=18;DP=224;set=Intersection  GT:AD:DP:GQ:PGT:PID:PL  0/0:0,37:37:99:.:.:1098,108,0  0/0:0,23:23:69:1|1:936194_A_G:987,69,0  0/0:0,29:29:85:.:.:860,85,0           0/0:0,28:28:83:.:.:911,83,0    0/0:0,20:20:60:.:.:673,60,0   0/0:0,24:24:75:1|1:936194_A_G:1071,75,0  0/0:0,29:29:87:.:.:1011,87,0           1/0:7,7:14:99:.:.:206,0,144          0/0:1,19:20:34:.:.:514,34,0
ADD COMMENT
0
Entering edit mode

Hi @Pierre, Thank you for suggesting your useful scripts as always. :).. however, i have some problems in vcffilterjdk installation. i have posted the issue here Please have look.

ADD REPLY
0
Entering edit mode

Dear Pierre,

Thank you for your reply. Your vcf filtering tool seems to be very useful. However, I noticed that you forgot to recode the AC (alternative allele count) field as well as the AD and PL ones after switching the alleles.

Would it be possible for you to edit the code to include these changes? It would be of great help, and of course, we would credit you when we publish our data.

Thanks in advance!

ADD REPLY
2
Entering edit mode

not tested, but you can try:

if(variant.getNAlleles()!=2 || !variant.hasAttribute("AA")) return true;
final String aa = variant.getAttributeAsString("AA","");
if(!variant.getAlleles().get(1).getDisplayString().equalsIgnoreCase(aa)) return true;
VariantContextBuilder vb=new VariantContextBuilder(variant);

Allele oldalt =  variant.getAlleles().get(1);
Allele oldref =  variant.getAlleles().get(0);
Allele ref= Allele.create(oldalt.getDisplayString(),true);
Allele alt= Allele.create(oldref.getDisplayString(),false);

vb.alleles(Arrays.asList(ref,alt));

List<Genotype> genotypes= new ArrayList<>();
for(Genotype g: variant.getGenotypes())
    {
    if(!g.isCalled()) { genotypes.add(g); continue;}
    GenotypeBuilder gb = new GenotypeBuilder(g);
    List<Allele> alleles = new ArrayList<>();
    for(Allele a:g.getAlleles())
        {
        if(a.equals(oldalt)) { a=ref;}
        else if(a.equals(oldref)) { a=alt;}
        alleles.add(a);
        }
    if(g.hasPL()) {
        int pl[] = g.getPL();
        int pl2[] = new int[pl.length];
        for(int i=0;i< pl.length;i++) pl2[i]=pl[(pl.length-1)-i];
        gb.PL(pl2);
        }
    genotypes.add(gb.alleles(alleles).make());
    }

vb.attribute("AC",variant.getGenotypes().stream().flatMap(G->G.getAlleles().stream()).filter(A->A.equals(oldref)).count());
 vb.genotypes(genotypes);
return vb.make();
ADD REPLY
1
Entering edit mode

later: fixed an error with the AC field.

ADD REPLY
0
Entering edit mode

Dear Pierre,

Thank you very much for taking the time to look into it. However, I encountered the following error:

[SEVERE][Launcher]Allele C* is not an allele in the variant context java.lang.RuntimeException: Allele C* is not an allele in the variant context at htsjdk.variant.vcf.VCFEncoder.writeAllele(VCFEncoder.java:380) at htsjdk.variant.vcf.VCFEncoder.addGenotypeData(VCFEncoder.java:266) at htsjdk.variant.vcf.VCFEncoder.encode(VCFEncoder.java:137) at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:224) at com.github.lindenb.jvarkit.util.vcf.DelegateVariantContextWriter.add(DelegateVariantContextWriter.java:56) at com.github.lindenb.jvarkit.util.vcf.DelegateVariantContextWriter.add(DelegateVariantContextWriter.java:56) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk$CtxWriterFactory$CtxWriter.add(VcfFilterJdk.java:478) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doVcfToVcf(VcfFilterJdk.java:699) at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:1035) at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:1055) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:722) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1208) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1366) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:737) [INFO][Launcher]vcffilterjdk Exited with failure (-1)

Checking the output, I realized that the programme crashed when it reached the first row where alleles should be swapped. All previous lines were properly returned.

I broke the code down into sections in order to narrow down the source of the problem, and I verified that while the first part works properly (REF and ALT alleles do swap if I only run that code), both the section that swaps genotypes as well as the one that recalculates the AC field yield the aforementioned error.

I tried to look into it on my own, and I came across this report by you: https://github.com/samtools/htsjdk/issues/1026, but I'm afraid I'm not able to reconcile it with the original code.

Is there anything that can be done about it? Thanks once again for your invaluable help.

ADD REPLY
0
Entering edit mode

hard to answer without your data.

ADD REPLY
0
Entering edit mode

Right.

Here I uploaded the input toy VCF: https://github.com/Arynio/genome-annotation/blob/master/test.ann.vcf And this is the output returned by VcfFilterJdk: https://github.com/Arynio/genome-annotation/blob/master/test_polarized.ann.vcf

As you can see, the programme crashed while processing the third variant, which is the first where AA = ALT.

ADD REPLY
1
Entering edit mode

my bad, bad copy+paste, I forgot the

vb.genotypes(genotypes);

before the last return. I'll update the code above.

ADD REPLY
0
Entering edit mode

It works perfectly now. Thank you very much!

Would it be possible for you to recode the AD and the AF fields as well so that there's no "crossed" information left in the VCF? I tried to fix the AF on my own with this code: vb.attribute("AF",(variant.getGenotypes().stream().flatMap(G->G.getAlleles().stream()).filter(A->A.equals(oldref)).count())/(variant.getGenotypes().stream().flatMap(G->G.getAlleles().stream()).count())); but I don't have any knowledge in java so I failed, haha.

I'm sorry if I'm taking too much of your time, but judging by the lack of a default software to polarize VCFs, and the number of views of this topic, I'm sure it will be of great help for many others. :)

ADD REPLY
0
Entering edit mode

Problem solved...! Thanks

ADD REPLY
0
Entering edit mode

Thanks Pierre, very useful!!

But In the case you had a phased vcf file to start with, does this script maintain the initial (and correct) haplotypes after swapping alleles (REF by ANCESTRAL etc)?

I cannot read java, sorry, but my 'guess' is that it likely does not (?) If so, can we say that is it only useful for unphased vcf-s?

santos

ADD REPLY

Login before adding your answer.

Traffic: 1511 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6