Question: Switch the position of REF/ALT alleles based on AA (ancestral allele) annotations in VCF
1
gravatar for SOHAIL
3.3 years ago by
SOHAIL320
Beijing Institute of Genomics, CAS.
SOHAIL320 wrote:

Hi!

I have a VCF file with AA annotations from EPO. I want to switch the REF/ALT position of the REF/ALT alleles along with the respective genotypes of individuals accordingly, For instance, I have one VCF record such as

1       936210  rs3121569       C       A       124552  PASS    AC=236;AF=0.944;AN=250;DP=3971;set=Intersection;AA=A    0|0 0|1 1|1

i want to convert it into

1       936210  rs3121569       A      C       124552  PASS    AC=236;AF=0.944;AN=250;DP=3971;set=Intersection;AA=A 1|1 1|0 0|0

Moreover, I have to explain AA annotations are of five types:

        ACTG : high-confidence call, ancestral state supproted by the other two sequences
        actg : low-confindence call, ancestral state supported by one sequence only
        N    : failure, the ancestral state is not supported by any other sequence
        -    : the extant species contains an insertion at this postion
        .   : no coverage in the alignment

So, the REF/ALT will only be changed in above two cases only, others remained unchanged.

Is there any shared script/software can do that, Any suggestions??

Thank you!

-sohail

ngs vcf • 2.7k views
ADD COMMENTlink modified 16 months ago by jean.christophe.grenier0 • written 3.3 years ago by SOHAIL320

please, post a http://gist.github.com/ with a full VCF (header+ a few variants to be changed or not )

ADD REPLYlink written 3.3 years ago by Pierre Lindenbaum131k

Hi @Pierre I have shared the Sample VCF file (header + few variants with possible conditions) Please have a look (two files). Sample.vcf at gist

ADD REPLYlink modified 3.3 years ago • written 3.3 years ago by SOHAIL320

Thanks Pierre for these tools! I was looking for this kind of solution a while back ago. I have some problems though with the code. Here's what I'm getting :

java -jar jvarkit/dist/vcffilterjs.jar -f vcfRefAlt_to_AncDer.js test.10000.vcf
[INFO][Launcher]getting javascript manager
Warning: Nashorn engine is planned to be removed from a future JDK release
[INFO][Launcher]Compiling vcfRefAlt_to_AncDer.js
[SEVERE][VCFFilterJS]<eval>:1:60 Invalid return statement
if(variant.getNAlleles()!=2 || !variant.hasAttribute("AA")) return true;
                                                            ^ in <eval> at line number 1 at column number 60
javax.script.ScriptException: <eval>:1:60 Invalid return statement
if(variant.getNAlleles()!=2 || !variant.hasAttribute("AA")) return true;
                                                            ^ in <eval> at line number 1 at column number 60
    at jdk.scripting.nashorn/jdk.nashorn.api.scripting.NashornScriptEngine.throwAsScriptException(NashornScriptEngine.java:477)
    at jdk.scripting.nashorn/jdk.nashorn.api.scripting.NashornScriptEngine.asCompiledScript(NashornScriptEngine.java:503)
    at jdk.scripting.nashorn/jdk.nashorn.api.scripting.NashornScriptEngine.compile(NashornScriptEngine.java:184)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.compileJavascript(Launcher.java:700)
    at com.github.lindenb.jvarkit.tools.vcffilterjs.VCFFilterJS.doWork(VCFFilterJS.java:542)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:756)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:919)
    at com.github.lindenb.jvarkit.tools.vcffilterjs.VCFFilterJS.main(VCFFilterJS.java:560)
Caused by: jdk.nashorn.internal.runtime.ParserException: <eval>:1:60 Invalid return statement
if(variant.getNAlleles()!=2 || !variant.hasAttribute("AA")) return true;
                                                            ^
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.AbstractParser.error(AbstractParser.java:297)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.AbstractParser.error(AbstractParser.java:282)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.returnStatement(Parser.java:2318)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.statement(Parser.java:1066)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.getStatement(Parser.java:642)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.getStatement(Parser.java:632)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.ifStatement(Parser.java:1885)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.statement(Parser.java:1048)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.sourceElements(Parser.java:909)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.program(Parser.java:844)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.parse(Parser.java:325)
    at jdk.scripting.nashorn/jdk.nashorn.internal.parser.Parser.parse(Parser.java:285)
    at jdk.scripting.nashorn/jdk.nashorn.internal.runtime.Context.compile(Context.java:1500)
    at jdk.scripting.nashorn/jdk.nashorn.internal.runtime.Context.compileScript(Context.java:774)
    at jdk.scripting.nashorn/jdk.nashorn.api.scripting.NashornScriptEngine.asCompiledScript(NashornScriptEngine.java:500)
    ... 6 more
[INFO][Launcher]vcffilterjs Exited with failure (-1)

Do you have any idea of what's happening here? And once it's working, will it work with the vcf coming from the 1000G project where the AA field has multiple values separated with "|" character?

16      334532  rs201129086     G       A       100     PASS    AC=1;AF=0.000199681;AN=5008;NS=2504;DP=16727;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=g|||;VT=SNP;EX_TARGET

Thanks a lot!

JC

ADD REPLYlink modified 16 months ago • written 16 months ago by jean.christophe.grenier0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

ADD REPLYlink written 16 months ago by genomax92k
2
gravatar for Pierre Lindenbaum
3.3 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

The following script for http://lindenb.github.io/jvarkit/VcfFilterJdk.html seems to work:

if(variant.getNAlleles()!=2 || !variant.hasAttribute("AA")) return true;
final String aa = variant.getAttributeAsString("AA","");
if(!variant.getAlleles().get(1).getDisplayString().equalsIgnoreCase(aa)) return true;
VariantContextBuilder vb=new VariantContextBuilder(variant);

Allele oldalt =  variant.getAlleles().get(1);
Allele oldref =  variant.getAlleles().get(0);
Allele ref= Allele.create(oldalt.getDisplayString(),true);
Allele alt= Allele.create(oldref.getDisplayString(),false);

vb.alleles(Arrays.asList(ref,alt));

List<Genotype> genotypes= new ArrayList<>();
for(Genotype g: variant.getGenotypes())
    {
    if(!g.isCalled()) { genotypes.add(g); continue;}
    GenotypeBuilder gb = new GenotypeBuilder(g);
    List<Allele> alleles = new ArrayList<>();
    for(Allele a:g.getAlleles())
        {
        if(a.equals(oldalt)) { a=ref;}
        else if(a.equals(oldref)) { a=alt;}
        alleles.add(a);
        }
    genotypes.add(gb.alleles(alleles).make());
    }
vb.genotypes(genotypes);
return vb.make();

usage:

 java -jar dist/vcffilterjdk.jar -f script.js input.vcf



1  10575   .            C  G  67.27    PASS  AA=.;AC=0;AF=0.00;AN=18;DP=85;set=Intersection     GT:AD:DP:GQ:PL          0/0:1,0:1:3:0,3,28             0/0:11,0:11:30:0,30,450                 0/0:7,0:7:21:0,21,226                 0/0:9,0:9:24:0,24,360          0/0:6,0:6:18:0,18,168         0/0:26,0:26:60:0,60,900                  0/0:7,0:7:21:0,21,218                  0/0:10,0:10:27:0,27,405              0/0:8,0:8:24:0,24,246
1  12882   .            C  G  660.23   PASS  AA=c;AC=0;AF=0.00;AN=18;DP=143;set=Intersection    GT:AD:DP:GQ:PL          0/0:7,0:7:21:0,21,253          0/0:11,0:11:33:0,33,377                 0/0:12,0:12:36:0,36,417               0/0:23,0:23:66:0,66,990        0/0:16,0:16:13:0,13,470       0/0:29,0:29:81:0,81,839                  0/0:24,0:24:63:0,63,945                0/0:15,0:15:45:0,45,481              0/0:6,0:6:18:0,18,193
1  13079   .            C  G  1594.47  PASS  AA=c;AC=3;AF=0.167;AN=18;DP=340;set=Intersection   GT:AD:DP:GQ:PL          0/1:10,4:14:78:78,0,226        0/1:32,4:36:9:9,0,744                   0/0:29,3:32:4:0,4,710                 0/0:37,0:37:99:0,109,1206      0/0:56,4:60:61:0,61,1501      0/0:33,0:33:99:0,99,1155                 0/1:51,8:59:41:41,0,1276               0/0:55,0:55:0:0,0,1269               0/0:14,0:14:42:0,42,425
1  17730   .            C  A  9050.45  PASS  AA=-;AC=2;AF=0.111;AN=18;DP=230;set=Intersection   GT:AD:DP:GQ:PGT:PID:PL  0/0:28,0:28:48:.:.:0,48,696    0/0:31,0:31:93:.:.:0,93,959             0/0:28,2:30:1:0|1:17722_A_G:0,1,1171  0/0:46,0:46:99:.:.:0,113,1434  0/0:35,0:35:69:.:.:0,69,1054  0/0:27,0:27:46:.:.:0,46,884              0/1:9,4:13:99:0|1:17722_A_G:141,0,617  0/1:7,2:9:63:0|1:17722_A_G:63,0,320  0/0:11,0:11:14:.:.:0,14,274
1  49272   rs370116346  G  A  397.31   PASS  AA=N;AC=4;AF=0.286;AN=14;DP=180;set=Intersection   GT:AD:DP:GQ:PL          1/1:0,3:3:9:95,9,0             0/0:25,2:27:28:0,28,742                 0/0:31,0:31:90:0,90,1350              1/1:0,4:4:12:125,12,0          0/0:36,4:40:1:0,1,939         0/0:36,0:36:81:0,81,1215                 ./.:0,0:0:.:0,0,0                      0/0:39,0:39:83:0,83,1138             ./.:0,0:0:.:0,0,0
1  936210  rs3121569    A  C  124552   PASS  AA=A;AC=17;AF=0.944;AN=18;DP=224;set=Intersection  GT:AD:DP:GQ:PGT:PID:PL  0/0:0,37:37:99:.:.:1098,108,0  0/0:0,23:23:69:1|1:936194_A_G:987,69,0  0/0:0,29:29:85:.:.:860,85,0           0/0:0,28:28:83:.:.:911,83,0    0/0:0,20:20:60:.:.:673,60,0   0/0:0,24:24:75:1|1:936194_A_G:1071,75,0  0/0:0,29:29:87:.:.:1011,87,0           1/0:7,7:14:99:.:.:206,0,144          0/0:1,19:20:34:.:.:514,34,0
ADD COMMENTlink written 3.3 years ago by Pierre Lindenbaum131k

Hi @Pierre, Thank you for suggesting your useful scripts as always. :).. however, i have some problems in vcffilterjdk installation. i have posted the issue here Please have look.

ADD REPLYlink written 3.3 years ago by SOHAIL320

Dear Pierre,

Thank you for your reply. Your vcf filtering tool seems to be very useful. However, I noticed that you forgot to recode the AC (alternative allele count) field as well as the AD and PL ones after switching the alleles.

Would it be possible for you to edit the code to include these changes? It would be of great help, and of course, we would credit you when we publish our data.

Thanks in advance!

ADD REPLYlink written 2.6 years ago by dkmanruiz0
2

not tested, but you can try:

if(variant.getNAlleles()!=2 || !variant.hasAttribute("AA")) return true;
final String aa = variant.getAttributeAsString("AA","");
if(!variant.getAlleles().get(1).getDisplayString().equalsIgnoreCase(aa)) return true;
VariantContextBuilder vb=new VariantContextBuilder(variant);

Allele oldalt =  variant.getAlleles().get(1);
Allele oldref =  variant.getAlleles().get(0);
Allele ref= Allele.create(oldalt.getDisplayString(),true);
Allele alt= Allele.create(oldref.getDisplayString(),false);

vb.alleles(Arrays.asList(ref,alt));

List<Genotype> genotypes= new ArrayList<>();
for(Genotype g: variant.getGenotypes())
    {
    if(!g.isCalled()) { genotypes.add(g); continue;}
    GenotypeBuilder gb = new GenotypeBuilder(g);
    List<Allele> alleles = new ArrayList<>();
    for(Allele a:g.getAlleles())
        {
        if(a.equals(oldalt)) { a=ref;}
        else if(a.equals(oldref)) { a=alt;}
        alleles.add(a);
        }
    if(g.hasPL()) {
        int pl[] = g.getPL();
        int pl2[] = new int[pl.length];
        for(int i=0;i< pl.length;i++) pl2[i]=pl[(pl.length-1)-i];
        gb.PL(pl2);
        }
    genotypes.add(gb.alleles(alleles).make());
    }

vb.attribute("AC",variant.getGenotypes().stream().flatMap(G->G.getAlleles().stream()).filter(A->A.equals(oldref)).count());
 vb.genotypes(genotypes);
return vb.make();
ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Pierre Lindenbaum131k
1

later: fixed an error with the AC field.

ADD REPLYlink written 2.6 years ago by Pierre Lindenbaum131k

Dear Pierre,

Thank you very much for taking the time to look into it. However, I encountered the following error:

[SEVERE][Launcher]Allele C* is not an allele in the variant context java.lang.RuntimeException: Allele C* is not an allele in the variant context at htsjdk.variant.vcf.VCFEncoder.writeAllele(VCFEncoder.java:380) at htsjdk.variant.vcf.VCFEncoder.addGenotypeData(VCFEncoder.java:266) at htsjdk.variant.vcf.VCFEncoder.encode(VCFEncoder.java:137) at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:224) at com.github.lindenb.jvarkit.util.vcf.DelegateVariantContextWriter.add(DelegateVariantContextWriter.java:56) at com.github.lindenb.jvarkit.util.vcf.DelegateVariantContextWriter.add(DelegateVariantContextWriter.java:56) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk$CtxWriterFactory$CtxWriter.add(VcfFilterJdk.java:478) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doVcfToVcf(VcfFilterJdk.java:699) at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:1035) at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:1055) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:722) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1208) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1366) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:737) [INFO][Launcher]vcffilterjdk Exited with failure (-1)

Checking the output, I realized that the programme crashed when it reached the first row where alleles should be swapped. All previous lines were properly returned.

I broke the code down into sections in order to narrow down the source of the problem, and I verified that while the first part works properly (REF and ALT alleles do swap if I only run that code), both the section that swaps genotypes as well as the one that recalculates the AC field yield the aforementioned error.

I tried to look into it on my own, and I came across this report by you: https://github.com/samtools/htsjdk/issues/1026, but I'm afraid I'm not able to reconcile it with the original code.

Is there anything that can be done about it? Thanks once again for your invaluable help.

ADD REPLYlink written 2.6 years ago by dkmanruiz0

hard to answer without your data.

ADD REPLYlink written 2.6 years ago by Pierre Lindenbaum131k

Right.

Here I uploaded the input toy VCF: https://github.com/Arynio/genome-annotation/blob/master/test.ann.vcf And this is the output returned by VcfFilterJdk: https://github.com/Arynio/genome-annotation/blob/master/test_polarized.ann.vcf

As you can see, the programme crashed while processing the third variant, which is the first where AA = ALT.

ADD REPLYlink written 2.6 years ago by dkmanruiz0
1

my bad, bad copy+paste, I forgot the

vb.genotypes(genotypes);

before the last return. I'll update the code above.

ADD REPLYlink written 2.6 years ago by Pierre Lindenbaum131k

It works perfectly now. Thank you very much!

Would it be possible for you to recode the AD and the AF fields as well so that there's no "crossed" information left in the VCF? I tried to fix the AF on my own with this code: vb.attribute("AF",(variant.getGenotypes().stream().flatMap(G->G.getAlleles().stream()).filter(A->A.equals(oldref)).count())/(variant.getGenotypes().stream().flatMap(G->G.getAlleles().stream()).count())); but I don't have any knowledge in java so I failed, haha.

I'm sorry if I'm taking too much of your time, but judging by the lack of a default software to polarize VCFs, and the number of views of this topic, I'm sure it will be of great help for many others. :)

ADD REPLYlink written 2.6 years ago by dkmanruiz0

Problem solved...! Thanks

ADD REPLYlink written 3.3 years ago by SOHAIL320

Thanks Pierre, very useful!!

But In the case you had a phased vcf file to start with, does this script maintain the initial (and correct) haplotypes after swapping alleles (REF by ANCESTRAL etc)?

I cannot read java, sorry, but my 'guess' is that it likely does not (?) If so, can we say that is it only useful for unphased vcf-s?

santos

ADD REPLYlink written 19 months ago by santos.alonso0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1436 users visited in the last hour