Question: Beagle 4.1 error : Possible data conversion issue
aritra90 (United States) wrote 22 months ago:

Hi, I have PLINK-format data (PED/MAP) and want to convert it to VCF so that I can phase it with BEAGLE 4.1, since BEAGLE only accepts VCF input. I wanted a trivial one-line solution rather than a pipeline using PSEQ, MEGA2, etc.

I saw that in PLINK 1.9 one can just use --recode vcf to achieve this. However, when I did this and ran Beagle (gt) on the output, it gave me Java exceptions. The problem is not with the Beagle jar file, as it runs fine on the sample VCF data downloaded from 1000 Genomes; it only fails when the input is a VCF converted by PLINK. It would be great if anyone could help me with this, for example with a workaround or another simple method to convert PLINK to VCF for BEAGLE input.

Error snippet: 

Exception in thread "main" java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: nSamples==0
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
 at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:593)
 at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
 at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
 at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
 at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
 at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
 at h.G.c(Unknown Source)
 at h.G.a(Unknown Source)
 at main.Main.main(Unknown Source)
Caused by: java.lang.IllegalArgumentException: nSamples==0
 at h.I.<init>(Unknown Source)
 at h.e.<init>(Unknown Source)
 at h.G.a(Unknown Source)
 at h.G.a(Unknown Source)
 at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
 at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
 at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
 at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
 at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747)
 at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721)
 at java.util.stream.AbstractTask.compute(AbstractTask.java:316)
 at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

To give a description of what I am doing to convert PLINK to VCF: 

1) Converting PLINK to .bgl using PLINK 1.9 

2) Converting .bgl to vcf using beagle2vcf.jar 

3) post processing to make it tab separated. 

4) running Beagle 4.1 only to get the aforementioned error. 

Thanks,
Aritra

Tags: beagle, snp, plink, vcf
modified 20 months ago by Nicholas Mancuso • written 22 months ago by aritra90

Can you post the errors?

written 22 months ago by Zev.Kronenberg

Hi Zev, 

Added the errors. 

Thanks. 

written 21 months ago by aritra90

Have you tried running --list-duplicate-vars, and then using --exclude on the listed variant IDs before exporting a VCF?

Also, what do the first few non-header lines of the VCF look like?
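As a rough cross-check on the exported VCF itself, positions that occur more than once can be listed with a few lines of Python. This is only a sketch and does not reproduce PLINK's --list-duplicate-vars logic; it assumes a tab-delimited VCF, and the function name `duplicate_positions` and the demo records are made up for illustration.

```python
import io
from collections import Counter

def duplicate_positions(lines):
    """Return the sorted (CHROM, POS) pairs that appear on more than one record."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a usable record (e.g. not tab-delimited)
        counts[(fields[0], fields[1])] += 1
    return sorted(key for key, n in counts.items() if n > 1)

# Made-up demo input: two records share position 1:752566.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n"
    "1\t752566\trs3094315\tT\tC\t.\t.\tPR\tGT\t0/0\n"
    "1\t752566\tdup_snp\tT\tC\t.\t.\tPR\tGT\t0/1\n"
    "1\t800000\tother_snp\tA\tG\t.\t.\tPR\tGT\t1/1\n"
)
dups = duplicate_positions(demo)
print(dups)
```

For a gzipped VCF, the same function can be fed from `gzip.open(path, "rt")` instead of the in-memory demo.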

modified 21 months ago • written 21 months ago by chrchang523

Thanks for the input, Christopher. I ran --list-duplicate-vars, but for this particular dataset it didn't return any duplicate variants (I do get duplicate variants for other datasets, which I am not currently using). The non-header lines of the VCF file (after --recode vcf) look like this:

1       752566  rs3094315       T       C       .       .       PR      GT      0/0     0/0     1/1     0/0     0/1     0/1     0/0     1/1     0/0     0/0     1/1     ......

the header lines look like this: 

##fileformat=VCFv4.2
##fileDate=20151203
##source=PLINKv1.90
##contig=<ID=1,length=249198165>
##contig=<ID=2,length=242996590>
##contig=<ID=3,length=197793906>
##contig=<ID=4,length=190723161>
##contig=<ID=5,length=180666277>
##contig=<ID=6,length=170823380>
##contig=<ID=7,length=158928570>
##contig=<ID=8,length=146239141>
##contig=<ID=9,length=141010458>
##contig=<ID=10,length=134966155>
##contig=<ID=11,length=134905782>
##contig=<ID=12,length=133734114>
##contig=<ID=13,length=115074879>
##contig=<ID=14,length=107285438>
##contig=<ID=15,length=102388693>
##contig=<ID=16,length=90141356>
##contig=<ID=17,length=81004771>
##contig=<ID=18,length=77984346>
##contig=<ID=19,length=58949580>
##contig=<ID=20,length=62906515>
##contig=<ID=21,length=48050389>
##contig=<ID=22,length=51024838>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT........

The error that I get in Beagle 4 when I use the file produced by --recode vcf is this:

Exception in thread "main" java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: 1 (followed by a bunch of java exceptions)

It'd be great if you can help me with this. 

Thanks. 

 

 

written 21 months ago by aritra90

I'm running into the same issue. Have you resolved this or found another work-around?

written 20 months ago by Nicholas Mancuso

Nicholas Mancuso (United States) wrote 20 months ago:

I think I found the issue. The current version of PLINK (as of 1/7/2016) has a subtle bug that discards the alternate allele code if there is actually no genetic variation present in the data (due to missing calls, etc.). Until this is fixed, a workaround is to simply remove any variant that has MAF = 0 beforehand. Hopefully there aren't too many.
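A minimal sketch of that workaround in Python, assuming the monomorphic records show up in the PLINK-exported VCF with `.` in the ALT column and that the file is tab-delimited. The function name `drop_monomorphic` and the demo records are made up for illustration; this is not a substitute for filtering in PLINK itself.

```python
import io

def drop_monomorphic(lines):
    """Yield VCF lines, skipping records whose ALT column is '.' (no alternate allele)."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all header lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        # ALT is the 5th tab-separated column (index 4).
        # Records with fewer than 5 columns are dropped as well.
        if len(fields) > 4 and fields[4] != ".":
            yield line

# Made-up demo input: the second record has no alternate allele.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "1\t752566\trs3094315\tT\tC\t.\t.\tPR\tGT\t0/0\t0/1\n"
    "1\t800000\tsnp_mono\tA\t.\t.\t.\tPR\tGT\t0/0\t0/0\n"
)
kept = list(drop_monomorphic(demo))
print("".join(kept), end="")
```

For a real file, the lines could come from `gzip.open(path, "rt")` and the kept lines be written to a new file before running Beagle on it.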

written 20 months ago by Nicholas Mancuso

This is required by the VCF specification; it is not a PLINK bug.  (That's why TASSEL has the same "bug".)  With that said, it sounds like you found the best workaround.

written 20 months ago by chrchang523

This sounds like a good alternative. Thanks guys :) 

written 20 months ago by aritra90

Just to say this solved my

“Exception in thread "main" java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: 1”

issue as well :) Thanks!

written 20 months ago by dani.mag.fernandes

Philipp Bayer (Australia/Perth/UWA) wrote 21 months ago:

It looks like you ran into a problem with the difference between Beagle 3 and Beagle 4.

This is your main error message:

>Caused by: java.lang.IllegalArgumentException: nSamples==0

First of all, that looks like a comparison instead of an assignment (shouldn't it be "nSamples=0"?). Second, according to the Beagle v4.1 manual there is no nSamples argument (only nthreads and niterations); that argument does, however, exist in the Beagle 3 manual.

If you look at the release notes, nsamples was dropped in the Beagle 4.1 (03Oct15.284) release.

Solution: Use Beagle 3, or drop the nsamples parameter if you don't need it.

modified 21 months ago • written 21 months ago by Philipp Bayer

Hi Philipp, 

I have Beagle 3 working, but wanted to use Beagle 4 as it has some improvements that we will need. I know that Beagle 4.1 doesn't have an nsamples argument, and I don't include one when I run the jar file.

I run this: java -jar beagle.09Nov15.d2a.jar gt=inputfile.vcf.gz out=outfile.gt and get that Java exception, which baffled me. I need to use Beagle as part of a script, so I need to keep it as generic as I can, without dependencies like Mega2, PLINKSEQ, etc.

Thanks for your input,

Aritra

written 21 months ago by aritra90

Thank you for the command!

Tricky! I've now decompiled Beagle 4.1 and it does use nSamples internally a few times, so I wouldn't be surprised if your VCF accidentally triggers stuff like this:

    if (paramF.c() == 0) {
      throw new IllegalArgumentException("nSamples==0");
    }

So internally it keeps on using nSamples but you as the user can't touch it. It seems to build the number for nSamples automatically in Beagle 4.

Since this is decompiled Java code, with the weird variable and method names that come with it, it's hard to check exactly what's going on. All I can see is that the method c() checks the length of something (number of alleles? SNPs? individuals?).

I can't check the original source code, since Beagle's page says it will only be available once the paper is out.

So you're indeed correct in your first post, there's something weird or missing in the PLINK-converted output that Beagle assumes something about. Can you find any differences between the file you now have (inputfile.vcf.gz) and the 1000Genomes file that originally worked? "/" instead of "|"?
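One way to make that comparison concrete is to collect the same few facts from each file and put them side by side. The sketch below is only an illustration under stated assumptions: it assumes tab-delimited VCF text, and `summarize_vcf`, its dictionary keys, and the demo records are hypothetical names, not anything from Beagle or PLINK. A header line that is not tab-delimited would show up here as zero samples.

```python
import io

def summarize_vcf(lines):
    """Collect a few basic facts about a VCF for side-by-side comparison."""
    summary = {"n_samples": 0, "n_records": 0, "n_alt_missing": 0,
               "gt_separators": set()}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            if line.startswith("#CHROM"):
                # Sample IDs follow the 9 fixed columns (CHROM..FORMAT).
                summary["n_samples"] = max(len(fields) - 9, 0)
            continue
        summary["n_records"] += 1
        if len(fields) > 4 and fields[4] == ".":
            summary["n_alt_missing"] += 1  # record without an alternate allele
        for sample in fields[9:]:
            call = sample.split(":")[0]  # GT is the first FORMAT subfield when present
            for sep in "/|":
                if sep in call:
                    summary["gt_separators"].add(sep)
    return summary

# Made-up demo input in the style of the PLINK export shown in this thread.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "1\t752566\trs3094315\tT\tC\t.\t.\tPR\tGT\t0/0\t0/1\n"
    "1\t800000\tsnp_mono\tA\t.\t.\t.\tPR\tGT\t0/0\t0/0\n"
)
result = summarize_vcf(demo)
print(result)
```

Running it once on the working 1000 Genomes file and once on the converted file (for example via `gzip.open(path, "rt")`) should show at a glance whether the sample count, the genotype separator, or records without an ALT allele differ.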

You could try converting the PLINK files using TASSEL's graphical interface: Data -> Load -> Load PLINK, followed by Data -> Export -> Write VCF. Maybe that output file will have whatever Beagle is missing in PLINK 1.9's conversion output.

Edit: TASSEL is here: http://www.maizegenetics.net/#!tassel/c17q9

modified 21 months ago • written 21 months ago by Philipp Bayer

Hi Philipp, 

 

Thanks for looking into it so much. TASSEL and PLINK --recode vcf give the same output file, and I get the following error when I run Beagle on it:

Caused by: java.lang.IllegalArgumentException: 1
        at h.d.b(Unknown Source)
        at h.d.<init>(Unknown Source)
        at h.I.<init>(Unknown Source)
        at h.e.<init>(Unknown Source)
        at h.G.a(Unknown Source)
        at h.G.a(Unknown Source)

But when I convert PLINK to .bgl using --recode bgl -nomap and then convert it to VCF using beagle2vcf.jar from the Beagle 4.1 utilities, I get the nSamples==0 error. Since .bgl is the BEAGLE 3 format, I think it has something to do with that.

I am kind of hitting a roadblock here, hence, any help would be appreciated, as I didn't want to go back to Beagle 3. 

Thanks, 

Aritra

written 21 months ago by aritra90

Have you contacted the Beagle authors? There's obviously something wonky in the nSamples approximation.


At this point I'd go through both of your files (the 1000 Genomes file that works and your converted ones that don't) and check for any difference that may trip Beagle up; the Beagle people may be able to help you better with that. After all, they're interested in having their software work with files generated by software as common as PLINK or TASSEL.

written 21 months ago by Philipp Bayer