Beagle 5.2 phasing error
2.9 years ago
pmc.sa ▴ 40

Hi everyone,

I'm trying to phase a multi-sample (12 samples) VCF file containing the first chromosome. I got this VCF after pruning with plink and recoding the result back to VCF. The file looks like this:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLES

1 112 . C T . . PR GT ./. 0/1 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
1 170 . T G . . PR GT 0/0 0/1 ./. 0/1 ./. 0/1 ./. 0/1 ./. 0/1 ./. ./.
1 370 . G A . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 ./.
1 482 . T C . . PR GT ./. 0/1 ./. ./. ./. 0/1 ./. 0/1 ./. 0/1 ./. ./.
1 555 . C G . . PR GT ./. ./. ./. 0/1 ./. ./. ./. 0/1 0/1 ./. ./. ./.
1 1268 . G A . . PR GT ./. ./. ./. 0/1 0/0 0/1 ./. 0/1 ./. 0/1 ./. ./.
1 1946 . C G . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 ./.
1 3014 . G T . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 ./.
1 3392 . G A . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 ./. ./.
1 3430 . C T . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 0/1 ./.
1 3966 . G A . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 ./.
1 3982 . C T . . PR GT ./. 0/0 ./. 0/1 0/0 0/1 ./. 0/1 0/1 0/1 0/0 ./.
1 4036 . A G . . PR GT ./. 0/1 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
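Most genotypes in the excerpt above are missing (./.). As a rough way to quantify that, here is a minimal Python sketch that counts called genotypes per site. It assumes whitespace-separated records with the nine fixed VCF columns followed by one GT-first column per sample; the function name site_call_rates is made up for illustration, and the three records are copied from the excerpt.

```python
def site_call_rates(vcf_lines):
    """Return (position, called_genotypes, total_samples) for each VCF record."""
    results = []
    for line in vcf_lines:
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        genotypes = fields[9:]  # sample columns follow the 9 fixed columns
        # A genotype counts as called only if no allele is "." (GT is the first subfield).
        called = sum(1 for gt in genotypes if "." not in gt.split(":")[0])
        results.append((int(fields[1]), called, len(genotypes)))
    return results


# Three records copied from the excerpt in the post.
records = [
    "1 112 . C T . . PR GT ./. 0/1 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.",
    "1 170 . T G . . PR GT 0/0 0/1 ./. 0/1 ./. 0/1 ./. 0/1 ./. 0/1 ./. ./.",
    "1 3982 . C T . . PR GT ./. 0/0 ./. 0/1 0/0 0/1 ./. 0/1 0/1 0/1 0/0 ./.",
]

for pos, called, total in site_call_rates(records):
    print(f"pos {pos}: {called}/{total} genotypes called")
```

For these three records the counts are 1, 6 and 8 called genotypes out of 12. Whether this level of missingness is related to the error is not established in this thread.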

Now I'm trying to phase this file using Beagle 5.2. My command line looks like this:

java -jar /path-to-file/beagle.21Apr21.304.jar gt=file_pruned.vcf out=file_pruned_beagle_phased iterations=10

But I'm getting an error message that I think has to do with minor allele frequencies (MAF), and I don't know what I'm doing wrong. Any suggestions are welcome!! :)

Exception in thread "main" java.lang.IllegalArgumentException: invalid array
    at vcf.LowMafRefGTRec.throwArrayError(LowMafRefGTRec.java:149)
    at vcf.LowMafRefGTRec.checkIndicesAndReturnMajorAllele(LowMafRefGTRec.java:143)
    at vcf.LowMafRefDiallelicGTRec.<init>(LowMafRefDiallelicGTRec.java:129)
    at vcf.RefGTRec.hapCodedInstance(RefGTRec.java:113)
    at phase.Stage2Haps.recs(Stage2Haps.java:167)
    at phase.Stage2Haps.lambda$stage2Haps$1(Stage2Haps.java:140)
    at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:271)
    at java.base/java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180)
    at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
    at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:699)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.Nodes$CollectorTask.doLeaf(Nodes.java:2191)
    at java.base/java.util.stream.Nodes$CollectorTask.doLeaf(Nodes.java:2157)
    at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
    at java.base/java.util.stream.Nodes.collect(Nodes.java:336)
    at java.base/java.util.stream.ReferencePipeline.evaluateToNode(ReferencePipeline.java:109)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
    at java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
    at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:517)
    at phase.Stage2Haps.stage2Haps(Stage2Haps.java:141)
    at phase.PhaseLS.runStage2(PhaseLS.java:269)
    at main.Main.phaseStage2Variants(Main.java:209)
    at main.Main.phaseTarg(Main.java:182)
    at main.Main.phaseAndImpute(Main.java:171)
    at main.Main.main(Main.java:126)
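The topmost frames of the trace are in vcf.LowMafRefGTRec, so the per-site allele counts may be worth inspecting. The sketch below is only an inspection aid: nothing here establishes that allele frequency triggers the exception. It assumes biallelic records with a GT-first sample column, and the function name minor_allele_counts is hypothetical. The three records are copied from the VCF excerpt.

```python
import re


def minor_allele_counts(vcf_lines):
    """Return (position, minor_allele_count, called_alleles, maf) per biallelic record."""
    results = []
    for line in vcf_lines:
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        ref_count = alt_count = 0
        for sample in fields[9:]:
            # GT is the first subfield; alleles are separated by "/" or "|".
            for allele in re.split(r"[/|]", sample.split(":")[0]):
                if allele == "0":
                    ref_count += 1
                elif allele == "1":
                    alt_count += 1
                # "." (missing) alleles are ignored.
        called = ref_count + alt_count
        minor = min(ref_count, alt_count)
        maf = minor / called if called else None
        results.append((int(fields[1]), minor, called, maf))
    return results


# Three records copied from the excerpt in the post.
records = [
    "1 112 . C T . . PR GT ./. 0/1 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.",
    "1 1268 . G A . . PR GT ./. ./. ./. 0/1 0/0 0/1 ./. 0/1 ./. 0/1 ./. ./.",
    "1 3982 . C T . . PR GT ./. 0/0 ./. 0/1 0/0 0/1 ./. 0/1 0/1 0/1 0/0 ./.",
]

for pos, minor, called, maf in minor_allele_counts(records):
    print(f"pos {pos}: minor allele count {minor} of {called} called alleles")
```

Note that the frequency at position 112 rests on a single called genotype, so with this much missing data the estimates are based on very few alleles.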

Thanks,

Pedro

vcf beagle plink phasing • 2.4k views

First - there's no need to prune SNPs when you are phasing - you are just throwing away the LD information that Beagle needs to phase your samples. Did someone tell you to prune them? You also aren't likely to get decent results phasing 12 samples without a reference panel - this could potentially be the cause of the error. Is this data from humans? If so, you should use something like the 1000 Genomes panel as a reference. Also - wrap your code and output with the code tags - it'll make it much easier to diagnose any problems.

Well, I tried phasing the unpruned data with shapeit and it took around 3 days. Hence the pruning idea: reducing the SNP data by removing redundant SNPs might make it faster. These are pig samples, so I haven't found a reference file that I can use. If you have any suggestions, I'd appreciate it.

OK. How many samples / SNPs do you have and which version of shapeit did you use? Pruning really harms accuracy, so I think you should avoid doing that when possible. Do you have access to a computing cluster?

So we have 12 samples (1,108,008 SNPs). I used shapeit v2.r904 and ran an applet on DNAnexus using a mem3_ssd1_v2_x96 instance.

OK. Well, without a reference panel of pigs, realistically there's no point in trying to phase or impute 12 samples. Perhaps try to access the one available here? https://gsejournal.biomedcentral.com/articles/10.1186/s12711-019-0445-y

Did you end up finding a solution? I am in a similar situation with 9 samples and no reference panel.

Hi @mglasena, I had the same problem over and over again for a while. I thought I had to pool samples from the same populations. In fact, I had about 80 samples from 5 different populations. I created a VCF file with all samples (bcftools merge) and tried with that file, but still had the same problem. So we thought it might be a problem with our Linux server. We downloaded our data to a private server and ran Beagle, and it worked fine. Our problem was definitely due to incompatibilities between Beagle and our server. Hope this helps solve your problem!

You can't use Beagle with that few samples and no reference panel. It won't give meaningful results.

How did you determine this?

Just a bit of experience working with this kind of software and speaking with the authors. I would strongly advise not to proceed unless you have a reference panel or more individuals.
