Beagle 5.2 phasing error
5 weeks ago
pmc.sa ▴ 10

Hi everyone,

I'm trying to phase a multi-sample (12 samples) VCF file containing only the first chromosome. I got this VCF after pruning with plink and recoding the result back to VCF. The file looks like this:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLES
1 112 . C T . . PR GT ./. 0/1 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
1 170 . T G . . PR GT 0/0 0/1 ./. 0/1 ./. 0/1 ./. 0/1 ./. 0/1 ./. ./.
1 370 . G A . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 ./.
1 482 . T C . . PR GT ./. 0/1 ./. ./. ./. 0/1 ./. 0/1 ./. 0/1 ./. ./.
1 555 . C G . . PR GT ./. ./. ./. 0/1 ./. ./. ./. 0/1 0/1 ./. ./. ./.
1 1268 . G A . . PR GT ./. ./. ./. 0/1 0/0 0/1 ./. 0/1 ./. 0/1 ./. ./.
1 1946 . C G . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 ./.
1 3014 . G T . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 ./.
1 3392 . G A . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 ./. ./.
1 3430 . C T . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 0/1 ./.
1 3966 . G A . . PR GT ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1 ./.
1 3982 . C T . . PR GT ./. 0/0 ./. 0/1 0/0 0/1 ./. 0/1 0/1 0/1 0/0 ./.
1 4036 . A G . . PR GT ./. 0/1 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
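Most genotypes in the excerpt above are missing (`./.`), so a quick per-site count of called versus missing genotypes can help when diagnosing the input. The following is a minimal Python sketch, not part of the original workflow; the function name `genotype_call_counts` is hypothetical, and it assumes whitespace-separated records with a `GT` entry in the FORMAT column. It does not establish what causes the Beagle error.

```python
def genotype_call_counts(vcf_lines):
    """Yield (chrom, pos, n_called, n_missing) for each VCF record.

    Assumes whitespace-separated fields and a GT entry in FORMAT.
    Lines starting with '#' (VCF headers) and blank lines are skipped.
    """
    for line in vcf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        chrom, pos = fields[0], int(fields[1])
        gt_index = fields[8].split(":").index("GT")
        called = missing = 0
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            # A genotype with any '.' allele (e.g. './.') counts as missing.
            if "." in gt:
                missing += 1
            else:
                called += 1
        yield chrom, pos, called, missing


# Two records copied from the post (12 samples each).
example = [
    "1 112 . C T . . PR GT ./. 0/1 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.",
    "1 3982 . C T . . PR GT ./. 0/0 ./. 0/1 0/0 0/1 ./. 0/1 0/1 0/1 0/0 ./.",
]
counts = list(genotype_call_counts(example))
for chrom, pos, called, missing in counts:
    print(f"{chrom}:{pos} called={called} missing={missing}")
```

On a real file the same function can be fed an open file handle, for example `genotype_call_counts(open("file_pruned.vcf"))`.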

Now I'm trying to phase this file using Beagle 5.2. My command line looks like this:

java -jar /path-to-file/beagle.21Apr21.304.jar gt=file_pruned.vcf out=file_pruned_beagle_phased iterations=10

But I'm getting an error message that I think has to do with minor allele frequencies (MAF), and I don't really know what I'm doing wrong. Any suggestions are welcome!! :)

Exception in thread "main" java.lang.IllegalArgumentException: invalid array
    at vcf.LowMafRefGTRec.throwArrayError(LowMafRefGTRec.java:149)
    at vcf.LowMafRefGTRec.checkIndicesAndReturnMajorAllele(LowMafRefGTRec.java:143)
    at vcf.LowMafRefDiallelicGTRec.<init>(LowMafRefDiallelicGTRec.java:129)
    at vcf.RefGTRec.hapCodedInstance(RefGTRec.java:113)
    at phase.Stage2Haps.recs(Stage2Haps.java:167)
    at phase.Stage2Haps.lambda$stage2Haps$1(Stage2Haps.java:140)
    at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:271)
    at java.base/java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180)
    at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
    at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:699)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.Nodes$CollectorTask.doLeaf(Nodes.java:2191)
    at java.base/java.util.stream.Nodes$CollectorTask.doLeaf(Nodes.java:2157)
    at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
    at java.base/java.util.stream.Nodes.collect(Nodes.java:336)
    at java.base/java.util.stream.ReferencePipeline.evaluateToNode(ReferencePipeline.java:109)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
    at java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
    at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:517)
    at phase.Stage2Haps.stage2Haps(Stage2Haps.java:141)
    at phase.PhaseLS.runStage2(PhaseLS.java:269)
    at main.Main.phaseStage2Variants(Main.java:209)
    at main.Main.phaseTarg(Main.java:182)
    at main.Main.phaseAndImpute(Main.java:171)
    at main.Main.main(Main.java:126)

Thanks,

Pedro

vcf beagle plink phasing • 281 views

First - there's no need to prune SNPs when you are phasing - you are just throwing away the LD information that Beagle needs to phase your samples. Did someone tell you to prune them? You also aren't likely to get any kind of decent results phasing 12 samples without a reference panel - this could potentially be the cause of the error. Is this data from humans? If so, then you should use something like the 1000 Genomes panel as a reference. Also - wrap your code and output in code tags - it'll make it much easier to diagnose any problems.


Well, I tried phasing the unpruned data with shapeit and it took around 3 days. Hence the pruning idea: reducing the SNP data by removing redundant SNPs might make it faster. These are pig samples, so I haven't found a reference file that I can use. If you have any suggestions, I'd appreciate it.


OK. How many samples / SNPs do you have, and which version of shapeit did you use? Pruning really harms accuracy, so I think you should avoid doing that when possible. Do you have access to a computing cluster?


So we have 12 samples (1,108,008 SNPs). I used shapeit v2.r904 and ran it as an applet on DNAnexus using a mem3_ssd1_v2_x96 instance.


OK. Well, without a reference panel of pigs there's realistically no point in trying to phase or impute 12 samples. Perhaps try to access the one available here? https://gsejournal.biomedcentral.com/articles/10.1186/s12711-019-0445-y
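If a phased pig reference panel can be obtained, Beagle accepts it through its `ref=` argument alongside `gt=` and `out=`, and `nthreads=` sets the number of threads. The following is a rough Python sketch that only assembles such a command line. The helper `beagle_command` and the file names `file_unpruned.vcf.gz` and `pig_reference.vcf.gz` are hypothetical placeholders, not files from this thread, and the sketch assumes the panel is a phased VCF covering the same markers.

```python
import shlex


def beagle_command(jar, gt, out, ref=None, nthreads=None):
    """Build a Beagle command line as a list of arguments.

    Only arguments documented for Beagle 5.x are used:
    gt=, ref=, out= and nthreads=.
    """
    cmd = ["java", "-jar", jar, f"gt={gt}"]
    if ref is not None:
        cmd.append(f"ref={ref}")
    cmd.append(f"out={out}")
    if nthreads is not None:
        cmd.append(f"nthreads={nthreads}")
    return cmd


# Hypothetical file names: substitute the real unpruned VCF and panel.
cmd = beagle_command(
    "/path-to-file/beagle.21Apr21.304.jar",
    gt="file_unpruned.vcf.gz",
    out="file_beagle_phased",
    ref="pig_reference.vcf.gz",
    nthreads=8,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.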
