Beagle: Index -1 out of bounds for length 2
2.1 years ago
Tom ▴ 40

I want to impute missing genotype data. I downloaded Beagle from here and ran it on the test file below:

##fileformat=VCFv4.1
##medaka_version=1.0.3
##contig=<ID=chr1>
##INFO=<ID=pos1,Number=.,Type=Integer,Description="POS of incorporated variants from haplotype 1">
##INFO=<ID=q1,Number=1,Type=Float,Description="Combined qual score for haplotype 1">
##INFO=<ID=pos2,Number=.,Type=Integer,Description="POS of incorporated variants from haplotype 2">
##INFO=<ID=q2,Number=1,Type=Float,Description="Combined qual score for haplotype 2">
##FORMAT=<ID=GT,Number=G,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=G,Type=Integer,Description="Genotype quality score">
##CL=medaka_variant -U -o chr1 -m r941_prom_variant_g360 -s r941_prom_snp_g360 -i PAD65442_3.6.1_pass.bam -f GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -r chr1:0-10000000 -t 4; Fri  3 Jul 21:15:23 BST 2020
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE1 SAMPLE2 SAMPLE3 SAMPLE4 SAMPLE5
chr1    10108   .   C   CT  14.91   PASS    pos1=10108;pos2=10108;q1=10.99;q2=18.83 GT:GQ   1|1:15  1|1:15  1|1:15  1|1:15  1|1:15  
chr1    10177   .   A   AC  4.852   PASS    pos2=10177;q2=4.852 GT:GQ   0|1:5   1|1:15  1|1:15  1|1:13  1|1:16
chr1    10257   .   A   C   0.799   PASS    pos1=10257;q1=0.799 GT:GQ   1/1:1   1|1:15  1|1:15  0|1:15  1|1:15
chr1    10291   .   C   T   8.544   PASS    pos2=10291;q2=8.544 GT:GQ   0|1:9   1|1:15  1|1:15  0/1:12  1|1:15
chr1    10297   .   C   T   8.215   PASS    pos2=10297;q2=8.215 GT:GQ   0|1:8   1|1:15  1|0:15  1|1:14  1|1:16
chr1    10303   .   C   T   0.246   PASS    pos2=10303;q2=0.246 GT:GQ   ./. 1|1:15  1|0:15  1|1:14  1|1:15
chr1    10309   .   C   T   2.7155  PASS    pos1=10309;pos2=10309;q1=1.046;q2=4.385 GT:GQ   1|0:3   0|1:15  1|1:15  1|1:15  1|1:15
chr1    10315   .   C   T   4.8525  PASS    pos1=10315;pos2=10315;q1=3.083;q2=6.622 GT:GQ   1|1:5   0|1:15  1|1:15  1|1:15  1|1:15
chr1    10321   .   C   T   0.562   PASS    pos2=10321;q2=0.562 GT:GQ   0|1:1   1|1:15  1|1:15  0|1:15  1|1:15

It generated output without any problem.

I then ran it on my real data, which is also a VCF file and looks almost identical to the above, except that it has many more header lines at the start (>3500 of them). The line below only illustrates the structure of the records; I can't post actual lines, both for confidentiality and because each line is >200 entries long:

chr10   11182636    .   AT  A   103.3   PASS    AC=1;AF=2   ./. 0/0:7   0/1:35  ./. 0/1:22

With this file I get the attached error (sorry that it's an image; I have to work through a remote desktop, so I can't copy/paste):

[Image: screenshot of the Beagle error message]

I can't see how my data differs from the example in a way that would stop it working. I appreciate it may be difficult to point me in the right direction without seeing the full file, but there are >3500 header lines (starting with '##'), the data is on a remote desktop so it can't be copied and pasted, and it can't be shared because it is patient data. If anyone has an idea for a direction I could go in, I'd appreciate it.
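
One way to look for differences without sharing the file is to check the structure of every record on the remote machine. The following is only a minimal sketch, assuming an uncompressed, tab-delimited VCF. `check_vcf_lines` is a hypothetical helper, not part of Beagle, and its checks come from the general VCF layout (same column count as the #CHROM line, a FORMAT column that lists GT first, allele indices that exist in REF/ALT). They are not a confirmed cause of this exception.

```python
import re

# A GT value such as 0|1, 1/1, ./. or a single haploid allele.
GT_RE = re.compile(r"(\.|\d+)([/|](\.|\d+))*")


def check_vcf_lines(lines):
    """Return (line_number, message) pairs for records that look malformed.

    Hypothetical helper for illustration; it only inspects the layout.
    """
    problems = []
    n_columns = None
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            n_columns = len(fields)
            continue
        if n_columns is None:
            problems.append((number, "record found before the #CHROM line"))
            continue
        if len(fields) != n_columns:
            problems.append(
                (number, f"{len(fields)} columns, #CHROM line has {n_columns}")
            )
            continue
        if len(fields) < 10:
            continue  # no FORMAT/sample columns to check
        # REF counts as allele 0; each ALT entry adds one more allele.
        n_alleles = 1 + len([a for a in fields[4].split(",") if a != "."])
        if fields[8].split(":")[0] != "GT":
            problems.append((number, f"FORMAT is {fields[8]!r}, GT is not first"))
            continue
        for sample_number, sample in enumerate(fields[9:], start=1):
            gt = sample.split(":")[0]
            if not GT_RE.fullmatch(gt):
                problems.append((number, f"sample {sample_number}: odd GT {gt!r}"))
                continue
            for allele in re.split(r"[/|]", gt):
                if allele != "." and int(allele) >= n_alleles:
                    problems.append(
                        (number, f"sample {sample_number}: allele {allele} not in REF/ALT")
                    )
    return problems


# Small in-memory demonstration; for a real file, pass an open file handle
# instead, e.g. check_vcf_lines(open("my_data.vcf")) (placeholder file name).
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t10108\t.\tC\tCT\t14.91\tPASS\t.\tGT:GQ\t1|1:15\t./.",
    "chr1\t10177\t.\tA\tAC\t4.852\tPASS\t.\tGQ:GT\t5:0|1\t15:1|1",
]
for line_number, message in check_vcf_lines(demo):
    print(line_number, message)
```

Only line numbers and short messages are printed, so the output can be retyped by hand from the remote desktop without exposing any patient genotypes.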

beagle genotyping • 416 views