BEAGLE 4.0 with ped file output into Refined IBD not identifying segments properly
3.1 years ago
LTDavid ▴ 20

I'm running BEAGLE 4.0 with the ped argument because I would like to use parent-offspring data for phasing. The run produces warnings such as "Inconsistent trio genotype set to missing" and "Skipping VCF record with no data". When I run Refined IBD on that output, the segment counts and lengths do not seem consistent with the number of parent-offspring trios and duos in the data. The chromosome assigned to each segment and the segment lengths also do not match the results of other programs run on the same data. Where might I be going wrong?

For BEAGLE 4.0, I use the following:

java -Xmx800m -jar beagle.r1399.jar gt=mergedDataFiles.vcf.gz gprobs=false ped=pedigree.ped out=BEAGLE4ped.gt


For Refined IBD, I use the following:

java -Xmx800m -jar refined-ibd.16May19.ad5.jar gt=BEAGLE4ped.gt.vcf.gz lod=4.0 length=2.0 out=RefinedIBDwB4ped


When I run BEAGLE 4.0, the output file looks like this (shown for several samples only):

CHROM POS ID REF ALT QUAL  FILTER   INFO  FORMAT   person2 father2 person4 person5 person7 person6 person9 person3 father3 father1
1   82154   rs4477212   A   C,G,T   .   PASS    .   GT  3|3 3|3 3|3 3|3 3|3 3|3 3|3 3|3 3|3 3|3
1   752721  rs3131972   A   G   .   PASS    .   GT  0|1 0|1 0|1 0|0 0|0 0|1 0|0 0|0 0|0 0|1
1   768448  rs12562034  G   A   .   PASS    .   GT  0|1 0|0 1|0 0|1 0|1 0|0 0|0 0|1 1|1 0|0
1   776546  rs12124819  A   G   .   PASS    .   GT  0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0


The BEAGLE 4.0 run also prints several warnings like the following:

WARNING: Inconsistent trio genotype set to missing  1   1211292 rs6685064   C   T   father1 mother1 person1
WARNING: Inconsistent trio genotype set to missing  1   1213305 rs28488099  C   A   father1 mother1 person1
Skipping VCF record with no data: 1 1213305 rs28488099  C   A
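
The first warning presumably refers to sites where the child's genotype cannot be produced from the parents' genotypes. The sketch below shows one way to test that condition for diploid genotypes, so the affected sites can be counted before phasing. The function names `parse_gt` and `is_mendelian_consistent` are hypothetical, and this is not BEAGLE's own implementation.

```python
def parse_gt(gt):
    """Split a VCF GT string such as '0|1' or '0/1' into a tuple of alleles.

    Returns None when any allele is missing ('.').
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(alleles)


def is_mendelian_consistent(father_gt, mother_gt, child_gt):
    """Return True if the child could inherit one allele from each parent.

    A missing parent genotype places no constraint (this covers duos).
    A missing child genotype is treated as consistent.
    Assumption: genotypes are diploid; haploid calls are rejected.
    """
    father = parse_gt(father_gt)
    mother = parse_gt(mother_gt)
    child = parse_gt(child_gt)
    if child is None:
        return True
    if len(child) != 2:
        raise ValueError("only diploid child genotypes are handled")

    def can_give(parent, allele):
        return parent is None or allele in parent

    first, second = child
    # Try both ways of assigning the child's two alleles to the parents.
    return (can_give(father, first) and can_give(mother, second)) or (
        can_give(father, second) and can_give(mother, first)
    )
```

Running this over each trio and site in the input VCF gives a rough count of how many genotypes would be set to missing, which may indicate whether a sample is mislabeled in the pedigree file.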


My pedigree file contains mostly trios, with a few duos and singles. (Does the order of individuals need to match the sample order in the file passed to gt=?) It looks like this:

001 father1 0   0   1   0
001 mother1 0   0   2   0
001 person1 father1 mother1 2   0
002 father2 0   0   1   0
002 mother2 0   0   2   0
002 person2 father2 mother2 1   0
003 father3 0   0   1   0
003 person3 father3 0   1   0
004 person4 0   0   1   0
005 person5 0   0   1   0
006 person6 0   0   2   0


The Refined IBD output (249 lines in total) is almost exclusively for family 001 and includes barely any matches from other families or singles. Even where it does list segments, the lengths in cM do not appear consistent with the larger segments expected between parent and offspring. The maximum length tends to hover around 8 or 9 cM, with only a few reaching about 15 cM. The exception is the X chromosome, where a few more of the other samples appear:

person1 1   mother1 1   1   27662666    29766120    7.38    2.103
father1 2   person1 2   1   26924308    29766120    10.96   2.842
person1 2   mother1 1   1   59107914    61287929    13.53   2.18
father1 1   person1 2   1   48016511    51061184    14.72   3.045
person1 2   mother1 2   1   71770366    76536649    17.33   4.766



I used plantimals/2vcf to convert the AncestryDNA zip files to VCF files. Some markers did not transfer to the output file, but those that did are correct. 2vcf uses GRCh37 for its reference.vcf file.