Issues Phasing with Eagle v2.4
2
0
Entering edit mode
3 months ago
Francisco • 0

I'm trying to phase a short vcf file of a SINGLE individual. It has exome data from 6 genes in 4 chromosomes. In total the vcf has 30 variants. The source is the Torrent Variant Caller. In the Fixed section the chromosomes are included as "chrN". The VCF has been referenced with hg19.

./eagle \
--vcf ./C11.vcf \
--chrom 22 \
--geneticMapFile ./tables/genetic_map_hg19_withX.txt.gz \
--outPrefix ./tmp/prueba_phased


Even though the software has some issues to recognize the chro header, I believe it starts with no problem:

=== Reading genotype data ===
Reading genotypes for N = 1 samples
[W::vcf_parse] Contig 'chr7' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chr10' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chr15' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chr22' is not defined in the header. (Quick workaround: index the file with tabix.)
Filling in genetic map coordinates using reference file:
./tables/genetic_map_hg19_withX.txt.gz
Physical distance range: 3967 base pairs
Genetic distance range:  0.000207245 cM
Average # SNPs per cM:   53077
Auto-selecting --maxBlockLen: 0.25 cM
Number of <=(64-SNP, 0.25cM) segments: 1
Average # SNPs per segment: 11
Fraction of heterozygous genotypes: 0
SKIPPED STEP 1
SKIPPED STEP 2
BEGINNING STEP 3 (PBWT ITERS)


Then it starts to phase the sample trhogu 1 to 10 batches:

BATCH 1 OF 10
Phasing samples 1-0
Time for phasing batch: 3.09944e-06
BATCH 2 OF 10
Phasing samples 1-0
Time for phasing batch: 691414e-06


When it gets to the tenth batch it changes and I get the error:

BATCH 10 OF 10
Phasing samples 1-1
WARNING: Sample 1 (1-indexed) has a het count of 0
ERROR: Failed to allocate 18446744073709551596 bytes


I don't understand why I get this error if I just have 11 variants in these gene and why it says that sample 1 has a het count of 0. Below you can see the Fixed section for the Chr 22:

CHROM   POS      ID REF  ALT  QUAL    FILTER
chr22  42522613  NA  G    C    100.0   PASS
chr22  42523409  NA  G    T    100.0   PASS
chr22  42523943  NA  A    G    100.0   PASS
chr22  42525952  NA  C    A    100.0   PASS
chr22  42526484  NA  A    C    100.0   PASS
chr22  42526549  NA  C    T    100.0   PASS
chr22  42526561  NA  GG   TC   100.0   PASS
chr22  42526567  NA  G    A    100.0   PASS
chr22  42526571  NA  C    G    100.0   PASS
chr22  42526573  NA  T    G    100.0   PASS


I hope you can help me to fix this error. Thanks.

h19 vcf phasing eaglev2.4 haplotype • 549 views
1
Entering edit mode
3 months ago
4galaxy77 ★ 1.6k

I don't know what that error means, but you absolutely cannot phase a single individual without a reference panel. You also need to split up the file into individual chromosomes.

I would recommend using the 1000 genomes as a reference for simplicity.

0
Entering edit mode

Thank you for your input !Yes! Im aware of that. And, as you can see I added the reference panel: --geneticMapFile ./tables/genetic_map_hg19_withX.txt.gz \ I also split by chromosome, I specifically asked for the chromosome 22 as you can see in the first script.

1
Entering edit mode

The argument --geneticMapFile specifies a genetic map, which gives the recombination rates between each SNP. This is different to a reference panel of haplotypes used for the phasing. You gave given the first but not the second. You need to download something like the 1000 genomes and use it as a reference,

0
Entering edit mode

I see, you are right. I need to include the --vcfRef argument. Thanks.

0
Entering edit mode

I'm still having a problem with the reference genome... it seems that my variants are not present in the reference. But does not make any sense since I'm using a b37 reference genome in a h19 built target vcf.

0
Entering edit mode
3 months ago
Francisco • 0

OK. So the issue was solved by adding a reference genome:

./eagle \
--vcfTarget ./C11_v1.vcf.gz \
--chrom 22 \
--vcfRef ./chr22.1kg.phase3.v5a.vcf.gz \
--geneticMapFile ./tables/genetic_map_hg19_withX.txt.gz \
--outPrefix ./prueba_phased


I obtained the reference genome from: http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/b37.vcf/

Well this was enough since I found an issue between the reference and the target:

SNPs ignored: 11 SNPs in target but not reference
--> WARNING: Check REF/ALT agreement between target and ref? <--
424147 SNPs in reference but not target
0 multi-allelic SNPs in target


So it seems that the reference and the target genome are not in agreetment. However I do not understand why If I'm using a target vcf built in h19, a genetic map built also in b19 and a reference genome also in b37. Does anyone know why?