Issues Phasing with Eagle v2.4
2
0
Entering edit mode
2.2 years ago
Francisco • 0

I'm trying to phase a short vcf file of a SINGLE individual. It has exome data from 6 genes in 4 chromosomes. In total the vcf has 30 variants. The source is the Torrent Variant Caller. In the Fixed section the chromosomes are included as "chrN". The VCF has been referenced with hg19.

./eagle \
--vcf ./C11.vcf \
--chrom 22 \
--geneticMapFile ./tables/genetic_map_hg19_withX.txt.gz \
--outPrefix ./tmp/prueba_phased

Even though the software has some issues to recognize the chro header, I believe it starts with no problem:

=== Reading genotype data ===      
Reading genotypes for N = 1 samples
[W::vcf_parse] Contig 'chr7' is not defined in the header. (Quick workaround: index the file with tabix.) 
[W::vcf_parse] Contig 'chr10' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chr15' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig 'chr22' is not defined in the header. (Quick workaround: index the file with tabix.)
Read M = 11 variants
Filling in genetic map coordinates using reference file:
./tables/genetic_map_hg19_withX.txt.gz
Physical distance range: 3967 base pairs
Genetic distance range:  0.000207245 cM
Average # SNPs per cM:   53077
Auto-selecting --maxBlockLen: 0.25 cM
Number of <=(64-SNP, 0.25cM) segments: 1
Average # SNPs per segment: 11
Fraction of heterozygous genotypes: 0
SKIPPED STEP 1
SKIPPED STEP 2
BEGINNING STEP 3 (PBWT ITERS)

Then it starts to phase the sample trhogu 1 to 10 batches:

BATCH 1 OF 10
Phasing samples 1-0
Time for phasing batch: 3.09944e-06
BATCH 2 OF 10
Phasing samples 1-0
Time for phasing batch: 691414e-06

When it gets to the tenth batch it changes and I get the error:

BATCH 10 OF 10
Phasing samples 1-1
WARNING: Sample 1 (1-indexed) has a het count of 0
ERROR: Failed to allocate 18446744073709551596 bytes

I don't understand why I get this error if I just have 11 variants in these gene and why it says that sample 1 has a het count of 0. Below you can see the Fixed section for the Chr 22:

CHROM   POS      ID REF  ALT  QUAL    FILTER
chr22  42522613  NA  G    C    100.0   PASS
chr22  42523409  NA  G    T    100.0   PASS 
chr22  42523943  NA  A    G    100.0   PASS 
chr22  42525952  NA  C    A    100.0   PASS 
chr22  42526484  NA  A    C    100.0   PASS 
chr22  42526549  NA  C    T    100.0   PASS 
chr22  42526561  NA  GG   TC   100.0   PASS
chr22  42526567  NA  G    A    100.0   PASS
chr22  42526571  NA  C    G    100.0   PASS
chr22  42526573  NA  T    G    100.0   PASS

I hope you can help me to fix this error. Thanks.

h19 vcf phasing eaglev2.4 haplotype • 2.5k views
ADD COMMENT
1
Entering edit mode
2.2 years ago
4galaxy77 2.8k

I don't know what that error means, but you absolutely cannot phase a single individual without a reference panel. You also need to split up the file into individual chromosomes.

I would recommend using the 1000 genomes as a reference for simplicity.

ADD COMMENT
0
Entering edit mode

Thank you for your input !Yes! Im aware of that. And, as you can see I added the reference panel: --geneticMapFile ./tables/genetic_map_hg19_withX.txt.gz \ I also split by chromosome, I specifically asked for the chromosome 22 as you can see in the first script.

ADD REPLY
1
Entering edit mode

The argument --geneticMapFile specifies a genetic map, which gives the recombination rates between each SNP. This is different to a reference panel of haplotypes used for the phasing. You gave given the first but not the second. You need to download something like the 1000 genomes and use it as a reference,

ADD REPLY
0
Entering edit mode

I see, you are right. I need to include the --vcfRef argument. Thanks.

ADD REPLY
0
Entering edit mode

I'm still having a problem with the reference genome... it seems that my variants are not present in the reference. But does not make any sense since I'm using a b37 reference genome in a h19 built target vcf.

ADD REPLY
0
Entering edit mode
2.2 years ago
Francisco • 0

OK. So the issue was solved by adding a reference genome:

./eagle \
    --vcfTarget ./C11_v1.vcf.gz \
    --chrom 22 \
    --vcfRef ./chr22.1kg.phase3.v5a.vcf.gz \
    --geneticMapFile ./tables/genetic_map_hg19_withX.txt.gz \
    --outPrefix ./prueba_phased

I obtained the reference genome from: http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/b37.vcf/

Well this was enough since I found an issue between the reference and the target:

SNPs ignored: 11 SNPs in target but not reference
 --> WARNING: Check REF/ALT agreement between target and ref? <--
 424147 SNPs in reference but not target
 0 multi-allelic SNPs in target

So it seems that the reference and the target genome are not in agreetment. However I do not understand why If I'm using a target vcf built in h19, a genetic map built also in b19 and a reference genome also in b37. Does anyone know why?

ADD COMMENT
0
Entering edit mode

Hi,

it is possible that your vcf file has swapped ref and alt alleles. This is often an issue when you convert between plink bed/bim/fam and vcf file formats. You need to fix these or run Eagle with the option --allowRefAltSwap

ADD REPLY

Login before adding your answer.

Traffic: 1882 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6