Question

Beagle: Skip intervals with no common markers

1

5.4 years ago

NS ▴ 10

Hi, I am using Beagle to perform genotype imputation. I first used conform-gt to adjust the genomic position, allele order and chromosome strand of the markers in my vcf.gz data to match the reference panel. Then I ran this Beagle command to perform imputation per chromosome:

 java -Xmx50g -jar beagle.25Nov19.28d.jar gt=chr1.vcf.gz out=imputed_b37_imputed ref=chr1.1kg.phase3.v5a.b37.bref3 map=plink.chr1.GRCh37.map chrom=1 impute=true

After several hours of running, I get the following error:

ERROR: Reference and target files have no markers in common in interval: 
       1:165113264-205459274

Common markers must have identical CHROM, POS, REF, and ALT fields.
Exiting program.

How can I skip the intervals with no common markers and proceed with imputation, without the program exiting?

beagle plink vcf conformgt • 4.9k views

updated 4.1 years ago by Jack ▴ 20 • written 5.4 years ago by NS ▴ 10

0

Did you ever find a solution for this? I'm running into a similar issue.

4.2 years ago by Andhika • 0

0

Please try running with impute=false on a subset restricted to the failed coordinates, and post the output or error here.
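A rough sketch of what such a test run could look like, reusing the file names from the question. Beagle's chrom argument also accepts a chrom:start-end interval, so the input files do not need to be subset by hand. The helper name run_beagle_interval, the DRY_RUN switch and the out= prefix are made up for illustration and are not part of Beagle.

```shell
# Hypothetical helper: run (or only print) Beagle on a single interval.
# File names come from the question; the out= prefix is a placeholder.
run_beagle_interval() {
    interval="$1"              # e.g. 1:165113264-205459274
    impute="${2:-false}"       # defaults to impute=false for the test run
    # Turn "1:165113264-205459274" into "1_165113264_205459274" for the output prefix
    prefix="test_$(printf '%s' "$interval" | tr ':-' '__')"
    set -- java -Xmx50g -jar beagle.25Nov19.28d.jar \
        gt=chr1.vcf.gz \
        ref=chr1.1kg.phase3.v5a.b37.bref3 \
        map=plink.chr1.GRCh37.map \
        chrom="$interval" \
        impute="$impute" \
        out="$prefix"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"              # print the command instead of running it
    else
        "$@"
    fi
}

# Print the command for the interval named in the error message
DRY_RUN=1 run_beagle_interval 1:165113264-205459274
```

Dropping DRY_RUN=1 runs the command for real. If the interval truly has no shared markers, the same error should come back within minutes rather than hours.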

4.2 years ago by cpad0112 21k

score 1 · Answer 1 · 2021-06-24

Hi, I had the same issue. The problem is that your window size is too small relative to the average spacing of the markers in your input genotype data set. Beagle estimates haplotypes across windows, or intervals of the genome, and if the window size is too small, some of the windows created at runtime will contain no common markers at all (common markers being markers present in both your data set and the reference panel you are using). The fix is simple: re-run your command with a larger window size by setting Beagle's window parameter: window=[positive number]. The default window size is 40.0. Note that this parameter is measured in centimorgans (cM), not base pairs. An example call is:

        # out= is a placeholder output prefix; pick your own
        beagle \
            ref=chr20.referencePanel.vcf.gz \
            map=plink.GRCh37.map \
            gt=chr20.inputGenotypes.vcf.gz \
            out=chr20.imputed \
            chrom=20 \
            nthreads=20 \
            window=100.0

Be careful, though. As you increase the window size, the memory that Beagle needs at runtime to perform the imputation will increase. This makes sense: as we choose bigger windows, each window includes more SNVs from the reference panel, so haplotype estimation across that window becomes more computationally expensive. You need to strike a balance between window size and memory allocation.
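Before settling on a window size, it can help to check how many common markers an interval actually contains. The following is only a sketch using standard Unix tools. It assumes both the target and the reference are available as gzip-compressed VCF files (a bref3 panel would first have to be converted back to VCF), and the function names vcf_keys and count_common_markers are made up for this example. Markers are matched on identical CHROM, POS, REF and ALT, as in the error message.

```shell
# Print one CHROM:POS:REF:ALT key per record of a gzipped VCF.
# Optional start/end (1-based, inclusive) restrict the output to an interval.
vcf_keys() {
    gzip -dc "$1" |
        awk -F'\t' -v s="${2:-0}" -v e="${3:-0}" \
            '!/^#/ && $2 >= s && (e == 0 || $2 <= e) { print $1 ":" $2 ":" $4 ":" $5 }' |
        LC_ALL=C sort -u
}

# Count the keys present in both files: target, reference, [start], [end]
count_common_markers() {
    keys_a=$(mktemp)
    keys_b=$(mktemp)
    vcf_keys "$1" "$3" "$4" > "$keys_a"
    vcf_keys "$2" "$3" "$4" > "$keys_b"
    # comm -12 keeps only the lines found in both sorted inputs
    LC_ALL=C comm -12 "$keys_a" "$keys_b" | awk 'END { print NR }'
    rm -f "$keys_a" "$keys_b"
}

# Example (file names are placeholders):
# count_common_markers chr1.vcf.gz chr1.reference.vcf.gz 165113264 205459274
```

A count of 0 for the interval in the error message would mean that no window placed entirely inside that interval can contain a common marker. Counts that are low but nonzero are the case where a larger window is most likely to help.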

score 0 · Answer 2 · 2021-05-18

0

4.2 years ago

monkeyrota • 0

Yes, please. I have the same problem :(
