Beagle: Skip intervals with no common markers
2.2 years ago
NS ▴ 10

Hi, I am using Beagle for genotype imputation. I first ran conform-gt so that the genomic positions, allele order, and chromosome strand of the markers in my vcf.gz data match the reference panel. Then I ran Beagle per chromosome with this command:

 java -Xmx50g -jar beagle.25Nov19.28d.jar gt=chr1.vcf.gz out=imputed_b37_imputed ref=chr1.1kg.phase3.v5a.b37.bref3 map=plink.chr1.GRCh37.map chrom=1 impute=true

After several hours of running, I get the following error:

ERROR: Reference and target files have no markers in common in interval: 
       1:165113264-205459274

Common markers must have identical CHROM, POS, REF, and ALT fields.
Exiting program.

How can I skip the intervals with no common markers and continue the imputation without the program exiting?

beagle plink vcf conformgt • 1.6k views

Did you ever find a solution for this? I'm running into a similar issue.


Please try running with impute=false on a subset of the data restricted to the failed coordinates, and post the output or error here.
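Before subsetting, it can help to check how many target markers fall inside the failed interval at all. The following is a minimal sketch, not part of Beagle or conform-gt: count_in_interval is a hypothetical helper name, and it assumes a gzip-compressed VCF. It only counts markers in the target file, so a non-zero count does not prove that any of them are shared with the reference panel.

```shell
# Hypothetical helper (not a Beagle or conform-gt tool):
# count VCF records on one chromosome within [start, end] (1-based, inclusive).
count_in_interval() {
    local vcf=$1 chrom=$2 start=$3 end=$4
    gzip -dc "$vcf" | awk -F'\t' -v c="$chrom" -v s="$start" -v e="$end" \
        '!/^#/ && $1 == c && $2 >= s && $2 <= e { n++ } END { print n + 0 }'
}
```

For the interval in the error message, the call would be: count_in_interval chr1.vcf.gz 1 165113264 205459274. A result of 0 means the target data has no markers there at all, so no common markers are possible in that interval.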

10 months ago
Jack ▴ 20

Hi, I had the same issue. The problem is that your window size is too small relative to the average spacing of the markers in your input genotype data set. Beagle estimates haplotypes across windows (intervals of the genome), and if the window size is too small, some of the windows created at runtime will contain no common markers at all (common markers being markers present in both your data set and the reference panel). The fix is simple: re-run your command with a larger window size by setting Beagle's window parameter: window=[positive float]. The default window size is 40.0. Note that the window size is measured in centimorgans (cM), not base pairs. An example call is:

        # out= is required by Beagle; the prefix used here is a placeholder
        beagle \
            ref=chr20.referencePanel.vcf.gz \
            map=plink.GRCh37.map \
            gt=chr20.inputGenotypes.vcf.gz \
            out=chr20.imputed \
            chrom=20 \
            nthreads=20 \
            window=100.0

Be careful, though: as you increase the window size, the memory Beagle needs at runtime also increases. Larger windows include more SNVs from the reference panel, so haplotype estimation across each window becomes more computationally expensive. You need to strike a balance between window size and memory allocation (the -Xmx value passed to java).
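To get a feel for how large the window needs to be, one option is to look for the largest gap between consecutive markers in the target file. This is a rough sketch under stated assumptions: largest_gap is a hypothetical helper name, the VCF is assumed to be gzip-compressed and sorted by position, and the gap is reported in base pairs, whereas the window parameter is in cM. Converting between the two properly requires the genetic map; the human genome-wide average is very roughly 1 cM per Mb, but local rates vary widely.

```shell
# Hypothetical helper (not a Beagle tool): print "gap_bp start end" for the
# largest gap between consecutive records on one chromosome of a
# position-sorted, gzip-compressed VCF. Prints nothing if there are
# fewer than two records on that chromosome.
largest_gap() {
    local vcf=$1 chrom=$2
    gzip -dc "$vcf" | awk -F'\t' -v c="$chrom" '
        /^#/ || $1 != c { next }
        seen && $2 - prev > gap { gap = $2 - prev; from = prev; to = $2 }
        { prev = $2; seen = 1 }
        END { if (gap) print gap, from, to }'
}
```

An example call would be: largest_gap chr1.vcf.gz 1. This looks only at the target file, so gaps between common markers may be larger still if some target markers are absent from the reference panel.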


Sorry, I forgot to clarify: in the example, beagle is an alias for java -Xmx50g -jar beagle.25Nov19.28d.jar

12 months ago
monkeyrota • 0

Yes, please. I have the same problem :(
