Beagle imputation - duplication position error
2.9 years ago
valopes

Hi all,

So, after I figured out how to extract a .vcf from Illumina data [C: Getting a .vcf file from a Illumina SNPChip results (.bsc file)], I am now having trouble filtering it and running the imputation.

As I have never done this before, I've tried a lot of options with no success. So, let me explain. It is a long story...

Starting from a .vcf file, I used this command to filter:

vcftools --vcf input.vcf --remove-filtered-all --max-missing 0.2 --maf 0.05 --mac 1 --min-alleles 2 --max-alleles 2 --recode --out output_filtered.vcf


After that, I tried to run the imputation using Beagle (beagle.16May18.771.jar):

java -Xmx25000m -jar beagle.16May18.771.jar gt=input.vcf out=output_imputed


but I got an error:

No genetic map is specified: using 1 cM = 1 Mb
Exception in thread "main" java.lang.IllegalArgumentException: Duplicate marker: 0 1556719 Gm20_1556719_C_T G A
    at vcf.Markers.markerSet(Markers.java:175)
    at vcf.Markers.<init>(Markers.java:92)
    at vcf.Markers.create(Markers.java:69)
    at vcf.TargetData.extractMarkers(TargetData.java:130)
    at vcf.TargetData.advanceWindowCm(TargetData.java:120)
    at vcf.TargetData.targetData(TargetData.java:76)
    at main.Main.data(Main.java:143)
    at main.Main.main(Main.java:115)

So, I thought that I should create a .map file for the filtered .vcf, using PLINK:

plink --vcf input_filtered.vcf --recode --out output_plink_files


Then I ran Beagle again:

java -Xmx25000m -jar beagle.16May18.771.jar gt=input.vcf map=input_vcf.map out=output_imputed


and I got:

Exception in thread "main" java.lang.IllegalArgumentException: duplication position: 0 Gm20_1556719_C_T 0

Well, from what I have read, the genetic map is not the problem, but I cannot figure out the duplicate position error. I am quite lost here. Could someone help me?

Oh, I also found this post [Can someone help me with imputation of missing SNPs using beagle 4?], but it still didn't work for me...

Could you search in the vcf for that position Gm20_1556719_C_T, for example using grep? I'm not sure how the SNP and chromosome identifiers are in your vcf file, you may have to search for 1556719 separately.

Yes, I did search already! And it looks duplicated...

Line 1762:
0   1556719 Gm05_1556719_C_T    G   A   .   .   .   GT  0/0 0/0 0/0 1/1 1/1 0/0
Line 1763:
0   1556719 Gm20_1556719_C_T    G   A   .   .   .   GT  0/1 0/0 0/1 0/1 0/0 0/1


and I know this position is not the only duplicate.
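Since more than one position is affected, it may help to list every duplicated CHROM:POS pair at once instead of grepping for them one by one. The following is only a sketch: the toy VCF reuses the two records shown above (cut down to three samples) plus one made-up record, and the temporary file stands in for your real input.vcf.

```shell
# Hypothetical toy VCF standing in for input.vcf (tab-separated, 3 samples).
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n' >> "$vcf"
printf '0\t1556719\tGm05_1556719_C_T\tG\tA\t.\t.\t.\tGT\t0/0\t0/0\t1/1\n' >> "$vcf"
printf '0\t1556719\tGm20_1556719_C_T\tG\tA\t.\t.\t.\tGT\t0/1\t0/0\t0/1\n' >> "$vcf"
# Made-up record, included only so that one position is not duplicated.
printf '0\t2000000\tGm01_2000000_A_G\tA\tG\t.\t.\t.\tGT\t0/0\t0/1\t1/1\n' >> "$vcf"

# Count data records per CHROM:POS (header lines are skipped)
# and print each pair that occurs more than once, with its count.
dup_positions=$(awk -F'\t' '!/^#/ { n[$1 ":" $2]++ }
    END { for (k in n) if (n[k] > 1) print k "\t" n[k] }' "$vcf" | sort)
echo "$dup_positions"
```

On a real file, replace "$vcf" with the path to your VCF. Note that both records above have CHROM 0 while their IDs start with Gm05 and Gm20, so the collision may come from the chromosome column rather than from a truly repeated marker; that is a guess based on the IDs alone.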

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.

It looked like you pasted the same thing twice; is the output correct as I formatted it?

2.9 years ago

plink’s --list-duplicate-vars flag was created specifically to address this Beagle 4 issue.

Okay I've tried this:

 --list-duplicate-vars


and then

 --exclude


It didn't work.

So, I did:

 --write-snplist

cat input.snplist | sort | uniq -d > output_new.snplist

--exclude


It is still not working...

Exception in thread "main" java.lang.IllegalArgumentException: Duplicate marker: 0 1556719 Gm20_1556719_C_T G A
    at vcf.Markers.markerSet(Markers.java:175)
    at vcf.Markers.<init>(Markers.java:92)
    at vcf.Markers.create(Markers.java:69)
    at vcf.TargetData.extractMarkers(TargetData.java:130)
    at vcf.TargetData.advanceWindowCm(TargetData.java:120)
    at vcf.TargetData.targetData(TargetData.java:76)
    at main.Main.data(Main.java:143)
    at main.Main.main(Main.java:115)
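One possible reason the sort | uniq -d step changes nothing: it looks for repeated variant IDs, whereas Beagle complains about a repeated position. A small sketch using only the two IDs reported at position 1556719 in this thread:

```shell
# The two IDs reported at position 1556719 in this thread.
ids=$(printf 'Gm05_1556719_C_T\nGm20_1556719_C_T\n')

# uniq -d prints only lines that occur more than once in sorted input.
# These two IDs are distinct, so nothing is printed.
dup_ids=$(printf '%s\n' "$ids" | sort | uniq -d)
echo "duplicate IDs found: ${dup_ids:-none}"
```

If the resulting list is empty, the following --exclude has nothing to remove, and the same position still appears twice in the file passed to Beagle.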

8 weeks ago

Hi! I have the same issue as you. Can I ask how you managed to solve it? Many thanks!!

This is how it looks for me:

Exception in thread "main" java.lang.IllegalArgumentException: Duplicate marker: 1 59409838 ARS-USMARC-Parent-DQ404150-rs29012530_dup T C
    at vcf.Markers.markerSet(Markers.java:131)
    at vcf.Markers.<init>(Markers.java:85)
    at vcf.Markers.create(Markers.java:64)
    at vcf.BasicGT.markers(BasicGT.java:105)
    at vcf.BasicGT.<init>(BasicGT.java:86)
    at vcf.TargetData.targGT(TargetData.java:92)
    at vcf.TargetData.advanceWindowCm(TargetData.java:120)
    at main.Main.phaseData(Main.java:158)
    at main.Main.main(Main.java:113)

Did you try to use plink --list-duplicate-vars followed by --exclude? This should work if you use them correctly.
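For what it is worth, here is a sketch of what "correctly" might look like, plus a plink-free fallback. As far as I understand the PLINK 1.9 documentation, the ids-only modifier drops the header and position columns from the .dupvar report, and suppress-first leaves out the first ID of each duplicate group, so the report can be fed straight to --exclude while one copy of each variant is kept. The PLINK lines are shown as comments because they are not run here, and all file names are placeholders. The awk part runs on a hypothetical toy VCF and keeps only the first record seen at each CHROM:POS.

```shell
# PLINK route discussed in the thread (not run here; assumes PLINK 1.9 on PATH,
# file names are placeholders):
#   plink --vcf input_filtered.vcf --list-duplicate-vars ids-only suppress-first --out dups
#   plink --vcf input_filtered.vcf --exclude dups.dupvar --recode vcf --out input_dedup

# Plain-awk alternative, demonstrated on a hypothetical toy VCF.
in_vcf=$(mktemp)
out_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$in_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n' >> "$in_vcf"
printf '0\t1556719\tGm05_1556719_C_T\tG\tA\t.\t.\t.\tGT\t0/0\t0/0\t1/1\n' >> "$in_vcf"
printf '0\t1556719\tGm20_1556719_C_T\tG\tA\t.\t.\t.\tGT\t0/1\t0/0\t0/1\n' >> "$in_vcf"
printf '0\t2000000\tGm01_2000000_A_G\tA\tG\t.\t.\t.\tGT\t0/0\t0/1\t1/1\n' >> "$in_vcf"

# Pass header lines through unchanged; keep a data line only the
# first time its CHROM:POS pair is seen.
awk -F'\t' '/^#/ { print; next } !seen[$1 ":" $2]++' "$in_vcf" > "$out_vcf"

# Show which variant IDs survived.
kept_ids=$(grep -v '^#' "$out_vcf" | cut -f3 | paste -sd, -)
echo "$kept_ids"
```

Either route discards one of the colliding records. If the two records are really different markers that only collide because CHROM is 0 for both, fixing the chromosome column first would probably be the better option.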