unable to impute with beagle after filtering VCF table using vcftools
5 months ago
ziv_attia • 0

For a reason I can't really understand, I am unable to impute a VCF with Beagle, but only after filtering it with vcftools. The filtering is really straightforward and I have used it hundreds of times. This is how I filter the VCF:

vcftools --gzvcf ${path}${file}.vcf.gz --remove-indels --max-missing ${maxM} --maf ${maf} --minQ ${minQ} --out ${path}${file}_filterd_minQ${minQ}_maxM${maxM}_maf${maf} --recode #--recode-INFO-all


And this is how I run Beagle:

java -Xmx144g -jar /home/pogoda/software/BEAGLE/beagle.03Jul19.b33.jar gt=${file} nthreads=36 out=IMPUTED_${file}


I can impute the unfiltered table with no problems, so it must be something with the filtering. Any idea what the issue with vcftools could be?

genomics bioinformatics vcftools beagle • 654 views

Is there an error message from beagle?


This is the error I get:

ERROR: genotype is missing allele separator:


Okay, what does an example line from your vcf file look like then? It seems to suggest you're missing a field


What do you mean that you are 'unable to do it'?


It crashes immediately after I start running it.

3 months ago
roselaw27 • 0

If your error message looks something like this:

java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample LH05 at marker [1 1088185 . A G]

then we've run into the same problem. I realized that the vcftools filtering is omitting genotype information, which is why Beagle can't recognize the alleles. To be more specific, I extracted the line of chr1, 223216, and my diploid sample LH05 had:

.:0,0:.:.:0|1:1088185_A_G:.:1088185

where the others (the normal ones) had something like:

0|1:2,4:6:72:0|1:1088185_A_G:162,0,72:1088185

0/0:14,0:14:42:.:.:0,42,390:.

./.:1,2:3:.:.:.:0,0,0:.

(The first colon-separated item is the genotype. It should contain two alleles because my samples are diploid.)

I checked my files and found that the genotype also shows up as a single 1.

The reason is that I used a vcftools filter (maf) to process the results of the GATK VariantFiltration step. This is actually not the first time I have run into this problem with vcftools (last time I had used --min-alleles and --max-alleles). That's why your VCF that was not filtered with vcftools runs smoothly with Beagle. I don't understand why other software never caught this error; it probably treats . as ./. or 0/0 and continues anyway. This could be a problem if your statistic is sensitive to missing alleles.

Anyway, if people are using vcftools for filtering, PLEASE CHECK your results.
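One way to run that check is to scan the filtered VCF for genotype fields that have no allele separator. Below is a minimal Python sketch, not a tested pipeline step. It assumes tab-separated VCF text, GT as the first FORMAT subfield, and diploid samples. The function name find_unseparated_genotypes, the second sample LH06, and the QUAL/FILTER values in the toy record are made up for illustration.

```python
def find_unseparated_genotypes(lines):
    """Return (chrom, pos, sample, gt) for every GT lacking '/' or '|'."""
    samples = []
    problems = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        # Skip records without sample columns or without a leading GT key
        if len(fields) < 10 or fields[8].split(":")[0] != "GT":
            continue
        for name, sample_field in zip(samples, fields[9:]):
            gt = sample_field.split(":")[0]
            if "/" not in gt and "|" not in gt:
                problems.append((fields[0], fields[1], name, gt))
    return problems


# Toy record modelled on the LH05 example above (values are illustrative)
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tLH05\tLH06",
    "1\t1088185\t.\tA\tG\t50\tPASS\t.\tGT:AD\t.:0,0\t0|1:2,4",
]
print(find_unseparated_genotypes(example))
# [('1', '1088185', 'LH05', '.')]
```

For a real file, the same function can be given an open file handle, for example gzip.open(path, "rt") for a .vcf.gz file. An empty result would mean that no genotype of this kind was found.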


My Beagle is beagle.25Nov19.28d.jar


I found the same error when using the recode.vcf file. Even when I changed . to ./. and recoded again, it showed the same error. I used gt=filename out=imputed.vcf. Kindly advise which parameters to use for filtering on minor allele frequency and minQ, or suggest any other way.
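A blanket text replacement of . with ./. can also touch other subfields that legitimately contain a dot, and it would miss a genotype written as a single 1. Here is a hedged Python sketch that rewrites only the GT subfield of one sample column. It assumes GT is the first subfield and that samples are diploid. Treating any unseparated genotype, including a lone 1, as missing is an assumption of this sketch, not something confirmed in this thread. The helper name pad_unseparated_genotype is hypothetical, and whether Beagle then accepts the file has not been verified here.

```python
def pad_unseparated_genotype(sample_field, ploidy=2):
    """Replace a GT that has no allele separator with a missing call."""
    subfields = sample_field.split(":")
    gt = subfields[0]  # assumption: GT is the first FORMAT subfield
    if "/" not in gt and "|" not in gt:
        # Assumption: an unseparated GT is treated as fully missing
        subfields[0] = "/".join(["."] * ploidy)
    return ":".join(subfields)


# The malformed LH05 field quoted earlier in the thread
print(pad_unseparated_genotype(".:0,0:.:.:0|1:1088185_A_G:.:1088185"))
# ./.:0,0:.:.:0|1:1088185_A_G:.:1088185

# A well-formed field is returned unchanged
print(pad_unseparated_genotype("0|1:2,4:6:72:0|1:1088185_A_G:162,0,72:1088185"))
# 0|1:2,4:6:72:0|1:1088185_A_G:162,0,72:1088185
```

The function would be applied to columns 10 onward of each data line, leaving the header and the first nine columns as they are.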