Imputation with BEAGLE 5.1 giving an inconsistent number of alleles error
2.2 years ago
User000 ▴ 550

Hello,

I called variants for 200 genotypes with freebayes. I filtered on the DP and GQ values, and genotypes that did not pass the filter were set to missing (./.). I now want to impute these filtered VCF files with BEAGLE v5.1, but it gives me the following error:

java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample Sample_1469 at marker [chr4A   305381905   .   G   A]

What could be the problem? I looked at that position, and the sample's field is .:.:.:.:.:.:.:.:. which is missing data. Could this be the reason? If so, how can I deal with it?
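Before changing anything, it can help to list which records actually carry a single "." as the genotype instead of "./.". The function below is a minimal sketch. list_haploid_missing is a hypothetical helper, not an existing tool. It assumes an uncompressed, tab-separated VCF on stdin and relies on the VCF rule that GT, when present, is the first FORMAT key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CHROM, POS and sample name for every sample
# field whose GT is a single "." (haploid-looking missing call) rather
# than "./.". Input: an uncompressed VCF on stdin.
list_haploid_missing() {
  awk -F'\t' '
    /^##/ { next }                      # skip meta-information lines
    /^#CHROM/ {                         # remember sample names by column
      for (i = 10; i <= NF; i++) name[i] = $i
      next
    }
    $9 ~ /^GT(:|$)/ {                   # only records whose FORMAT starts with GT
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")               # f[1] is the GT value
        if (f[1] == ".") print $1 "\t" $2 "\t" name[i]
      }
    }'
}

# Usage (file name is a placeholder):
#   zcat filtered.vcf.gz | list_haploid_missing | head
```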

2.2 years ago
gubrins ▴ 220

Hey, I'm in the same situation as you. Did you solve it? In my case it is not missing data, since I don't see the pattern you have. If you haven't solved it either, let's hope somebody can help us!


Hey, yes. In the end I changed every . into the missing genotype ./. In my case the single . really is missing data, while ./. is a genotype that was set to missing when I filtered the VCF. I found how to do the replacement in an answer here on Biostars, but I cannot find the thread to give it credit:

zcat vcf.gz | perl -pe 's/\s\.:/\t.\/.:/g' | bgzip -c > out.vcf.gz
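The perl one-liner only rewrites sample fields that start with ".:", so a sample column that is a bare "." is left untouched. The awk sketch below is slightly broader and rewrites both forms to "./.". It assumes the Beagle error comes from haploid-looking missing calls mixed in with diploid calls, as it did in this first case. fix_haploid_missing is a hypothetical name, and the input must be an uncompressed VCF on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a GT of "." as "./." in every sample column,
# whether the column is a bare "." or starts with ".:" (e.g. ".:.:.").
# Header lines and records without GT pass through unchanged.
fix_haploid_missing() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                          # header lines as-is
    $9 ~ /^GT(:|$)/ {                             # GT is the first FORMAT key
      for (i = 10; i <= NF; i++) {
        if ($i == ".") $i = "./."                 # bare missing sample field
        else if (substr($i, 1, 2) == ".:") $i = "./." substr($i, 2)
      }
    }
    { print }'                                    # print every data record
}

# Usage (file names are placeholders):
#   zcat filtered.vcf.gz | fix_haploid_missing | bgzip -c > fixed.vcf.gz
```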

Could you paste the pattern that you get?


Thank you very much for your answer! I've noticed that quite a lot of users have questions about Beagle but few people answer them, so your help is really appreciated right now! I'm going to try what you did right away, but just in case, here is my error message:

java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample unknown at marker [NC_041312.1  1098286 .       T       C]

As you can see, it is similar to the one you got (NC_041312.1 is one of my chromosomes).

And here is the record itself:

NC_041312.1 1098286 . T C 65.76 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=2;CIGAR=1X;DP=2;DPB=2;DPRA=0;EPP=7.35324;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=7.37776;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=82;QR=0;RO=0;RPL=1;RPP=3.0103;RPPR=0;RPR=1;RUN=1;SAF=1;SAP=3.0103;SAR=1;SF=1;SRF=0;SRP=0;SRR=0;TYPE=snp GT:QA:RO:AO:AD:DP:GL:QR . 1/1:82:0:2:0,2:2:-7.77968,-0.60206,0:0

I hope you can help me; I've hit a wall...


My solution will not help in your case, since it simply replaces the . with ./. Why is your sample named unknown? How many samples do you have in your VCF file? Also, is there a . between QR and 1/1?


I don't know why, but when I merge the different VCF files, the first sample is always called unknown. This time I was only testing with two VCF files. And yes, there is a . between them. Do you think that could be the problem?


Do your BAM files have read groups? In my opinion this error has something to do with the earlier SNP-calling step. I do not think it is OK to have a sample named unknown.


I checked by following this post and nothing showed up, so I assume I don't have them: https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups

bwa mem -M -t 10 /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/align/prueba/genome/index.fna /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_R1_001.fastq.gz /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_R2_001.fastq.gz | samtools sort -o /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/results_alignment/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.bam

where index.fna is my genome, followed by the paired-end FASTQ files and the output BAM. How can I create my read groups?
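Read groups can be added at the alignment step with the -R option of bwa mem, which takes a complete @RG header line; bwa converts literal \t sequences into tabs. The sketch below uses make_rg, a hypothetical helper. The ID and SM values in the usage line, and the PL:ILLUMINA platform, are assumptions; replace them with your own run, sample and platform.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an @RG line for `bwa mem -R`.
# The output keeps literal "\t" sequences, which bwa turns into tabs.
# PL:ILLUMINA is an assumption about the sequencing platform.
make_rg() {
  local id=$1 sample=$2
  printf '@RG\\tID:%s\\tSM:%s\\tPL:ILLUMINA' "$id" "$sample"
}

# Usage (IDs, sample names and paths are placeholders):
#   bwa mem -M -t 10 -R "$(make_rg POR_100801_L001 POR_100801)" \
#       index.fna R1.fastq.gz R2.fastq.gz | samtools sort -o POR_100801.bam
```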


Could you please describe all the steps you use for variant calling? For example, I am following this freebayes protocol.


I followed that one or a similar link:

freebayes -f index.fna /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/results_gz/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.bam > /mnt/CIBIO/homes/gabri.mochales/ecoli_SNP_calling/results_SNP_calling/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.vcf

It is quite straightforward; I hope it helps you.
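You can avoid merging per-sample VCFs afterwards by calling all samples jointly in a single freebayes run. The BAMs can be passed as positional arguments or as a list file via -L/--bam-list. This is an alternative and not what was done in this thread. Each BAM still needs an @RG line with its own SM tag. write_bam_list is a hypothetical helper, and the directory and file names are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one BAM path per line (sorted) into a list
# file suitable for `freebayes -L`. Only looks at the top level of $dir.
write_bam_list() {
  local dir=$1 out=$2
  find "$dir" -maxdepth 1 -name '*.bam' | sort > "$out"
}

# Usage (paths are placeholders):
#   write_bam_list results_alignment bams.txt
#   freebayes -f index.fna -L bams.txt > all_samples.vcf
```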


Hey again, when I run this:

java -jar picard.jar ValidateSamFile \
    I=input.bam \
    MODE=SUMMARY

I get all of these errors, and it also warns me that a read group is missing:

Error Type                          Count
ERROR:INVALID_FLAG_FIRST_OF_PAIR    23510
ERROR:INVALID_FLAG_MATE_UNMAPPED     5459
ERROR:INVALID_FLAG_SECOND_OF_PAIR   17307
ERROR:MISSING_READ_GROUP                1
WARNING:RECORD_MISSING_READ_GROUP  797715
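You can also check for the missing read group directly in the BAM header. The minimal sketch below assumes samtools is available to print the header. has_sample_rg is a hypothetical helper that succeeds only if some @RG line carries an SM: tag. That tag is the sample name freebayes writes into the VCF, and as noted in this thread, without it the sample shows up as unknown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 only if the SAM header on stdin has at least
# one @RG line with an SM: (sample) tag; exit 1 otherwise.
has_sample_rg() {
  awk -F'\t' '
    $1 == "@RG" { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) found = 1 }
    END { exit found ? 0 : 1 }'
}

# Usage (file name is a placeholder):
#   samtools view -H input.bam | has_sample_rg || echo "no @RG with SM in input.bam"
```

For BAMs that already exist, Picard's AddOrReplaceReadGroups (I=, O=, RGID=, RGLB=, RGPL=, RGPU=, RGSM=) is one way to add a read group without re-aligning. Re-running bwa mem with -R is the other.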


I can only suggest following the best practices for every tool you use (freebayes, GATK, bcftools/samtools, etc.) and following all the steps. At the moment it is very hard for me to understand what you are doing. Read groups are definitely missing, which is why you have an unknown sample. If you run into problems, create a new question here on the forum; I am sure you will find a solution.


For everyone's information: rather than using samtools + bwa + freebayes, I redid everything following GATK, and the phasing now works!


If you do everything the right way, it will also work with freebayes, as it did for me.
