Question: Imputation with BEAGLE 5.1 giving an inconsistent number of alleles error
10 months ago by
User000440
User000440 wrote:

Hello,

I ran variant calling on 200 genotypes with freebayes. I then filtered on the DP and GQ values, and genotypes that did not pass the filter were set to missing (./.). I now want to impute these filtered VCF files with BEAGLE v5.1, but it gives me the following error:

java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample Sample_1469 at marker [chr4A   305381905   .   G   A]

What could be the problem? I looked at that position, and the entry for this sample looks like this: .:.:.:.:.:.:.:.:. This is missing data. Could that be the reason? If so, how can I deal with it?

beagle freebayes • 755 views
ADD COMMENTlink modified 10 months ago by gubrins70 • written 10 months ago by User000440
10 months ago by
gubrins70
gubrins70 wrote:

Hi, I'm in the same situation as you. Did you solve it? In my case it is not missing data, as I don't see the pattern you have. If not, let's see if somebody can help us!

ADD COMMENTlink written 10 months ago by gubrins70

Hey, yes, in the end I changed every . genotype to the diploid missing code ./. In my case the . is data that was really missing, while ./. marks a genotype I set to missing when I filtered the VCF. I found the command for this here on Biostars, but I cannot find the thread to credit it:

zcat vcf.gz | perl -pe 's/\s\.:/\t.\/.:/g' | bgzip -c > out.vcf.gz
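
The one-liner above only catches a . genotype that is followed by a colon. A minimal sketch of a stricter variant, assuming a tab-delimited VCF with GT as the first FORMAT key: it touches only the sample columns (column 10 onwards) and also handles a sample column that is a bare . on its own. The function name fix_missing_gt and the file names in the usage line are placeholders, not part of any tool.

```shell
# Sketch: turn haploid-missing genotypes (".") into diploid-missing ("./.")
# in sample columns only. Assumes tab-delimited VCF text on standard input.
fix_missing_gt() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }                      # header lines pass through unchanged
       {
         for (i = 10; i <= NF; i++) {            # columns 10+ hold the samples
           if ($i == ".")         $i = "./."     # whole sample field is a bare "."
           else if ($i ~ /^\.:/)  $i = "./" $i   # GT is "." followed by other fields
         }
         print
       }'
}

# Usage (assumes bgzip from htslib is installed; file names are placeholders):
# zcat in.vcf.gz | fix_missing_gt | bgzip -c > out.vcf.gz
```

Restricting the loop to the sample columns leaves the . placeholders in the ID, FILTER and INFO columns alone.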
ADD REPLYlink modified 10 months ago • written 10 months ago by User000440

Could you paste the pattern that you get?

ADD REPLYlink written 10 months ago by User000440

Thank you very much for your answer! I've seen that quite a lot of users have questions about Beagle but not many people answer them, so your help is really appreciated right now! I'm going to try what you did right away, but just in case, here is my error message:

java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample unknown at marker [NC_041312.1  1098286 .       T       C]

As you can see, it is similar to the one you got (NC_041312.1 is one of my chromosomes).

And here is the record itself:

NC_041312.1 1098286 . T C 65.76 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=2;CIGAR=1X;DP=2;DPB=2;DPRA=0;EPP=7.35324;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=7.37776;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=82;QR=0;RO=0;RPL=1;RPP=3.0103;RPPR=0;RPR=1;RUN=1;SAF=1;SAP=3.0103;SAR=1;SF=1;SRF=0;SRP=0;SRR=0;TYPE=snp GT:QA:RO:AO:AD:DP:GL:QR . 1/1:82:0:2:0,2:2:-7.77968,-0.60206,0:0

I hope you can help me, I've hit a wall...

ADD REPLYlink written 10 months ago by gubrins70

My solution will not help in your case, since it simply replaces the . with ./. Why is your sample named unknown? How many samples do you have in your VCF file? Also, is there a . between QR and 1/1?

ADD REPLYlink modified 10 months ago • written 10 months ago by User000440

I don't know why, but when I merge the different VCF files, the first sample is always called unknown. This time I was testing with just 2 VCF files. And yes, there does seem to be a . between them. Do you think that could be the problem?
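
A quick way to check both points, as a rough sketch: the awk function below (report_bare_missing is a made-up name) reads uncompressed, tab-delimited VCF text on standard input and prints, for each sample named in the #CHROM header line, how many records have a bare . in that sample's column.

```shell
# Sketch: per-sample count of records whose whole sample field is a bare ".".
# Assumes tab-delimited VCF text on standard input.
report_bare_missing() {
  awk 'BEGIN { FS = "\t" }
       /^##/     { next }                                   # skip meta-information lines
       /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i  # remember sample names
                   n = NF; next }
       { for (i = 10; i <= NF; i++) if ($i == ".") bare[i]++ }
       END { for (i = 10; i <= n; i++) printf "%s\t%d\n", name[i], bare[i] + 0 }'
}

# Usage (file name is a placeholder):
# zcat merged.vcf.gz | report_bare_missing
```

If bcftools is installed, bcftools query -l merged.vcf.gz lists the sample names as well.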

ADD REPLYlink written 10 months ago by gubrins70

Do your BAM files have read groups? In my opinion this error has something to do with the earlier SNP calling step. I do not think it is OK to have an unknown sample name.

ADD REPLYlink written 10 months ago by User000440

I followed this post and nothing showed up, so I imagine I don't have them: https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups. This is the alignment command I used:

bwa mem -M -t 10 /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/align/prueba/genome/index.fna /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_R1_001.fastq.gz /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_R2_001.fastq.gz | samtools sort -o /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/results_alignment/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.bam

where index.fna is my genome, followed by the paired-end FASTQ files and the output BAM. How can I create my read groups?

ADD REPLYlink written 10 months ago by gubrins70

Could you please describe all the steps you are using for variant calling? For example, I am following this freebayes protocol.

ADD REPLYlink modified 10 months ago • written 10 months ago by User000440

I followed that protocol or a very similar one:

freebayes -f index.fna /mnt/CIBIO/homes/gabri.mochales/data/OXPHOS_run2/results_gz/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.bam > /mnt/CIBIO/homes/gabri.mochales/ecoli_SNP_calling/results_SNP_calling/RAPiD-Genomics_HL5T3BBXX_POR_100801_P01_WA01_i5-505_i7-59_S97_L001_.vcf

It is quite straightforward; let's see if this helps you.

ADD REPLYlink written 10 months ago by gubrins70

Hi again. When I run this:

java -jar picard.jar ValidateSamFile \
    I=input.bam \
    MODE=SUMMARY

I get all these errors, including a warning that read groups are missing:

Error Type                           Count
ERROR:INVALID_FLAG_FIRST_OF_PAIR     23510
ERROR:INVALID_FLAG_MATE_UNMAPPED     5459
ERROR:INVALID_FLAG_SECOND_OF_PAIR    17307
ERROR:MISSING_READ_GROUP             1
WARNING:RECORD_MISSING_READ_GROUP    797715

ADD REPLYlink written 10 months ago by gubrins70

I can only suggest using the best practices for each tool you use (freebayes, GATK, bcftools/samtools, etc.) and following all the steps. Right now it is very hard for me to understand what you are doing. Read groups are certainly missing, and that is why your sample is called unknown. If you run into problems, post a new question here on the forum; I am sure you will find a solution.
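
One way to add read groups at alignment time is the -R option of bwa mem, which takes a complete @RG header line. A minimal sketch, assuming one sample per BAM: the helper names rg_line and align_with_rg, the use of the sample name as both ID and SM, and PL:ILLUMINA are all illustrative choices, not taken from the commands posted in this thread.

```shell
# Sketch: build an @RG header line for bwa mem -R.
# The "\t" stays literal here; bwa converts it to a TAB itself.
rg_line() {
  printf '@RG\\tID:%s\\tSM:%s\\tPL:ILLUMINA\n' "$1" "$1"
}

# $1 = sample name, $2 = bwa-indexed reference FASTA,
# $3 / $4 = R1 / R2 FASTQ files, $5 = output BAM
align_with_rg() {
  bwa mem -M -t 10 -R "$(rg_line "$1")" "$2" "$3" "$4" \
    | samtools sort -o "$5" -
}

# Usage (all names are placeholders):
# align_with_rg S97 index.fna S97_R1.fastq.gz S97_R2.fastq.gz S97.bam
```

The SM tag is what variant callers such as freebayes use as the sample name in the VCF header, so each BAM should get a distinct value. For BAM files that are already aligned, Picard's AddOrReplaceReadGroups is an alternative to re-running the alignment.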

ADD REPLYlink written 9 months ago by User000440

For everybody's information: rather than using samtools + bwa + freebayes, I redid everything following the GATK workflow, and the phasing is working!

ADD REPLYlink written 9 months ago by gubrins70

If you do everything the right way, it will also work with freebayes, as it did for me.

ADD REPLYlink written 9 months ago by User000440