Question: [ERROR] Malformed VCF: empty alleles are not permitted in VCF records
umn_bist370 wrote:

I am running BaseRecalibrator for my RNA-seq:

java -jar -Xmx120g ${GATK} -T BaseRecalibrator \
                           -R "${reference}" \
                           -I "${file4}" \
                           -knownSites "${gerVar}" \
                           -knownSites "${somVar}" \
                           -o "${file4%_tstaids.bam}_tstaidsr.table1"
java -jar -Xmx120g ${GATK} -T BaseRecalibrator \
                           -R "${reference}" \
                           -I "${file4}" \
                           -knownSites "${gerVar}" \
                           -knownSites "${somVar}" \
                           -BQSR "${file4%_tstaids.bam}_tstaidsr.table1" \
                           -o "${file4%_tstaids.bam}_tstaidsr.table2"
java -jar -Xmx120g ${GATK} -T AnalyzeCovariates \
                           -R "${reference}" \
                           -before "${file4%_tstaids.bam}_tstaidsr.table1" \
                           -after "${file4%_tstaids.bam}_tstaidsr.table2" \
                           -plots "${file1%_tsta.bam}_BQSR.pdf"
java -jar -Xmx120g ${GATK} -T PrintReads \
                           -R "${reference}" \
                           -I "${file4}" \
                           -BQSR "${file4%_tstaids.bam}_tstaidsr.table1" \
                           -o "${file7}"

Note that I got two variant VCF files from Ensembl (germline and somatic). My reference is Ensembl GRCh38.p5. I ran the command below to prepend the 'chr' prefix to the chromosome names and change chrMT to chrM:

sed -e '/^[^#]/s/^/chr/' -e 's/^chrMT/chrM/'

I received this error:

##### ERROR MESSAGE: The provided VCF file is malformed at approximately line number 18354680: empty alleles are not permitted in VCF records

I used the command below to inspect my VCF file (it is ${gerVar} that is malformed):

sed -n '18354680p'

which returned:

chr11    5249456    HbVar.633    G        .    .    PhenCode_20140430;TSA=sequence_alteration;AA=A
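Because the pasted line has lost its tab structure, it is hard to tell which column is empty. A minimal sketch for making that visible is to print each tab-separated field of the offending line in brackets; `show_fields` is a hypothetical helper name, and the file name in the usage comment is a placeholder.

```shell
# Hypothetical helper: print every tab-separated field of one line,
# bracketed, so that an empty field shows up as [].
show_fields() {
    # $1 = line number to inspect, $2 = path to the VCF file
    awk -F '\t' -v n="$1" '
        NR == n {
            for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i
            exit
        }' "$2"
}

# Usage (placeholder file name):
# show_fields 18354680 germline.vcf
```

In a VCF record the fifth column is ALT, so a line reading `5=[]` would mean the ALT field is truly empty rather than set to the missing-value marker `.`.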
rna-seq gatk vcf • 2.0k views
ADD COMMENTlink modified 20 months ago by RamRS27k • written 4.3 years ago by umn_bist370

You found the origin of your problem. So, what is the question?

ADD REPLYlink written 4.3 years ago by Pierre Lindenbaum128k

Yes. The question is: is there a better way to fix this error without re-downloading the original VCF file to check what should replace the empty allele? Could the malformation be due to the sed command, which might have introduced other empty alleles in the file?

ADD REPLYlink written 4.3 years ago by umn_bist370

I guess it's not due to sed. It's quite possible that the file already contains empty alleles. Check how many there are, and if there are only a few, remove them from the file.
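Both sed expressions in the question are anchored to the start of the line (`^`), so they should only change the chromosome name and not the ALT column. To count how many records are affected, something like the sketch below could be used. It assumes the problem records have a literally empty fifth (ALT) column; `count_empty_alt` is a hypothetical helper name and the file name in the usage comment is a placeholder.

```shell
# Hypothetical helper: count non-header records whose ALT column (field 5)
# is an empty string. Assumes a tab-delimited VCF.
count_empty_alt() {
    # $1 = path to the VCF file
    awk -F '\t' '!/^#/ && $5 == "" { n++ } END { print n + 0 }' "$1"
}

# Usage (placeholder file name):
# count_empty_alt germline.vcf
```

Running the same count on the VCF before and after the sed step would show whether sed changed anything: if the numbers match, the empty alleles were already in the download.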

ADD REPLYlink written 4.3 years ago by geek_y11k
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum128k wrote:

Clean up your VCF:

awk -F '\t' '($0 ~ /^#/ || $5!=".")' in.vcf > out.vcf
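Note that the filter above tests for ALT equal to `.`, whereas the record in the question may have an ALT field that is an empty string, in which case that record would pass through unchanged. A hedged variant that drops both cases is sketched below; whether the ALT field is really empty is an assumption based on the spacing of the pasted line, and `drop_empty_alt` is a hypothetical helper name.

```shell
# Hypothetical helper: keep header lines, plus records whose ALT column
# (field 5) is neither an empty string nor the missing-value marker ".".
drop_empty_alt() {
    # $1 = path to the input VCF; the filtered VCF is written to stdout
    awk -F '\t' '/^#/ || ($5 != "" && $5 != ".")' "$1"
}

# Usage (placeholder file names):
# drop_empty_alt in.vcf > out.vcf
```

Dropping records is only reasonable when few are affected, as suggested in the comments above; for BaseRecalibrator the known-sites file just masks positions, so removing a handful of sites should have little effect.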
ADD COMMENTlink modified 20 months ago by RamRS27k • written 4.3 years ago by Pierre Lindenbaum128k

This solution does not work.

ADD REPLYlink written 2.7 years ago by freuv110

@freuv1 https://meta.stackexchange.com/questions/147616/

ADD REPLYlink written 2.7 years ago by Pierre Lindenbaum128k