Question: [ERROR] Malformed VCF: empty alleles are not permitted in VCF records
Question by umn_bist, 3.4 years ago:

I am running BaseRecalibrator on my RNA-seq data:

# Pass 1: build the recalibration model; known variant sites are excluded from error counting
java -Xmx120g -jar "${GATK}" -T BaseRecalibrator \
                           -R "${reference}" \
                           -I "${file4}" \
                           -knownSites "${gerVar}" \
                           -knownSites "${somVar}" \
                           -o "${file4%_tstaids.bam}_tstaidsr.table1"
# Pass 2: model the data again with the pass-1 recalibration applied (for the before/after plots)
java -Xmx120g -jar "${GATK}" -T BaseRecalibrator \
                           -R "${reference}" \
                           -I "${file4}" \
                           -knownSites "${gerVar}" \
                           -knownSites "${somVar}" \
                           -BQSR "${file4%_tstaids.bam}_tstaidsr.table1" \
                           -o "${file4%_tstaids.bam}_tstaidsr.table2"
# Plot the covariates before and after recalibration
java -Xmx120g -jar "${GATK}" -T AnalyzeCovariates \
                           -R "${reference}" \
                           -before "${file4%_tstaids.bam}_tstaidsr.table1" \
                           -after "${file4%_tstaids.bam}_tstaidsr.table2" \
                           -plots "${file1%_tsta.bam}_BQSR.pdf"
# Write the recalibrated BAM using the pass-1 table
java -Xmx120g -jar "${GATK}" -T PrintReads \
                           -R "${reference}" \
                           -I "${file4}" \
                           -BQSR "${file4%_tstaids.bam}_tstaidsr.table1" \
                           -o "${file7}"

Note that I downloaded two variant VCFs from Ensembl (germline and somatic). My reference is Ensembl GRCh38.p5. I ran the command below to prepend 'chr' to the chromosome names and rename chrMT to chrM:

sed -e '/^[^#]/s/^/chr/' -e 's/^chrMT/chrM/'
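To see exactly what that sed command does, it can be wrapped in a small function and run on a few made-up records (the `rename_chroms` name and the sample lines are hypothetical). Only lines that do not start with `#` get the `chr` prefix, and the second expression then turns the resulting `chrMT` into `chrM`. Header lines, including any `##contig` lines, pass through unchanged, and nothing after the first characters of a line (so none of the REF/ALT columns) is modified.

```shell
#!/bin/sh
# rename_chroms (hypothetical name): the same two sed expressions as above.
# Expression 1: prepend "chr" to every line that does not start with '#'.
# Expression 2: rename a leading "chrMT" (produced by expression 1) to "chrM".
rename_chroms() {
    sed -e '/^[^#]/s/^/chr/' -e 's/^chrMT/chrM/' "$1"
}

# Made-up, tab-separated sample records for the demo.
demo=$(mktemp)
printf '##contig=<ID=11>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n11\t100\t.\tA\tG\t.\t.\t.\nMT\t200\t.\tC\tT\t.\t.\t.\n' > "$demo"
rename_chroms "$demo"
rm -f "$demo"
```

In the demo, the two data lines come out as `chr11` and `chrM`, with their allele columns intact, while both header lines are printed unchanged.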

I received this error:

##### ERROR MESSAGE: The provided VCF file is malformed at approximately line number 18354680: empty alleles are not permitted in VCF records

I used the command below to inspect my VCF file (it is ${gerVar} that is malformed):

sed -n '18354680p'

which returned:

chr11    5249456    HbVar.633    G        .    .    PhenCode_20140430;TSA=sequence_alteration;AA=A
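A VCF data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), but the record pasted above shows only seven values. One plausible explanation, not confirmed in the post, is that one column is completely empty (two consecutive tabs) and the gap disappeared when the tabs were pasted as spaces. Because `.` is the normal VCF placeholder for a missing value, a column with no characters at all would fit an "empty alleles" message better. The hypothetical `show_fields` helper below prints every field of one line in brackets so an empty column becomes visible; the sample record is a made-up reconstruction that assumes the ALT column is the empty one.

```shell
#!/bin/sh
# show_fields (hypothetical helper): print each tab-separated field of
# line number $2 in file $1 as "<column>\t[<value>]", so an empty field shows as [].
show_fields() {
    awk -F '\t' -v n="$2" '
        NR == n {
            for (i = 1; i <= NF; i++) printf "%d\t[%s]\n", i, $i
            exit
        }' "$1"
}

# Made-up reconstruction of the record, ASSUMING the ALT column (5) is empty.
demo=$(mktemp)
printf 'chr11\t5249456\tHbVar.633\tG\t\t.\t.\tPhenCode_20140430;TSA=sequence_alteration;AA=A\n' > "$demo"
show_fields "$demo" 1
rm -f "$demo"
```

On the real file, `show_fields "${gerVar}" 18354680` would do the same for the reported line; GATK only gives an approximate line number, so neighbouring lines may be worth checking too.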
Tags: rna-seq, gatk, vcf
(Written 3.4 years ago by umn_bist; modified 10 months ago by RamRS.)

You found the origin of your problem. So, what is the question?

(Reply written 3.4 years ago by Pierre Lindenbaum.)

Yes. My question is whether there is a better way to fix this error without re-downloading the original VCF file to cross-check what should replace the empty allele. Also, could this malformation be caused by the sed command, and could it have introduced other empty alleles elsewhere in the file?

(Reply written 3.4 years ago by umn_bist.)

I doubt it is caused by sed, since your expressions only change the start of each line. It's quite possible that the file already contained empty alleles. So check how many there are and, if there are only a few, remove them from the file.

(Reply written 3.4 years ago by geek_y.)
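Following that suggestion, a rough way to count and locate records whose REF or ALT column is completely empty could look like this. It assumes a tab-delimited, uncompressed VCF, and the function names are hypothetical.

```shell
#!/bin/sh
# count_empty_alleles (hypothetical): number of lines not starting with '#'
# whose REF ($4) or ALT ($5) column is an empty string.
count_empty_alleles() {
    awk -F '\t' '!/^#/ && ($4 == "" || $5 == "") { n++ } END { print n + 0 }' "$1"
}

# list_empty_alleles (hypothetical): line number, CHROM and POS of each such record.
list_empty_alleles() {
    awk -F '\t' '!/^#/ && ($4 == "" || $5 == "") { print NR "\t" $1 "\t" $2 }' "$1"
}

# Made-up demo file: one normal record and one with an empty ALT column.
demo=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t10\t.\tA\tG\t.\t.\t.\nchr11\t5249456\tHbVar.633\tG\t\t.\t.\t.\n' > "$demo"
count_empty_alleles "$demo"
list_empty_alleles "$demo"
rm -f "$demo"
```

The line numbers printed by `list_empty_alleles` can be compared against the approximate line number in the GATK error before deciding whether simply dropping those records is acceptable.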
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 3.4 years ago:

Clean up your VCF:

awk -F '\t' '($0 ~ /^#/ || $5!=".")' in.vcf > out.vcf
(Written 3.4 years ago by Pierre Lindenbaum; modified 10 months ago by RamRS.)
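The awk filter above drops only records whose ALT column is exactly `.`. If the offending column is instead completely empty, as the seven visible values in the pasted record may suggest, those records would pass through unchanged. A variant that also drops records with an empty REF or ALT column might look like the sketch below; `clean_vcf` is a hypothetical name, and it again assumes a tab-delimited, uncompressed VCF.

```shell
#!/bin/sh
# clean_vcf (hypothetical): keep every '#' header line, and keep a data line
# only if REF ($4) is non-empty and ALT ($5) is neither empty nor ".".
clean_vcf() {
    awk -F '\t' '/^#/ || ($4 != "" && $5 != "" && $5 != ".")' "$1"
}

# Made-up demo file: one normal record, one with ALT ".", one with an empty ALT.
demo=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t10\t.\tA\tG\t.\t.\t.\nchr1\t20\t.\tC\t.\t.\t.\t.\nchr11\t5249456\tHbVar.633\tG\t\t.\t.\t.\n' > "$demo"
clean_vcf "$demo"
rm -f "$demo"
```

On the demo file only the header and the `chr1 10` record are kept. On real data it would be used as `clean_vcf in.vcf > out.vcf`, ideally after counting how many records it removes.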

This solution does not work.

(Reply written 22 months ago by freuv1.)

@freuv1 https://meta.stackexchange.com/questions/147616/

(Reply written 22 months ago by Pierre Lindenbaum.)