Question: How to reduce annotation errors using SNPEff on BBMap's and Pilon's VCFs?
4.0 years ago, human_genomeXXX wrote:

How to reduce annotation errors using SNPEff on BBMap and Pilon VCFs?

I get a "chromosome not found" error when processing tuberculosis genome data, and the output is red and full of errors.

I have tried:

# annotation

  java -Xmx10G -jar snpEff.jar -c snpEff.config -s SNPEffBBmapOutputStats.html -v -no-downstream -no-upstream m_tuberculosis_H37Rv BBMap_variant_call.vcf > SNPEffBBmapGenome_merge.var.ann.vcf

# default parameters of SNPEff:

  java -Xmx10G -jar snpEff.jar -c snpEff.config -s SNPEffBBmapOutputStats.html m_tuberculosis_H37Rv BBMap_variant_call.vcf > SNPEffBBmapGenome_merge.var.ann.vcf

Should I change settings of SNPEff or preprocess VCFs somehow before inputting them into the annotation engine?

Thanks.

Tags: bbmap snpeff pilon vcf

It's possible that the problem is the chromosome names having spaces in them. Can you post the VCF header?

written 4.0 years ago by Brian Bushnell

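One quick way to check the chromosome names is to list the distinct values in the first column of the VCF data lines. The following is only a minimal sketch; the helper names `list_vcf_chroms` and `chroms_with_spaces` are hypothetical, and it assumes an uncompressed, tab-delimited VCF.

```shell
# Hypothetical helpers (not part of SnpEff, BBMap or Pilon).
# Assumes an uncompressed, tab-delimited VCF file.

# Print each distinct chromosome name used in the data lines.
list_vcf_chroms() {
    grep -v '^#' "$1" | cut -f 1 | sort -u
}

# Print only the chromosome names that contain a space.
chroms_with_spaces() {
    list_vcf_chroms "$1" | grep ' '
}
```

For example, `list_vcf_chroms BBMap_variant_call.vcf` prints one name per line; those names can then be compared with the chromosome names used by the SnpEff database being annotated against.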
##fileformat=VCFv4.1
##fileDate=20170202
##source="Pilon version 1.21 Fri Dec 9 16:44:44 2016 -0500"
##PILON="--genome H37Rv_reference.fa --frags file.sorted.bam --output pilon_output.pilon --vcf"
##reference=file:/home/mat29/Desktop/Ready_Genomics_software/H37Rv_reference.fa
##contig=<ID=Mycobacterium,length=4411532>
##FILTER=<ID=LowCov,Description="Low Coverage of good reads at location">
##FILTER=<ID=Amb,Description="Ambiguous evidence in haploid genome">
##FILTER=<ID=Del,Description="This base is in a deletion or change event from another record">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Valid read depth; some reads may have been filtered">
##INFO=<ID=TD,Number=1,Type=Integer,Description="Total read depth including bad pairs">
##INFO=<ID=PC,Number=1,Type=Integer,Description="Physical coverage of valid inserts across locus">
##INFO=<ID=BQ,Number=1,Type=Integer,Description="Mean base quality at locus">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Mean read mapping quality at locus">
##INFO=<ID=QD,Number=1,Type=Integer,Description="Variant confidence/quality by depth">
##INFO=<ID=BC,Number=4,Type=Integer,Description="Count of As, Cs, Gs, Ts at locus">
##INFO=<ID=QP,Number=4,Type=Integer,Description="Percentage of As, Cs, Gs, Ts weighted by Q & MQ at locus">
##INFO=<ID=IC,Number=1,Type=Integer,Description="Number of reads with insertion here">
##INFO=<ID=DC,Number=1,Type=Integer,Description="Number of reads with deletion here">
##INFO=<ID=XC,Number=1,Type=Integer,Description="Number of reads clipped here">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Fraction of evidence in support of alternate allele(s)">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=.,Type=String,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise change from local reassembly (ALT contains Ns)">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=String,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=String,Description="Approximate read depth; some reads may have been filtered">
##ALT=<ID=DUP,Description="Possible segmental duplication">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE
Mycobacterium    1    .    T    .    777    PASS    DP=22;TD=42;BQ=35;MQ=40;QD=35;BC=0,0,0,22;QP=0,0,0,100;PC=35;IC=0;DC=0;XC=0;AC=0;AF=0.00    GT    0/0

Thank you!!!

written 4.0 years ago by human_genomeXXX

This looks like a mismatch of chromosome identifiers to me.

written 4.0 years ago by WouterDeCoster
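If the identifiers do turn out to be mismatched, one option is to rename the chromosome in the VCF before annotating. Below is a rough sketch, assuming an uncompressed, tab-delimited VCF; `rename_vcf_chrom` is a hypothetical helper, and the target name must be whatever the SnpEff database actually uses, which is not shown in this thread and has to be checked first (the log from a `-v` run may help).

```shell
# Hypothetical helper: rename one chromosome in a VCF.
# Usage: rename_vcf_chrom OLD_NAME NEW_NAME in.vcf > out.vcf
# NEW_NAME is an assumption: it must match the name in the SnpEff database.
rename_vcf_chrom() {
    awk -v old="$1" -v new="$2" '
        BEGIN { FS = OFS = "\t" }
        # Rewrite the matching ##contig header line, if present.
        /^##contig=<ID=/ {
            prefix = "##contig=<ID=" old
            rest = substr($0, length(prefix) + 1)
            if (index($0, prefix) == 1 && rest ~ /^[,>]/) {
                print "##contig=<ID=" new rest
                next
            }
        }
        # Pass all other header lines through unchanged.
        /^#/ { print; next }
        # Replace the CHROM column only on exact matches.
        $1 == old { $1 = new }
        { print }
    ' "$3"
}
```

With the Pilon header posted above, the old name would be `Mycobacterium`; the new name is left to the reader because it depends on the database.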

Here is the other one (from BBMap); it looks unhealthy:

##fileformat=VCFv4.2
##BBMapVersion=36.86
##ploidy=1
##rarity=1.00000
##minallelefraction=0.10000
##reads=789372
##pairedReads=789372
##properlyPairedReads=562846
##readLengthAvg=90.982
##properPairRate=0.71303
##totalQualityAvg=34.373
##mapqAvg=16.857
##reference=H37Rv_reference.fa
##contig=<ID=NC_000962.3,length=4411532>
##FORMAT=<ID=PASS,Number=1,Type=String,Description="Pass">
##FORMAT=<ID=FAIL,Number=1,Type=String,Description="Fail">
##INFO=<ID=SN,Number=1,Type=Integer,Description="Scaffold Number">
##INFO=<ID=STA,Number=1,Type=Integer,Description="Start">
##INFO=<ID=STO,Number=1,Type=Integer,Description="Stop">
##INFO=<ID=TYP,Number=1,Type=Integer,Description="Type">
##INFO=<ID=R1P,Number=1,Type=Integer,Description="Read1 Plus Count">
##INFO=<ID=R1M,Number=1,Type=Integer,Description="Read1 Minus Count">
##INFO=<ID=R2P,Number=1,Type=Integer,Description="Read2 Plus Count">
##INFO=<ID=R2M,Number=1,Type=Integer,Description="Read2 Minus Count">
##INFO=<ID=PPC,Number=1,Type=Integer,Description="Paired Count">
##INFO=<ID=LS,Number=1,Type=Integer,Description="Length Sum">
##INFO=<ID=MQS,Number=1,Type=Integer,Description="MAPQ Sum">
##INFO=<ID=MQM,Number=1,Type=Integer,Description="MAPQ Max">
##INFO=<ID=BQS,Number=1,Type=Integer,Description="Base Quality Sum">
##INFO=<ID=BQM,Number=1,Type=Integer,Description="Base Quality Max">
##INFO=<ID=EDS,Number=1,Type=Integer,Description="End Distance Sum">
##INFO=<ID=EDM,Number=1,Type=Integer,Description="End Distance Max">
##INFO=<ID=IDS,Number=1,Type=Integer,Description="Identity Sum">
##INFO=<ID=IDM,Number=1,Type=Integer,Description="Identity Max">
##INFO=<ID=COV,Number=1,Type=Integer,Description="Coverage">
##INFO=<ID=MCOV,Number=1,Type=Integer,Description="Minus Coverage">
##INFO=<ID=CED,Number=1,Type=Integer,Description="Contig End Distance">
##INFO=<ID=HMP,Number=1,Type=Integer,Description="Homopolymer Count">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Fraction">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Ref+, Ref-, Alt+, Alt-">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allele Depth">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Allele Fraction">
##FORMAT=<ID=SC,Number=1,Type=Float,Description="Score">
##FORMAT=<ID=PF,Number=1,Type=String,Description="Pass Filter">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    bbmap_mapped
NC_000962.3    204    .    C    T    33.97    PASS    SN=0;STA=203;STO=204;TYP=SUB;R1P=6;R1M=3;R2P=5;R2M=2;PPC=2;LS=1456;MQS=226;MQM=18;BQS=495;BQM=38;EDS=440;EDM=49;IDS=13378;IDM=857;COV=16;MCOV=-1;CED=203;HMP=3;DP=16;AF=1.0000;DP4=-3,3,11,5    GT:DP:AD:AF:SC:PF    1:16:16:1.0000:33.97:PASS
NC_000962.3    207    .    T    C    36.53    PASS    SN=0;STA=206;STO=207;TYP=SUB;R1P=6;R1M=3;R2P=5;R2M=2;PPC=2;LS=1456;MQS=226;MQM=18;BQS=577;BQM=40;EDS=446;EDM=49;IDS=13378;IDM=857;COV=16;MCOV=-1;CED=206;HMP=0;DP=16;AF=1.0000;DP4=-3,3,11,5    GT:DP:AD:AF:SC:PF    1:16:16:1.0000:36.53:PASS
NC_000962.3    210    .    C    G    36.26    PASS    SN=0;STA=209;STO=210;TYP=SUB;R1P=6;R1M=3;R2P=4;
written 4.0 years ago by human_genomeXXX