Question: Error loading VCF file in UCSC Genome Browser
Asked 6.7 years ago by win (India):

Hi all, I generated a VCF file using samtools and then wanted to load it into the UCSC Genome Browser, as suggested at this link:

http://genome.ucsc.edu/goldenPath/help/vcf.html

However, I get an error stating that the VCF file does not match a regex, or something along those lines.

The header of my VCF file is as follows:

##fileformat=VCFv4.1
##samtoolsVersion=0.1.18 (r982:295)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    sorted.read.bam

Any ideas what might be incorrect?

Thanks in advance.


What commands did you use to create the .vcf.gz and .tbi? What does the .bed file with the track line pointing to the .vcf.gz contain?

written 6.7 years ago by brentp

For the .vcf.gz I used bgzip myfile.vcf, and for the .tbi I used tabix -p vcf myfile.vcf

modified 6.7 years ago • written 6.7 years ago by win
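
For reference, the compress-and-index step discussed above can be sketched as follows. One thing worth checking: tabix indexes the bgzip-compressed file (myfile.vcf.gz), not the uncompressed one. The helper names and the example URL below are made up for illustration; the track line follows the type=vcfTabix / bigDataUrl form described on the UCSC help page linked in the question.

```python
import shlex


def tabix_commands(vcf_path):
    """Return the two commands that compress and then index a VCF file."""
    gz_path = vcf_path + ".gz"
    return [
        ["bgzip", vcf_path],              # writes myfile.vcf.gz
        ["tabix", "-p", "vcf", gz_path],  # writes myfile.vcf.gz.tbi next to it
    ]


def track_line(name, url):
    """Build a UCSC custom-track line pointing at the hosted .vcf.gz file."""
    return 'track type=vcfTabix name="{}" bigDataUrl={}'.format(name, url)


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without the tools installed.
    for cmd in tabix_commands("myfile.vcf"):
        print(" ".join(shlex.quote(part) for part in cmd))
    # example.org is a placeholder; the .vcf.gz and .tbi must sit at the same web location.
    print(track_line("My VCF", "http://example.org/myfile.vcf.gz"))
```

The printed track line is what gets pasted into the custom track box; the browser then fetches the .vcf.gz and its .tbi from that URL.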
Answered 6.7 years ago by Chris Miller (Washington University in St. Louis, MO):

Try validating your file with VCFtools first. It should provide informative error messages if your file isn't to spec.


I ran the VCFValidate script against my VCF and I am getting a whole bunch of errors, starting with "Number found where operator expected at Line1........."

I am using the latest version of Samtools.

Any ideas?

written 6.7 years ago by win

Make sure each header statement is on its own line. Sometimes the header gets messed up if you move the file between Windows and Mac, or if you run grep operations on it.

written 6.7 years ago by Ashutosh Pandey
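
The header checks suggested above could be automated with a small script along these lines. This is only a rough sketch, not a replacement for a real validator: check_vcf_header is a made-up name, the "second ## on one line" test is a heuristic, and the tab check relies on the VCF spec requiring the #CHROM line to be tab-delimited.

```python
def check_vcf_header(text):
    """Return a list of human-readable problems found in a VCF header."""
    problems = []
    # Files moved between Windows, Mac and Linux may carry stray carriage returns.
    if "\r" in text:
        problems.append("carriage returns found (Windows or old Mac line endings)")
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    header = [ln for ln in lines if ln.startswith("#")]
    if not header or not header[0].startswith("##fileformat="):
        problems.append("first line is not ##fileformat=...")
    for number, line in enumerate(header, start=1):
        # Heuristic: two statements fused onto one line show up as a second "##".
        if line.startswith("##") and "##" in line[2:]:
            problems.append("header line {}: more than one ## statement".format(number))
    chrom = [ln for ln in header if ln.startswith("#CHROM")]
    if len(chrom) != 1:
        problems.append("expected exactly one #CHROM line, found {}".format(len(chrom)))
    else:
        expected = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
        # Splitting on tabs only: a space-separated column line fails this check.
        if chrom[0].split("\t")[:8] != expected:
            problems.append("#CHROM line is not tab-delimited with the 8 fixed columns")
    return problems


if __name__ == "__main__":
    import sys
    # newline="" keeps the original line endings so carriage returns stay visible.
    with open(sys.argv[1], newline="") as handle:
        for problem in check_vcf_header(handle.read()):
            print(problem)
```

Run it on the uncompressed VCF (for example, python check_header.py myfile.vcf, where the script name is arbitrary); no output means none of these particular problems were found.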

You had better open a new question about that. You will get more answers.

written 6.7 years ago by Giovanni M Dall'Olio

Match your errors up to the VCF spec and see if you can figure out what's wrong: http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41

written 6.7 years ago by Chris Miller