Question: Vcf File From Mpileup ... What Next
Asked 5.0 years ago by Dataminer (Netherlands):

Hi,

I have done variant calling on my file using samtools mpileup and have converted the output from BCF to VCF.

It looks like this

##fileformat=VCFv4.1
##samtoolsVersion=0.1.18 (r982:295)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    /home/CleanData/Filtered_28_CTRL.sorted.bam
chr1    870903    .    T    C    7.8    .    DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=37;FQ=-30    GT:PL:DP:GQ    1/1:37,3,0:1:4
chr1    886006    .    T    C    7.8    .    DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=37;FQ=-30    GT:PL:DP:GQ    1/1:37,3,0:1:4
chr1    893280    .    G    A    7.8    .    DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=37;FQ=-30    GT:PL:DP:GQ    1/1:37,3,0:1:4
chr1    981087    .    A    G    7.8    .    DP=1;AF1=1;AC1=2;DP4=0,0,0,1;MQ=37;FQ=-30    GT:PL:DP:GQ    1/1:37,3,0:1:4
chr1    982462    .    T    C    7.8    .    DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=37;FQ=-30    GT:PL:DP:GQ    1/1:37,3,0:1:4
chr1    982513    .    T    C    7.8    .    DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=37;FQ=-30    GT:PL:DP:GQ    1/1:37,3,0:1:4
chr1    1162326    .    A    G    13.9    .    DP=2;VDB=0.0340;AF1=1;AC1=2;DP4=0,0,1,1;MQ=37;FQ=-33    GT:PL:DP:GQ    1/1:45,6,0:2:10

My question is: how can I annotate this file, learn more about my SNPs, and carry out further analysis, for example with SIFT or PolyPhen?

Any guidance is welcome. Thank you for your time.

Tags: variant-calling, snps, samtools
Answered 5.0 years ago by Ashutosh Pandey (Philadelphia):

Search for "annotate variants". This question has already been answered many times; see, for example, "What is the best tool for mouse (mm9 or mm10) variant annotations?"

SIFT, Annovar, and VEP are the most popular tools for annotating variants. If this is human data, you should also try GEMINI ("GEMINI: integrative exploration of genetic variation and genome annotations") from the Quinlan lab.
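
Before handing the file to any of these tools, it can help to look at what the records actually contain. Below is a minimal Python sketch (standard library only) that splits one VCF data line into its fixed columns and reads the INFO key/value pairs. The function names `parse_vcf_record` and `passes_filter`, and the threshold values, are illustrative assumptions and do not come from any of the tools named above. It also assumes the real file is tab-delimited, as VCF requires.

```python
def parse_vcf_record(line):
    """Split one tab-delimited VCF data line into a dict of its fixed columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        # Flags such as INDEL carry no value, so store True for them
        info_dict[key] = value if sep else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": None if qual == "." else float(qual),
        "info": info_dict,
    }


def passes_filter(record, min_qual=20.0, min_depth=10):
    """Keep a record only if QUAL and INFO/DP reach the given thresholds.

    The defaults are illustrative assumptions; tune them for your own data.
    """
    depth = int(record["info"].get("DP", 0))
    qual = record["qual"]
    return qual is not None and qual >= min_qual and depth >= min_depth


# One record from the question, rebuilt with tab separators
example = "\t".join([
    "chr1", "1162326", ".", "A", "G", "13.9", ".",
    "DP=2;VDB=0.0340;AF1=1;AC1=2;DP4=0,0,1,1;MQ=37;FQ=-33",
    "GT:PL:DP:GQ", "1/1:45,6,0:2:10",
])
record = parse_vcf_record(example)
print(record["chrom"], record["pos"], record["info"]["DP"], passes_filter(record))
# prints: chr1 1162326 2 False
```

Every record shown in the question has DP of 1 or 2 and QUAL below 20, so a filter like this would drop all of them. That may be worth checking before spending time on annotation.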

Answered 5.0 years ago by QVINTVS_FABIVS_MAXIMVS (USA SoCal):

You can use IGV to visualize your variants at the genome level. You have to index the VCF file first, but that's not hard.
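
Indexing expects the VCF to be sorted by chromosome and position, so a quick check beforehand can save a failed run. Here is a small Python sketch of such a check; the helper name `is_coordinate_sorted` is hypothetical, and it only verifies that each chromosome's records form one contiguous block with non-decreasing positions. It does not build the index itself.

```python
def is_coordinate_sorted(lines):
    """Return True if records are grouped by chromosome with non-decreasing POS."""
    seen = set()
    current_chrom = None
    last_pos = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        if chrom != current_chrom:
            if chrom in seen:
                return False  # this chromosome's block reappears later in the file
            seen.add(chrom)
            current_chrom = chrom
            last_pos = pos
        elif pos < last_pos:
            return False  # position went backwards within a chromosome
        else:
            last_pos = pos
    return True


# First three records from the question, rebuilt with tab separators
records = [
    "##fileformat=VCFv4.1",
    "chr1\t870903\t.\tT\tC\t7.8\t.\tDP=1",
    "chr1\t886006\t.\tT\tC\t7.8\t.\tDP=1",
    "chr1\t893280\t.\tG\tA\t7.8\t.\tDP=1",
]
print(is_coordinate_sorted(records))                  # prints: True
print(is_coordinate_sorted(list(reversed(records))))  # prints: False
```

For a real file you would pass an open file handle instead of a list, since the function only iterates over lines.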
