Question: How to filter germline variants from a VCF file?
8 weeks ago, leukippus0116 wrote:

Hello. I'm new to bioinformatics and am working with NGS data. I'm currently trying to detect SNP variants that cause multiple sclerosis.

I selected some genes for targeted resequencing and got NGS data. I ran the data through Qiagen's NGS variant calling pipeline and got a VCF file for each sample.

I checked them and realized that these VCF files contain SNPs with a very low variant minor allele frequency (VMF). I want to work with germline variants only, so I'm trying to filter out these low-VMF variants.

I'm using vcftools, but I can't solve this problem. I know I need to study more, but time is running out. Could anyone kindly help me? Thank you.


If you have per-sample VCFs, the only possible minor AF values for bi-allelic variant loci are 0.5 and 1. What do you mean by "very low minor allele frequency"?

Reply written 7 weeks ago by Ram
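
To make the expectation in the comment above concrete: for a diploid germline call, the expected ALT allele fraction follows directly from the genotype. Here is a minimal Python sketch. The function name is hypothetical, and observed fractions in real data scatter around these values because of sampling noise.

```python
def expected_alt_fraction(gt: str) -> float:
    """Expected ALT allele fraction for a bi-allelic genotype such as '0/1'."""
    # Accept both unphased ('/') and phased ('|') genotype separators.
    alleles = gt.replace("|", "/").split("/")
    # Fraction of allele copies that carry the ALT allele (index 1).
    return alleles.count("1") / len(alleles)


for gt in ("0/0", "0/1", "1/1"):
    print(gt, expected_alt_fraction(gt))
```

A heterozygous call (0/1) gives 0.5 and a homozygous ALT call (1/1) gives 1.0, which are the two values mentioned above.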

Thank you for your comment! I think what you said is right for germline variants, but here is an example from my VCF file: this variant has a very low VMF/VF (0.0740741). Does this mean it is a somatic variant? In the FORMAT header, VF is defined as "Variant UMI allele frequency, same as VMF," so I thought I should eliminate this variant. Am I right?

##fileformat=VCFv4.2
##reference=/srv/qgen/data/genome/hg19/ucsc.hg19.fa
##FILTER=<ID=LM,Description="Low coverage (fewer than 5 barcodes)">
##FILTER=<ID=RepT,Description="Variant in tandem repeat (TFR) regions">
##FILTER=<ID=RepS,Description="Variant in simple repeats (RepeatMasker) regions">
##FILTER=<ID=HP,Description="Inside or flanked by homopolymer regions">
##FILTER=<ID=LowC,Description="Variant in Low complexity regions, as defined in RepeatMasker">
##FILTER=<ID=SL,Description="Variant in micro-satelite regions, as defined in RepeatMasker">
##FILTER=<ID=SB,Description="Strand Bias">
##FILTER=<ID=DP,Description="Too many discordant pairs">
##FILTER=<ID=MM,Description="Too many mismatches in a read. Default threshold is 6.5 per 100 bases">
##FILTER=<ID=LowQ,Description="Low base quality">
##FILTER=<ID=RBCP,Description="Variant are clustered at the end of barcode-side reads">
##FILTER=<ID=RPCP,Description="Variant are clustered at the end of primer-side reads">
##FILTER=<ID=PB,Description="Primer bias filter. odds ratio > 10 or < 0.1">
##FILTER=<ID=PrimerCP,Description="variant is clustered within 2 bases from primer sequence due to possible primer dimers">
##INFO=<ID=TYPE,Number=.,Type=String,Description="Variant type: SNP/INDEL/COMPLEX">
##INFO=<ID=RepRegion,Number=.,Type=String,Description="Repetitive region">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth">
##INFO=<ID=UMT,Number=1,Type=Integer,Description="Total used UMI depth">
##INFO=<ID=VMT,Number=.,Type=Integer,Description="Variant UMI depth">
##INFO=<ID=VMF,Number=.,Type=Float,Description="Variant UMI allele frequency">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Filtered allelic UMI depths for the ref and alt alleles">
##FORMAT=<ID=VF,Number=.,Type=Float,Description="Variant UMI allele frequency, same as VMF">


#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  20-026_S1
chr3    49693658    .   C   A   24.68   PrimerCP    TYPE=SNP;RepRegion=NA;DP=532;UMT=243;VMT=18;VMF=0.0740741;ANN=A|synonymous_variant|LOW|BSN|ENSG00000164061|transcript|ENST00000296452|protein_coding|5/12|c.6669C>A|p.Gly2223Gly|6783/15955|6669/11781|2223/3926||,A|downstream_gene_variant|MODIFIER|BSN|ENSG00000164061|transcript|ENST00000467456|retained_intron||n.*4876C>A|||||4876|  GT:AD:VF    0/1:225,18:0.0740741
Reply written 7 weeks ago by leukippus0116
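
As a rough illustration of the filtering being asked about, here is a plain-Python sketch that reads the FILTER column and the per-sample AD field of a record like the one above. It assumes tab-separated VCF lines with a single sample column. The 0.2 cutoff, the function names, and the shortened INFO column in the test record are assumptions made for the example, not Qiagen recommendations; a suitable threshold depends on depth and the assay.

```python
MIN_VF = 0.2  # assumed cutoff, chosen only for illustration


def sample_fields(line):
    """Return (FILTER value, {FORMAT key: sample value}) for one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    keys = cols[8].split(":")     # FORMAT column, e.g. GT:AD:VF
    values = cols[9].split(":")   # first (and only) sample column
    return cols[6], dict(zip(keys, values))


def alt_fraction(line):
    """Recompute the ALT fraction from AD (ref and alt UMI depths)."""
    _, sample = sample_fields(line)
    ref_umis, alt_umis = (int(x) for x in sample["AD"].split(",")[:2])
    total = ref_umis + alt_umis
    return alt_umis / total if total else 0.0


def looks_germline(line, min_vf=MIN_VF):
    """True if the record passed all caller filters and reaches the cutoff."""
    flt, _ = sample_fields(line)
    return flt == "PASS" and alt_fraction(line) >= min_vf


# The record from the post, with the long ANN annotation left out of INFO.
record = "\t".join([
    "chr3", "49693658", ".", "C", "A", "24.68", "PrimerCP",
    "TYPE=SNP;RepRegion=NA;DP=532;UMT=243;VMT=18;VMF=0.0740741",
    "GT:AD:VF", "0/1:225,18:0.0740741",
])

print(round(alt_fraction(record), 7))
print(looks_germline(record))
```

For this record the recomputed fraction is 18 / (225 + 18) ≈ 0.074, matching the reported VF, and the record is also flagged PrimerCP rather than PASS, so it fails both checks in this sketch.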

You did not mention UMIs at all in your initial post. I think UMIs are only involved in single-cell sequencing, and I don't really know single-cell technologies. Maybe experts on the technology can chime in.

Reply written 7 weeks ago (modified 7 weeks ago) by Ram
Powered by Biostar version 2.3.0