How to filter germline variants from a VCF file?
3.3 years ago

Hello. I'm new to bioinformatics and am working with NGS data. I'm currently trying to detect SNP variants that cause multiple sclerosis.

I selected some genes for targeted resequencing and obtained NGS data. I ran these data through Qiagen's NGS variant-calling pipeline and got a VCF file for each sample.

When I checked them, I noticed that these VCF files contain SNPs with a very low variant minor allele frequency (VMF). I only want to work with germline variants, so I'm trying to filter out these low-VMF variants.

I'm using vcftools, but I can't figure out how to do this. I know I need to study more, but time is running out, so could someone kindly help me? Thank you.
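One way to do this filtering is a small script that reads the VCF line by line and keeps only records whose INFO `VMF` value reaches a cutoff. This is a minimal sketch, assuming a standard tab-delimited VCF and the `VMF` INFO key declared in this pipeline's header. The function names and the 0.25 cutoff used below are hypothetical; a suitable cutoff depends on your own data and study design.

```python
from typing import Iterable, Iterator


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column (key=value;key=value;flag) into a dict."""
    fields = {}
    for item in info.split(";"):
        # partition on the first '=' only, so values containing '=' stay intact
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def filter_by_vmf(lines: Iterable[str], min_vmf: float) -> Iterator[str]:
    """Yield header lines unchanged; keep data lines whose VMF >= min_vmf."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])  # column 8 of a VCF record is INFO
        # VMF is declared Number=. so it may hold several comma-separated values;
        # a record is kept if any of them reaches the cutoff.
        vmfs = [float(v) for v in info.get("VMF", "").split(",") if v]
        if vmfs and max(vmfs) >= min_vmf:
            yield line
```

To run it on a file, something like `sys.stdout.writelines(filter_by_vmf(open("sample.vcf"), 0.25))` (after `import sys`) writes the filtered records to standard output. Header lines are passed through unchanged, so the output is still a valid VCF.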


If you have per-sample VCFs, the only possible minor AF values for bi-allelic variant loci are 0.5 and 1. What do you mean by "very low minor allele frequency"?
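The reasoning in this comment can be made concrete: for a diploid sample at a bi-allelic site, the called genotype fixes the expected ALT allele fraction. The sketch below is illustrative, assuming diploid GT strings with only alleles 0 and 1; the function name is hypothetical. Observed fractions in real data scatter around these values because of sampling noise.

```python
from fractions import Fraction


def expected_alt_fraction(gt: str) -> Fraction:
    """Expected ALT allele fraction for a diploid, bi-allelic germline genotype.

    0/0 -> 0, 0/1 -> 1/2, 1/1 -> 1. Phased ('|') and unphased ('/') separators
    are both accepted; missing calls such as './.' raise ValueError.
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        raise ValueError(f"expected a called diploid genotype, got {gt!r}")
    if any(a not in ("0", "1") for a in alleles):
        raise ValueError(f"only bi-allelic genotypes (0/1) are handled, got {gt!r}")
    # count ALT alleles (0, 1 or 2) out of the two copies
    return Fraction(sum(int(a) for a in alleles), 2)
```

So a variant call (non-reference genotype) can only be expected at 0.5 (0/1) or 1 (1/1), which is why a value far below 0.5 looks unusual for a germline call.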


Thank you for your comment! I think what you said is right for germline variants, but here is an example from my VCF file: my variant has a very low VMF/VF (0.0740741). Does this mean it is a somatic variant? In FORMAT, VF is defined as "Variant UMI allele frequency, same as VMF," so I thought I should eliminate this variant. Am I right?

##fileformat=VCFv4.2
##reference=/srv/qgen/data/genome/hg19/ucsc.hg19.fa
##FILTER=<ID=LM,Description="Low coverage (fewer than 5 barcodes)">
##FILTER=<ID=RepT,Description="Variant in tandem repeat (TFR) regions">
##FILTER=<ID=RepS,Description="Variant in simple repeats (RepeatMasker) regions">
##FILTER=<ID=HP,Description="Inside or flanked by homopolymer regions">
##FILTER=<ID=LowC,Description="Variant in Low complexity regions, as defined in RepeatMasker">
##FILTER=<ID=SL,Description="Variant in micro-satelite regions, as defined in RepeatMasker">
##FILTER=<ID=SB,Description="Strand Bias">
##FILTER=<ID=DP,Description="Too many discordant pairs">
##FILTER=<ID=MM,Description="Too many mismatches in a read. Default threshold is 6.5 per 100 bases">
##FILTER=<ID=LowQ,Description="Low base quality">
##FILTER=<ID=RBCP,Description="Variant are clustered at the end of barcode-side reads">
##FILTER=<ID=RPCP,Description="Variant are clustered at the end of primer-side reads">
##FILTER=<ID=PB,Description="Primer bias filter. odds ratio > 10 or < 0.1">
##FILTER=<ID=PrimerCP,Description="variant is clustered within 2 bases from primer sequence due to possible primer dimers">
##INFO=<ID=TYPE,Number=.,Type=String,Description="Variant type: SNP/INDEL/COMPLEX">
##INFO=<ID=RepRegion,Number=.,Type=String,Description="Repetitive region">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth">
##INFO=<ID=UMT,Number=1,Type=Integer,Description="Total used UMI depth">
##INFO=<ID=VMT,Number=.,Type=Integer,Description="Variant UMI depth">
##INFO=<ID=VMF,Number=.,Type=Float,Description="Variant UMI allele frequency">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Filtered allelic UMI depths for the ref and alt alleles">
##FORMAT=<ID=VF,Number=.,Type=Float,Description="Variant UMI allele frequency, same as VMF">


#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  20-026_S1
chr3    49693658    .   C   A   24.68   PrimerCP    TYPE=SNP;RepRegion=NA;DP=532;UMT=243;VMT=18;VMF=0.0740741;ANN=A|synonymous_variant|LOW|BSN|ENSG00000164061|transcript|ENST00000296452|protein_coding|5/12|c.6669C>A|p.Gly2223Gly|6783/15955|6669/11781|2223/3926||,A|downstream_gene_variant|MODIFIER|BSN|ENSG00000164061|transcript|ENST00000467456|retained_intron||n.*4876C>A|||||4876|  GT:AD:VF    0/1:225,18:0.0740741
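The numbers in this record are internally consistent: AD is 225 REF and 18 ALT UMIs, which sum to 243 (the same as UMT=243), and 18/243 ≈ 0.0740741, which matches VF and VMF. The sketch below recomputes this and then asks how likely 18 or fewer ALT UMIs out of 243 would be if the site were a true heterozygote (expected fraction 0.5). It is illustrative only, assuming each UMI is an independent draw with no allelic or capture bias, which real targeted data need not satisfy; it is not a somatic-versus-germline caller, and the function names are hypothetical. Note also that the FILTER column reads PrimerCP rather than PASS, which per the header means the variant is clustered within 2 bases of primer sequence (possible primer dimers).

```python
from math import comb


def alt_fraction_from_ad(format_keys: str, sample: str) -> tuple[int, int, float]:
    """Return (ref_depth, alt_depth, alt_fraction) from one sample's FORMAT column.

    Assumes a bi-allelic site, so AD holds exactly two values: REF then ALT.
    """
    values = dict(zip(format_keys.split(":"), sample.split(":")))
    ref, alt = (int(x) for x in values["AD"].split(","))
    return ref, alt, alt / (ref + alt)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# FORMAT keys and sample column copied from the record above
ref, alt, frac = alt_fraction_from_ad("GT:AD:VF", "0/1:225,18:0.0740741")
n = ref + alt
# Probability of seeing this few ALT UMIs under a heterozygous model (p = 0.5)
p_het = binom_cdf(alt, n, 0.5)
print(ref, alt, round(frac, 7), n)
print(f"P(ALT <= {alt} | n={n}, p=0.5) = {p_het:.3g}")
```

The tail probability comes out vanishingly small, so under this simple model the call is hard to reconcile with a heterozygous germline variant; whether it reflects a low-level somatic event or an artifact (as the PrimerCP flag hints) cannot be decided from this calculation alone.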

You did not mention UMIs at all in your initial post. I think UMIs are only involved in single-cell sequencing, and I don't really know single-cell technologies. Maybe experts on the technology can chime in.
