QC Sequence Data from VCF
9.8 years ago
Katie D'Aco ★ 1.1k

Is there a standard way to check sequencing quality from a VCF? For example, if you're working on a case/control exome project and you are supplied with VCFs for each sample, but don't have access to BAMs or FASTQs, how would you go about checking that the supplied VCFs are good enough to be included in the study?

Some common metrics I have seen are Ti/Tv, concordance with dbSNP, the proportion of missense vs. synonymous variants, etc. Is there anything else one could check? Are there any tools that will do these calculations for you?
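As a rough illustration of the first two metrics, here is a minimal Python sketch (standard library only) that computes Ti/Tv over biallelic SNVs and, as a stand-in for dbSNP concordance, the fraction of SNVs that carry an rs ID in the ID column. The rs-ID shortcut assumes the VCF was already annotated against dbSNP; a proper concordance check would compare positions and alleles against a dbSNP release. The function names are hypothetical.

```python
import gzip

BASES = set("ACGT")
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def snv_metrics(lines):
    """Return (Ti/Tv, fraction of SNVs with an rs ID) from VCF text lines.

    Only biallelic single-nucleotide variants are counted; indels and
    multi-allelic sites are skipped for simplicity. The rs-ID fraction is
    only a proxy for dbSNP concordance and assumes the ID column was
    filled in from dbSNP.
    """
    ti = tv = with_rsid = total = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        vid, ref, alt = fields[2], fields[3].upper(), fields[4].upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # not a biallelic SNV
        total += 1
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
        if vid.startswith("rs"):
            with_rsid += 1
    titv = ti / tv if tv else float("nan")
    rs_frac = with_rsid / total if total else float("nan")
    return titv, rs_frac


# Example usage (path is hypothetical):
# with open_vcf("sample1.vcf.gz") as fh:
#     print(snv_metrics(fh))
```

Commonly cited expectations are a Ti/Tv of roughly 2.0 to 2.1 for whole genomes and around 3 for exome targets; a sample that falls well below its peers is a candidate for closer inspection.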

qc vcf sequencing

I think Ti/Tv, concordance with dbSNP, and the proportion of missense and synonymous variants should be enough to give you a good idea of the quality of your sequencing data. Using the INFO field for the evaluation, as suggested by "themysticgeek", only makes sense if low-quality variants were flagged rather than completely removed from the final VCF file. Then you can compare flagged and passed variants. For example, if you notice that a large fraction of variants (say, more than 20%) were flagged as "FAILED" because of low coverage or base quality, you can draw some conclusions about your sequencing data.
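A minimal sketch of the flagged-vs-passed comparison, assuming the flags live in the standard VCF FILTER column (column 7) and that records with "." simply had no filters applied. The 20% cut-off and the names `MAX_FLAGGED_FRACTION` and `filter_summary` are hypothetical, taken from the example in the reply above.

```python
from collections import Counter

# Hypothetical cut-off based on the "more than 20%" example; tune per project.
MAX_FLAGGED_FRACTION = 0.20


def filter_summary(lines):
    """Return (per-filter-name counts, fraction of records not marked PASS).

    Records whose FILTER is "." (filters not applied) count as unflagged.
    """
    counts = Counter()
    total = flagged = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        filt = line.rstrip("\n").split("\t")[6]
        total += 1
        # One record can carry several semicolon-separated filter names.
        counts.update(filt.split(";"))
        if filt not in ("PASS", "."):
            flagged += 1
    fraction = flagged / total if total else float("nan")
    return counts, fraction
```

If the fraction exceeds the cut-off, looking at which filter names dominate `counts` (depth-based vs. quality-based filters, for example) hints at the likely cause.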

Thanks for the input!

9.8 years ago
kautilya ▴ 430

You could look into some other INFO fields in the VCF, such as DP (combined depth across samples) and MQ0 (number of reads with mapping quality 0 covering this record). Values for these fields are usually calculated by the variant caller itself.
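A minimal sketch of summarising those two INFO fields across a VCF, assuming DP and MQ0 are present as integer `KEY=value` entries. Expressing MQ0 as a fraction of DP is an illustrative choice, not a standard metric, and the function names are hypothetical.

```python
import statistics


def parse_info(info):
    """Parse an INFO string like 'DP=30;MQ0=2;DB' into a dict (flags -> True)."""
    out = {}
    if info == ".":
        return out
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def info_summary(lines):
    """Return the median DP and the mean MQ0/DP ratio over all records."""
    depths, mq0_fracs = [], []
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        info = parse_info(line.rstrip("\n").split("\t")[7])
        if "DP" in info:
            dp = int(info["DP"])
            depths.append(dp)
            # Share of covering reads with mapping quality 0 (rough proxy).
            if "MQ0" in info and dp > 0:
                mq0_fracs.append(int(info["MQ0"]) / dp)
    return {
        "median_DP": statistics.median(depths) if depths else None,
        "mean_MQ0_fraction": statistics.mean(mq0_fracs) if mq0_fracs else None,
    }
```

Comparing these summaries across samples is more informative than any absolute value, since typical DP depends on the capture kit and sequencing depth.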

As for tools, I would suggest the following:

Ti/Tv and proportion of missense vs. synonymous variants: SnpEff

Concordance with dbSNP: GATK VariantEval
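If the VCFs have already been annotated with SnpEff, the missense/synonymous proportion can also be pulled straight from the `ANN` INFO field. This is a minimal sketch assuming the standard `ANN` layout (comma-separated annotations, `|`-separated subfields, effect terms in the second subfield, possibly joined with `&`) and counting only the first annotation per record; the function name is hypothetical.

```python
def missense_synonymous(lines):
    """Count missense and synonymous records in a SnpEff-annotated VCF.

    Only the first ANN annotation of each record is considered, as a
    simplification; other transcripts are ignored.
    """
    missense = synonymous = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        info = line.rstrip("\n").split("\t")[7]
        for item in info.split(";"):
            if not item.startswith("ANN="):
                continue
            first = item[len("ANN="):].split(",")[0]
            parts = first.split("|")
            if len(parts) >= 2:
                effects = parts[1].split("&")  # effect terms may be '&'-joined
                if "missense_variant" in effects:
                    missense += 1
                elif "synonymous_variant" in effects:
                    synonymous += 1
            break  # only one ANN entry per record
    ratio = missense / synonymous if synonymous else float("nan")
    return missense, synonymous, ratio
```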
