5.4 years ago
sanna.gudmundsson
Hi,
I have exome sequencing data from about 50 individuals and would like to calculate the frequency of each variant in my dataset. Do you have any suggestions for easy-to-use tools, e.g. from samtools or GATK?
In the end, I want to be able to filter out variants that are present in my "in-house database" to get rid of sequencing artefacts.
Thanks in advance!
Sanna
Do you have 50 VCF files with 1 sample each, or 1 VCF file with 50 samples?
If it's one file, then those frequencies should already be available in the INFO/AF attribute...
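For the one-file case, here is a minimal sketch of reading that attribute from a single VCF data line. It assumes the caller actually wrote an `AF` tag into the INFO column (not every pipeline does), and `info_af` is a hypothetical helper name, not part of any tool.

```python
def info_af(vcf_line):
    """Return the AF values from the INFO column of one VCF data line.

    Returns None when the line carries no AF tag. Multi-allelic sites
    have one AF value per ALT allele, hence the list.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the 8th tab-separated column
    for entry in info.split(";"):
        if entry.startswith("AF="):
            # "." marks a missing value in VCF
            return [None if x == "." else float(x) for x in entry[3:].split(",")]
    return None


# Made-up example line, for illustration only.
example = "1\t12345\t.\tA\tG\t50\tPASS\tAC=3;AF=0.03;AN=100\tGT\t0/1"
print(info_af(example))
```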
Sorry, I think I'm not being clear.
I have 50 VCF files (whole exome), one for each of 50 individuals. I would like to filter my data against these 50 files.
I want to know how many times a certain variant occurs in my 50 files. For example, I could generate a master VCF with all variants found in my 50 individuals and an informative string saying how often each occurs as heterozygous (0/1) or homozygous (1/1).
Is there a simple tool that you know of that does this? samtools, bcftools or similar?
Sanna
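The counting described above can be sketched in plain Python, as an illustration rather than a recommended tool (bcftools merge can also combine single-sample files into one multi-sample VCF). The sketch assumes one sample per file, a `GT` field in FORMAT, and that the same variant is written identically in every file (e.g. after normalization). All function names are hypothetical. Note that a variant missing from a file is simply not counted for that individual, so "no call / no coverage" cannot be told apart from "homozygous reference".

```python
import gzip
from collections import defaultdict


def new_counts():
    """Map (chrom, pos, ref, alt) -> {"het": n, "hom": n}."""
    return defaultdict(lambda: {"het": 0, "hom": 0})


def add_sample(counts, lines):
    """Add the genotype calls from one single-sample VCF (as text lines)."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        f = line.rstrip("\n").split("\t")
        if len(f) < 10:
            continue  # no sample column on this line
        fmt = f[8].split(":")
        if "GT" not in fmt:
            continue
        gt = f[9].split(":")[fmt.index("GT")]
        alleles = gt.replace("|", "/").split("/")
        for i, alt in enumerate(f[4].split(","), start=1):
            n = alleles.count(str(i))
            if n == 0:
                continue  # this individual does not carry this ALT allele
            # "hom" when every allele in GT is this ALT, otherwise "het"
            zyg = "hom" if n == len(alleles) else "het"
            counts[(f[0], f[1], f[3], alt)][zyg] += 1


def count_files(paths):
    """Count carriers over a list of plain or gzipped single-sample VCFs."""
    counts = new_counts()
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            add_sample(counts, handle)
    return counts


# Tiny in-memory demo with two made-up individuals.
sample1 = [
    "##fileformat=VCFv4.2\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30\n",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:25\n",
]
sample2 = ["1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:30\n"]

demo = new_counts()
for sample in (sample1, sample2):
    add_sample(demo, sample)
for (chrom, pos, ref, alt), c in sorted(demo.items()):
    print(chrom, pos, ref, alt, f"het={c['het']}", f"hom={c['hom']}", sep="\t")
```

With real data, `count_files` would be called with the list of 50 file paths, and the resulting table could serve as the in-house filter, for instance by flagging variants carried by many individuals.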