Question: Extracting Individual Allele Frequencies From A Vcf File
8.5 years ago, Jitendra50 wrote:

Hi all, I have used SAMtools to identify SNPs in three separate individuals and have generated a VCF file. I am able to extract the depth for each individual, but I am struggling to get the allele frequency at each SNP for each individual. The VCF contains an overall alternative allele frequency, but I would like it broken down per individual. If anybody knows whether this is possible, and how to do it, I'd be grateful for any advice. Thanks!

vcf samtools snp • 12k views
Written 8.5 years ago by Jitendra50
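
If the VCF carries per-sample allelic depths (an AD field listed in the FORMAT column), a read-based alternative allele fraction can be computed for each individual as ALT reads divided by all reads at the site. Whether your SAMtools/bcftools run wrote AD depends on the version and options used, so treat its presence as an assumption. The record below is made up for illustration. A minimal Python sketch:

```python
def alt_allele_fractions(vcf_line):
    """Return one ALT-read fraction per sample (None when it cannot be computed).

    Assumes the FORMAT column (9th field) contains an AD field: comma-separated
    read counts, REF first, then one count per ALT allele.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    if "AD" not in format_keys:
        raise ValueError("FORMAT column has no AD field")
    ad_index = format_keys.index("AD")

    fractions = []
    for sample in fields[9:]:
        values = sample.split(":")
        raw = values[ad_index].split(",") if ad_index < len(values) else []
        # Missing data ('.') or a dropped trailing field: no estimate
        if not raw or any(d in (".", "") for d in raw):
            fractions.append(None)
            continue
        depths = [int(d) for d in raw]
        total = sum(depths)
        # Reads supporting any ALT allele divided by all reads at the site
        fractions.append(sum(depths[1:]) / total if total else None)
    return fractions


# Hypothetical record with three individuals
record = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:6,4:10\t1/1:0,8:8\t0/0:9,0:9"
print(alt_allele_fractions(record))  # [0.4, 1.0, 0.0]
```

Unlike a genotype-based frequency, this value can take any number between 0 and 1, because it reflects the read counts rather than the called genotype.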

Just a couple of quick questions to clarify: Does each individual have their own VCF file? Are you looking for the allele frequency of each SNP in the general population, or in each individual? Generally the last columns in the VCF (the ones that start with 1/1, 0/1, etc.) hold each individual's genotype for that variant, from which the allele frequency in that individual follows. To clarify further, you could post a couple of lines from your VCF.

Written 8.5 years ago by Alex Paciorkowski3.4k
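
Each column after FORMAT holds one individual's colon-separated values, in the order given by the FORMAT keys (for example GT:DP), and the sample names come from the #CHROM header line. A minimal Python sketch that pairs them up, using made-up header and record lines:

```python
def sample_names(header_line):
    """Sample names are the columns after FORMAT in the #CHROM header line."""
    return header_line.rstrip("\n").split("\t")[9:]


def sample_fields(vcf_line, names):
    """Map each sample name to a {FORMAT key: value} dict for one record."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    # zip() stops early if a sample column drops trailing FORMAT fields
    return {name: dict(zip(keys, column.split(":")))
            for name, column in zip(names, fields[9:])}


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tInd1\tInd2"
record = "chr1\t100\t.\tA\tG\t50\tPASS\tDP=18\tGT:DP\t0/1:10\t1/1:8"
print(sample_fields(record, sample_names(header)))
# {'Ind1': {'GT': '0/1', 'DP': '10'}, 'Ind2': {'GT': '1/1', 'DP': '8'}}
```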

Using the command "vcftools --vcf aln.vcf --freq" I get only three allele frequencies: 0, 0.5 and 1. Could somebody advise me how to get the real frequencies at certain positions? Thanks a lot.

Written 8.1 years ago by Biomonika (Noolean)3.1k

Yes, for a single diploid individual you will only get those three values: a homozygous SNP gives allele frequencies of 0 and 1, and a heterozygous SNP gives 0.5 and 0.5.

Written 4.4 years ago by rse90
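
A diploid individual has only two called alleles at a site, so any genotype-based frequency can only be 0, 0.5 or 1. A continuous value has to come from read counts instead (for example an AD field, if your VCF has one). The arithmetic behind the three values can be sketched in Python as follows; this mirrors the idea and is not vcftools' own code:

```python
def genotype_allele_frequency(gt):
    """Fraction of an individual's called alleles that are non-reference."""
    # '/' separates unphased alleles and '|' separates phased ones
    alleles = gt.replace("|", "/").split("/")
    called = [a for a in alleles if a != "."]
    if not called:
        return None  # genotype not called, e.g. './.'
    return sum(a != "0" for a in called) / len(called)


for gt in ["0/0", "0/1", "1/1", "./."]:
    print(gt, genotype_allele_frequency(gt))
# 0/0 0.0
# 0/1 0.5
# 1/1 1.0
# ./. None
```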

I have extracted the following attributes of the variant records:

'AC', 'AF', 'ALT', 'AN', 'ANN', 'AO', 'ASP', 'ASS', 'CAF', 'CDA', 'CFL', 'CHROM', 'COMMON', 'DP', 'DSS', 'FAO', 'FDP', 'FILTER_NOCALL', 'FILTER_PASS', 'FR', 'FRO', 'FSAF', 'FSAR', 'FSRF', 'FSRR', 'FWDB', 'FXX', 'G5', 'G5A', 'GENEINFO', 'GNO', 'HD', 'HRUN', 'HS', 'ID', 'INT', 'KGPROD', 'KGPhase1', 'KGPilot123', 'KGValidated', 'LEN', 'LOF', 'LSD', 'MLLD', 'MTP', 'MUT', 'NMD', 'NOC', 'NOV', 'NS', 'NSF', 'NSM', 'NSN', 'OALT', 'OID', 'OM', 'OMAPALT', 'OPOS', 'OREF', 'OTH', 'OTHERKG', 'PB', 'PBP', 'PH3', 'PM', 'PMC', 'POS', 'QD', 'QUAL', 'R3', 'R5', 'RBI', 'REF', 'REFB', 'REVB', 'RO', 'RS', 'RSPOS', 'RV', 'S3D', 'SAF', 'SAO', 'SAR', 'SF', 'SLO', 'SRF', 'SRR', 'SSEN', 'SSEP', 'SSR', 'SSSB', 'STB', 'STBP', 'SYN', 'TPA', 'TYPE', 'U3', 'U5', 'VARB', 'VC', 'VLD', 'VP', 'WGT', 'WTD', 'altlen', 'dbNSFP_SIFT_score', 'dbSNPBuildID', 'is_snp', 'numalt'

Which of these attributes usually determine whether a variant is a SNP?

Written 16 months ago by sazzadur.rahman0
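
A variant is usually treated as a SNP when REF and every ALT allele are single bases, so the REF and ALT attributes in that list carry the deciding information. Judging by names such as is_snp and numalt, the list looks like the attribute set of a PyVCF-style record, but that is an assumption. A dependency-free Python sketch of the rule (some tools also accept N or other symbols; this version deliberately allows only A, C, G and T):

```python
VALID_BASES = {"A", "C", "G", "T"}


def is_snp(ref, alt_field):
    """True if REF and every comma-separated ALT allele is a single base."""
    if ref.upper() not in VALID_BASES:
        return False
    # '.' (no ALT), symbolic alleles like '<DEL>' and indels all fail here
    return all(alt.upper() in VALID_BASES for alt in alt_field.split(","))


print(is_snp("A", "G"))      # True
print(is_snp("A", "G,T"))    # True
print(is_snp("AT", "A"))     # False (deletion)
print(is_snp("A", "<DEL>"))  # False (symbolic allele)
```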
8.5 years ago, Maxime Lamontagne2.2k wrote:

I think this should work. You need vcftools to do this.

The first command will isolate the genotype for Subject1:

.../tabix-0.2.3/vcftools_0.1.4a/perl/vcf-subset -c Subject1 genotypes.vcf > temp.vcf

The second one converts those genotypes into allele frequencies:

.../tabix-0.2.3/vcftools_0.1.4a/cpp/vcftools --vcf temp.vcf --freq --out Subject1.SNP.allelefrequencies

Written 8.5 years ago by Maxime Lamontagne2.2k
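
The two vcftools steps above (subset one sample, then compute frequencies) can be mirrored in plain Python for a quick check. This is a sketch under stated assumptions: GT must be present in FORMAT, uncalled genotypes are skipped, and the sample lines are made up; the result is genotype-based, so it shows the same 0 / 0.5 / 1 pattern:

```python
def sample_allele_frequencies(vcf_lines, sample):
    """Yield (CHROM, POS, {allele: frequency}) for one named sample."""
    column = None
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            column = fields.index(sample)  # ValueError if the sample is absent
            continue
        if column is None:
            raise ValueError("data line before #CHROM header")
        keys = fields[8].split(":")
        values = fields[column].split(":")
        gt_index = keys.index("GT")  # assumes GT is in FORMAT
        gt = values[gt_index] if gt_index < len(values) else "."
        called = [a for a in gt.replace("|", "/").split("/") if a != "."]
        if not called:
            continue  # skip sites where this sample has no call
        alleles = [fields[3]] + fields[4].split(",")  # REF, then ALTs
        counts = {allele: 0 for allele in alleles}
        for index in called:
            counts[alleles[int(index)]] += 1
        yield fields[0], int(fields[1]), {a: n / len(called) for a, n in counts.items()}


lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSubject1\tSubject2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1",
    "chr1\t200\t.\tC\tT\t40\tPASS\t.\tGT\t./.\t0/1",
    "chr1\t300\t.\tG\tA\t60\tPASS\t.\tGT\t1/1\t0/0",
]
for chrom, pos, freqs in sample_allele_frequencies(lines, "Subject1"):
    print(chrom, pos, freqs)
# chr1 100 {'A': 0.5, 'G': 0.5}
# chr1 300 {'G': 0.0, 'A': 1.0}
```

To run it on a real file, pass an open file handle (for example open("genotypes.vcf")) instead of the list; compressed VCFs would need to be decompressed first.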