Extracting Individual Allele Frequencies From A VCF File
12.0 years ago
Jitendra ▴ 60

Hi all, I have used SAMtools to identify SNPs within 3 separate individuals and have generated a VCF file. I am able to extract the depth for each individual but am struggling to get the allele frequency at each SNP for each individual. The VCF contains an overall alternative allele frequency but I would like this broken down per individual. If anybody knows if this is possible and how to do it I'd be grateful for any advice. Thanks!

snp vcf samtools • 15k views

Just a couple of quick questions to clarify: Does each individual have their own VCF file? Are you looking for the allele frequency of each SNP in the general population, or in each individual? Generally the last columns in the VCF (the ones that start with 1/1 or 0/1, etc.) hold each individual's genotype for that variant, and the per-individual allele frequency follows from that genotype. To clarify further, could you post a couple of lines from your VCF?
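
To illustrate how the per-sample genotype columns translate into a per-individual frequency, here is a minimal sketch in awk. The function name gt_allele_freq is hypothetical, and it assumes a tab-delimited VCF whose FORMAT column contains GT. Any non-reference allele is counted as "alternate", so multi-allelic sites are lumped together.

```shell
# Hypothetical helper (not part of samtools or vcftools).
# Prints CHROM, POS and one ALT-allele frequency per sample, derived only
# from the GT subfield: 0/0 -> 0, 0/1 -> 0.5, 1/1 -> 1, missing -> NA.
gt_allele_freq() {
    awk -F'\t' '
        /^##/ { next }                          # skip meta-information lines
        /^#CHROM/ {                             # sample names start at column 10
            printf "CHROM\tPOS"
            for (i = 10; i <= NF; i++) printf "\t%s", $i
            printf "\n"
            next
        }
        {
            # find the position of GT inside the FORMAT column (column 9)
            n = split($9, fmt, ":"); gt_idx = 0
            for (k = 1; k <= n; k++) if (fmt[k] == "GT") gt_idx = k

            printf "%s\t%s", $1, $2
            for (i = 10; i <= NF; i++) {
                split($i, fields, ":")
                m = split(fields[gt_idx], alleles, "[/|]")   # unphased or phased
                called = 0; alt = 0
                for (a = 1; a <= m; a++) {
                    if (alleles[a] == ".") continue          # missing allele
                    called++
                    if (alleles[a] != "0") alt++             # any non-REF allele
                }
                if (gt_idx == 0 || called == 0) { printf "\tNA" }
                else { printf "\t%g", alt / called }
            }
            printf "\n"
        }
    ' "$1"
}
```

Calling gt_allele_freq on a multi-sample VCF gives one row per site and one column per individual. For diploid calls the values can only be 0, 0.5 or 1.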

Using the command "vcftools --vcf aln.vcf --freq" I get only 3 values of allele frequency: 0, 0.5 and 1. Could somebody advise me how to get the real frequencies at certain positions? Thanks a lot.

Yes, for a single diploid individual you will get only 3 frequencies: homozygous sites give 0 or 1, and heterozygous sites give 0.5 for each allele.
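
If the "real" frequency wanted is instead the fraction of reads supporting the alternate allele in each individual, the genotype is not enough and per-sample read counts are needed. The sketch below assumes the VCF carries a per-sample AD field (allelic depths, reference first, as defined in the VCF specification). Output from older SAMtools versions may not include it, in which case this will only print NA. The function name ad_alt_fraction is hypothetical.

```shell
# Hypothetical helper (not part of samtools or vcftools).
# Prints CHROM, POS and, per sample, the fraction of reads that support
# any ALT allele, using FORMAT/AD = "refDepth,altDepth[,altDepth...]".
ad_alt_fraction() {
    awk -F'\t' '
        /^##/ { next }                          # skip meta-information lines
        /^#CHROM/ {                             # sample names start at column 10
            printf "CHROM\tPOS"
            for (i = 10; i <= NF; i++) printf "\t%s", $i
            printf "\n"
            next
        }
        {
            # find the position of AD inside the FORMAT column (column 9)
            n = split($9, fmt, ":"); ad_idx = 0
            for (k = 1; k <= n; k++) if (fmt[k] == "AD") ad_idx = k

            printf "%s\t%s", $1, $2
            for (i = 10; i <= NF; i++) {
                split($i, fields, ":")
                m = split(fields[ad_idx], depth, ",")
                total = 0
                for (a = 1; a <= m; a++) if (depth[a] != ".") total += depth[a]
                # NA when AD is absent, incomplete, or there are no reads
                if (ad_idx == 0 || m < 2 || total == 0) { printf "\tNA" }
                else { printf "\t%.3f", (total - depth[1]) / total }
            }
            printf "\n"
        }
    ' "$1"
}
```

Unlike the genotype-based values, these fractions are continuous, so a heterozygous site covered by 9 reference and 3 alternate reads would report 0.250 rather than 0.5.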

I have extracted the following fields from the variant records:

'AC', 'AF', 'ALT', 'AN', 'ANN', 'AO', 'ASP', 'ASS', 'CAF', 'CDA', 'CFL', 'CHROM', 'COMMON', 'DP', 'DSS', 'FAO', 'FDP', 'FILTER_NOCALL', 'FILTER_PASS', 'FR', 'FRO', 'FSAF', 'FSAR', 'FSRF', 'FSRR', 'FWDB', 'FXX', 'G5', 'G5A', 'GENEINFO', 'GNO', 'HD', 'HRUN', 'HS', 'ID', 'INT', 'KGPROD', 'KGPhase1', 'KGPilot123', 'KGValidated', 'LEN', 'LOF', 'LSD', 'MLLD', 'MTP', 'MUT', 'NMD', 'NOC', 'NOV', 'NS', 'NSF', 'NSM', 'NSN', 'OALT', 'OID', 'OM', 'OMAPALT', 'OPOS', 'OREF', 'OTH', 'OTHERKG', 'PB', 'PBP', 'PH3', 'PM', 'PMC', 'POS', 'QD', 'QUAL', 'R3', 'R5', 'RBI', 'REF', 'REFB', 'REVB', 'RO', 'RS', 'RSPOS', 'RV', 'S3D', 'SAF', 'SAO', 'SAR', 'SF', 'SLO', 'SRF', 'SRR', 'SSEN', 'SSEP', 'SSR', 'SSSB', 'STB', 'STBP', 'SYN', 'TPA', 'TYPE', 'U3', 'U5', 'VARB', 'VC', 'VLD', 'VP', 'WGT', 'WTD', 'altlen', 'dbNSFP_SIFT_score', 'dbSNPBuildID', 'is_snp', 'numalt'

Which of these attributes usually determine whether a variant is a SNP?

12.0 years ago

I think this should work. You need vcftools to do this.

The first command will isolate the genotype for Subject1:

.../tabix-0.2.3/vcftools_0.1.4a/perl/vcf-subset -c Subject1 genotypes.vcf > temp.vcf

The second one will convert the genotypes into allele frequencies:

.../tabix-0.2.3/vcftools_0.1.4a/cpp/vcftools --vcf temp.vcf --freq --out Subject1.SNP.allelefrequencies
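
To repeat those two steps for every individual rather than just Subject1, something along these lines might help. It is only a sketch: the function names vcf_samples and per_sample_freq_commands are hypothetical, and it assumes vcf-subset and vcftools are on your PATH (the answer above calls them by full path). It prints the commands instead of running them, so they can be checked first.

```shell
# Hypothetical helper: list the sample names, i.e. columns 10 onward
# of the #CHROM header line of a tab-delimited VCF.
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Hypothetical helper: for each sample, print (do not run) the subset
# command and the frequency command used in the answer above.
# Assumes vcf-subset and vcftools are on PATH and that sample names
# contain no spaces or shell metacharacters.
per_sample_freq_commands() {
    vcf=$1
    vcf_samples "$vcf" | while IFS= read -r sample; do
        echo "vcf-subset -c ${sample} ${vcf} > ${sample}.temp.vcf"
        echo "vcftools --vcf ${sample}.temp.vcf --freq --out ${sample}.SNP.allelefrequencies"
    done
}
```

Running per_sample_freq_commands genotypes.vcf prints two lines per individual. Once they look right, the output can be piped to sh to execute them.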