Question: Count the number of population-level and sample-level SNPs in my VCF
Asked 4.3 years ago by Cece (United States/Houston):

I'm a Python newbie and I'm trying to extract data on population-level and sample-level SNPs for the AFR population from a VCF file. The data looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103 HG00105 HG00106 HG00107 HG00108 HG00109 HG00110 HG00111 HG001
20      60343   .       G       A       100     PASS    AC=1;AF=0.000199681;AN=5008;NS=2504;DP=20377;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=.|||  GT      0|0     0|0     0|0

I've managed to come up with the following:

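import gzip

# A .vcf.gz file is compressed, so open it in text mode with gzip.open
with gzip.open('filename.vcf.gz', 'rt') as vcf:
    for line in vcf:
        if line[:2] == '##':
            continue  # skip meta-information lines
        if line[0] == '#':
            # column header line: the first 9 columns are fixed fields
            data = line.rstrip().split('\t')
            ncol = len(data)
            samples = data[9:ncol]  # sample IDs start at column 10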

which should clean up and isolate my data columns, but I'm lost as to how to achieve the rest.


If you don't want to write your own:

Reply written 4.3 years ago by Zev.Kronenberg

Thanks, but I'm learning Python as well as bioinformatics, so I'd like to learn how to write my own scripts.

Reply written 4.3 years ago by Cece

Good plan. Here is an API that could be helpful:

Reply written 4.3 years ago by Zev.Kronenberg
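
For anyone who would rather hand-roll this in plain Python, here is a minimal sketch of one way to finish the loop from the question. It rests on assumptions the thread never confirms: "population-level" is taken to mean SNP sites whose AFR_AF value in the INFO column is greater than 0, and "sample-level" is taken to mean, for each sample column, the number of SNP sites where that sample carries at least one non-reference allele. The helper names parse_info and count_snps are made up for illustration. Restricting the per-sample counts to AFR individuals would need a separate sample-to-population mapping, which is not shown here.

```python
from collections import Counter

BASES = {'A', 'C', 'G', 'T'}


def parse_info(info_field):
    """Turn 'AC=1;AFR_AF=0;...' into a dict of strings (flag-only keys are skipped)."""
    info = {}
    for item in info_field.split(';'):
        key, sep, value = item.partition('=')
        if sep:
            info[key] = value
    return info


def count_snps(lines, pop_key='AFR_AF'):
    """Return (population-level SNP count, Counter of per-sample SNP counts).

    Assumed definitions (not from the original post):
      population-level: SNP sites where pop_key > 0 for any ALT allele
      sample-level:     SNP sites where the sample has a non-reference allele
    """
    samples = []
    pop_count = 0
    per_sample = Counter()
    for line in lines:
        if line.startswith('##'):
            continue  # meta-information lines
        fields = line.rstrip('\n').split('\t')
        if line.startswith('#'):
            samples = fields[9:]  # sample IDs follow the 9 fixed columns
            continue
        ref, alts, info_field = fields[3], fields[4].split(','), fields[7]
        # Keep single-base substitutions only; skip indels and symbolic alleles
        if ref not in BASES or any(a not in BASES for a in alts):
            continue
        freqs = parse_info(info_field).get(pop_key, '0').split(',')
        if any(float(f) > 0 for f in freqs if f != '.'):
            pop_count += 1
        for sample, call in zip(samples, fields[9:]):
            # GT is the first colon-separated subfield; alleles are split by | or /
            alleles = call.split(':')[0].replace('/', '|').split('|')
            if any(a not in ('0', '.') for a in alleles):
                per_sample[sample] += 1
    return pop_count, per_sample


# Tiny made-up example (two samples, three records) to show the behaviour
demo_lines = [
    '##fileformat=VCFv4.1\n',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n',
    '20\t60343\t.\tG\tA\t100\tPASS\tAC=1;AFR_AF=0;AMR_AF=0.0014\tGT\t0|0\t0|1\n',
    '20\t60500\t.\tC\tT\t100\tPASS\tAC=3;AFR_AF=0.01\tGT\t1|0\t1|1\n',
    '20\t60600\t.\tCT\tC\t100\tPASS\tAC=2;AFR_AF=0.5\tGT\t1|1\t0|0\n',
]
pop_count, per_sample = count_snps(demo_lines)
print(pop_count)         # 1
print(dict(per_sample))  # {'S2': 2, 'S1': 1}
```

To run it on a real file, pass a handle opened in text mode, for example the one returned by gzip.open('filename.vcf.gz', 'rt'), in place of demo_lines. Reading line by line keeps memory use flat, which matters for files with thousands of sample columns.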