file conversion: vcf file to allele frequency file
8.4 years ago
krp0001 ▴ 40

Dear Users

I am running a genome-wide scan for selective sweeps with SweepFinder2 (SF2). I would appreciate help converting a VCF file into the allele frequency file that SF2 requires. According to the manual, it should look like this:

position  x    n    folded
460000    9    100  0
460010    100  100  0
460210    30   78   1
463000    0    94   0

The first column is the position on the chromosome, the second column is the allele count (x), the third column is the sample size (n), and the fourth column is an indicator as to whether the site has been polarized (i.e., whether it is known that the allele is derived or ancestral).
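As a minimal sketch of the four-column layout described above, the Python snippet below parses the example rows and applies two sanity checks. The function name `parse_sf2_line` is hypothetical, and the checks (x between 0 and n, folded equal to 0 or 1) are assumptions inferred from the column descriptions, not rules quoted from the SF2 manual.

```python
def parse_sf2_line(line):
    """Parse one whitespace-separated row: position, x, n, folded."""
    pos, x, n, folded = (int(value) for value in line.split())
    # Assumed sanity checks, inferred from the column descriptions.
    if not 0 <= x <= n:
        raise ValueError(f"allele count {x} outside 0..{n} at position {pos}")
    if folded not in (0, 1):
        raise ValueError(f"folded flag must be 0 or 1 at position {pos}")
    return pos, x, n, folded


example = """position  x     n     folded
460000   9   100    0
460010 100 100     0
460210   30  78      1
463000   0    94     0"""

# Skip the header line, then parse every data row.
rows = [parse_sf2_line(line) for line in example.splitlines()[1:]]
for row in rows:
    print(row)
```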

Thanks in advance

SNP sequence genome • 3.1k views

I gotta say, the tool looks a bit poorly designed. If they need custom input, they should ideally provide the tool that converts standardized VCF to their custom format, or build in a handy parser so the tool can accept a VCF file. Really strange break in philosophy there.

8.4 years ago
Ram 43k

To begin with, the tool just mentions "allele frequency at a particular location" per line. There are (at least) two alleles at each variant locus, so that would be >=2 lines per variant, unless they expect only the minor (least frequent) allele to be listed.

Alright, I'm assuming you have a single VCF file here. You can run vcftools on the VCF file to obtain a frequency output file with per-locus allele frequencies. Provided the number of samples in the VCF file is constant, you can then use awk on the freq file to create the custom input file for SF2.

You should write to the authors and highlight this issue - and offer a solution. I am sure they will appreciate it.
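The vcftools-then-script route suggested above can be sketched in Python rather than awk. This is only an illustration under stated assumptions, not the SF2 authors' method: the function name `frq_to_sf2` is hypothetical; x is taken to be the count of the second listed allele, recovered as frequency times N_CHR; every site is marked folded (1) because the frequency file carries no ancestral-state information; sites that are not biallelic are skipped; and the input is assumed to cover a single scaffold, since the output keeps positions only. Whether SF2 wants this particular allele counted is exactly the open question raised above.

```python
def frq_to_sf2(frq_lines, folded=1):
    """Convert rows of a vcftools --freq file into (position, x, n, folded).

    Assumptions (not taken from the SF2 manual): x counts the second
    listed allele, and all sites share the same folded flag.
    """
    rows = []
    for line in frq_lines:
        fields = line.split()
        if not fields or fields[0] == "CHROM":
            continue  # skip blank lines and the header
        pos = int(fields[1])
        n_alleles = int(fields[2])
        n = int(fields[3])  # N_CHR: number of chromosomes with data
        if n_alleles != 2:
            continue  # assumption: keep biallelic sites only
        _allele, freq = fields[5].rsplit(":", 1)
        x = round(float(freq) * n)  # frequency back to an integer count
        rows.append((pos, x, n, folded))
    return rows


def format_sf2(rows):
    """Render rows as tab-separated text with a header line."""
    lines = ["position\tx\tn\tfolded"]
    lines += ["\t".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


# Input like this can be produced with, for example:
#   vcftools --vcf input.vcf --freq --out output   (writes output.frq)
sample = [
    "CHROM\tPOS\tN_ALLELES\tN_CHR\t{ALLELE:FREQ}",
    "scaffold1\t382\t2\t16\tT:0.625\tA:0.375",
    "scaffold1\t446\t2\t14\tC:0.928571\tT:0.0714286",
]
print(format_sf2(frq_to_sf2(sample)), end="")
```

Because n is read per row from N_CHR, the sketch does not rely on the sample size being constant across sites.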


Hi Ram,

Thank you for your comments. I wrote to the authors asking for a conversion tool and was told to do "some scripting to convert your data", which I am not that good at.

As you suggested, I converted the VCF file to a frequency file with vcftools; the first few lines are below. If you have a customised script that produces the desired SF2 format, could you please share it?

Thank you in advance.

CHROM    POS    N_ALLELES    N_CHR    {ALLELE:FREQ}
scaffold1    382    2    16    T:0.625    A:0.375
scaffold1    385    2    16    T:0.5625    A:0.4375
scaffold1    386    2    16    G:0.5    C:0.5
scaffold1    446    2    14    C:0.928571    T:0.0714286
scaffold1    460    2    14    C:0.928571    T:0.0714286
scaffold1    534    2    14    C:0.928571    T:0.0714286
scaffold1    779    2    13    G:0.923077    A:0.0769231
scaffold1    783    2    13    G:0.923077    A:0.0769231
scaffold1    828    2    14    C:0.928571    T:0.0714286
scaffold1    918    2    14    G:0.928571    A:0.0714286
scaffold1    922    2    14    G:0.928571    A:0.0714286
scaffold1    929    2    14    C:0.928571    T:0.0714286
scaffold1    943    2    14    G:0.928571    C:0.0714286

Like I said, there is more to this than scripting. Can they define "allele count" better? Do they mean a per-allele count? That would be >=2 entries per variant. Do they mean the count of the major allele or of the minor allele? Without more context, we cannot extract the relevant data.

EDIT: It looks like one needs quite a bit of domain knowledge to understand the mechanics of the data. I am not good with monomorphic sites or derived alleles. If you can define those for me, I could probably help you out with the script.
