Question: Analyse VCF file to find snps common to all samples in lineage
0
gravatar for samuel.lipworth
3.5 years ago by
University of Oxford
samuel.lipworth30 wrote:

I have a VCF file with 22 different samples in it. Looking at the example below I know that all of the 2s come from the same lineage from looking at my phylo tree. What can I use to query the vcf so that I can group different isolates from the same lineage together and find out when there is a SNP that affects all isolates from that lineage?

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NC000962.fasta.ref  19b0_125_contigs.fa 25c6_159_contigs.fa f0d9_151_contigs.fa 7da4_159_contigs.fa ee25_141_contigs.fa c64b_157_contigs.fa fbd6_161_contigs.fa ed1e_155_contigs.fa 6a63_data_151_contigs.fa    2fec_151_contigs.fa 26af_147_contigs.fa f206_153_contigs.fa e2f1_165_contigs.fa a570_151_contigs.fa df23_151_contigs.fa 1f3c_115_contigs.fa 6213_161_contigs.fa 7ea3_147_contigs.fa 46a56_data_155_contigs.fa   dbc3_159_contigs.fa b1fa_149_contigs.fa

NC_000962   72  GTTCAGGCTT.CACCACAGTG   C   T   40  PASS    NA  GT  1   1   2   1   1   2   1   2   1   1   1   1   1   1   1   1   1   1   1   1   1   1
snp vcf • 1.2k views
ADD COMMENTlink modified 3.3 years ago by Pierre Lindenbaum129k • written 3.5 years ago by samuel.lipworth30

when there is a SNP that affects all isolates from that lineage

Can you perhaps expand this request? I am not sure I understood what you need completely.

ADD REPLYlink written 3.5 years ago by Macspider3.0k

so for example I have 22 bacterial genomes which I know from creating a snp based phylogeny tree cluster into 7 distinct lineages. I can see what these snps are on a core genome alignment multi-fasta file but this is obviously a very slow way to do it so I want to try to get this information from the VCF file which contains the snps from all the samples. Ie somthing a bit like VCFtools --diff-site but working on the columns above which show 1 or 2 depending on whether the sample has the ref necleotide or the snp.

(I can think of a way of doing this with Python but just wanted to know if someone had already done it!)

ADD REPLYlink written 3.5 years ago by samuel.lipworth30

I can't help with standard solutions, if it was me I would personally go with a lot of awk, sed, grep, and all the other brothers of them :)

ADD REPLYlink written 3.5 years ago by Macspider3.0k
0
gravatar for Pierre Lindenbaum
3.3 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum129k wrote:

(no tested) using vcffilterjs and the following script:

var pop1=["S1","S2","S3"];
var pop2=["S4","S5","S6"];


function accept(v)
    {
    var pop2allele=[null,null];
    for(var i=0;i< 2;++i)
        {
        var population = (i==0?pop1:pop2);
        for(var j=0;j< population.length;++j) {
            var g=v.getGenotype(population[j]);
            if(g==null || g.getAlleles().size()!=1) continue;
            var allele=g.getAlleles().get(0);
            if(allele.isNoCall()) continue;
            if(pop2allele[i]==null)
                {
                pop2allele[i]=allele;
                }
            else if(!pop2allele[i].equals(allele))
                {
                return false;
                }
            }
        }
    if(pop2allele[0]==null ||pop2allele[1]==null) return false;
    return !pop2allele[0].equals(pop2allele[1]);
    }
accept(variant);
ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by Pierre Lindenbaum129k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1111 users visited in the last hour