Question: Common SNPs among all populations
0
gravatar for mostafarafiepour
4 months ago by
mostafarafiepour60 wrote:

Hi all Dear,

I have a VCF file that contains approximately 20 millions SNPs. And my VCF file has 5 populations (KHUZ, MAZ, WAZ, EAZ and GIL), each population has 10 samples. So, I want to know how many SNPs are common between the 5 populations?

Best Regard

snp vcf • 436 views
ADD COMMENTlink modified 4 months ago by zx87547.1k • written 4 months ago by mostafarafiepour60

Take a look at VCFtools

Usage guide

ADD REPLYlink modified 4 months ago • written 4 months ago by Bastien Hervé4.0k

Many thanks for your reply,

I saw the page you shared. And the only thing I can find is the option --exclude-positions-overlap. Can you guide me more precisely?

ADD REPLYlink written 4 months ago by mostafarafiepour60

In the first link you have different tools, as vcf-isec

Creates intersections and complements of two or more VCF files. Given multiple VCF files, it can output the list of positions which are shared by at least N files, at most N files, exactly N files, etc

See also : Obtain one vcf file of shared SNPs from input files with different samples using vcf-isec (vcftools)

ADD REPLYlink written 4 months ago by Bastien Hervé4.0k
1
gravatar for finswimmer
4 months ago by
finswimmer11k
Germany
finswimmer11k wrote:

Assuming the Samples 1-10 are from population 1, 11-20 from population 2 and 21-30 from population 3, you can filter for variant sites, where at least one sample in each population has a non-reference allel, using bcftools view like this:

$ bcftools view -i 'GT[0-9]="alt" && GT[10-19]="alt" && GT[20-29]="alt"' input.vcf

fin swimmer

ADD COMMENTlink modified 4 months ago • written 4 months ago by finswimmer11k

I run your script, but I encountered this error.

[filter.c:1379 filters_init1] Error: the tag "INFO/GT[1" is not defined in the VCF header

this is header my vcf file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  EAZ01   EAZ02   EAZ03   EAZ04   EAZ06   EAZ08   EAZ20   EAZ21   EAZ25   EAZ26   GIL01   GIL03   GIL04   GIL05   GIL06   GIL08   GIL11 GIL14    GIL15   GIL17   KHU01   KHU02   KHU04   KHU05   KHU06   KHU07   KHU09   KHU12   KHU15   KHU17   MAZ01   MAZ02   MAZ03   MAZ04   MAZ05   MAZ08   MAZ10   MAZ12   MAZ13   MAZ19   WAZ01   WAZ02   WAZ03 WAZ04    WAZ05   WAZ06   WAZ07   WAZ08   WAZ12   WAZ16
ADD REPLYlink written 4 months ago by mostafarafiepour60

Which version are you using? The syntax I described was introduced in v1.8. Current release is v1.9.

ADD REPLYlink written 4 months ago by finswimmer11k

I use bcftools/1.3.1 version. I only have this version on the server.

ADD REPLYlink written 4 months ago by mostafarafiepour60
2

Have a look at bioconda. This enables you to install the newest version (and a lot of other bioinformatic tools) even without root permissions. Also the first part of this tutorial I'v written might be useful for you.

fin swimmer

ADD REPLYlink written 4 months ago by finswimmer11k

I installed the suggested version, but when I run the script, it does not give me the output. And it only displays the content of my VCF file in a graphical and fast way. My expectation is that when i run the script, i will give the number of common PSNs between 5 populations.

i code run:

bcftools view -i 'GT[1-10]="alt" && GT[11-20]="alt" && GT[21-30]="alt" && GT[31-40]="alt" && GT[41-50]="alt"' Final.vcf
ADD REPLYlink written 4 months ago by mostafarafiepour60
1

What you see is the filtered vcf. One could save this output to a new file using > file.vcf at the end of the command.

If you don't want this and you are only interested in the number of variants you can count the lines excluding the header lines.

$ bcftools view -i 'GT[0-9]="alt" && GT[10-19]="alt" && GT[20-29]="alt" && GT[30-39]="alt" && GT[40-50]="alt"' Final.vcf|grep -v "^#"|wc -l

fin swimmer

PS: Be aware of the counting of the sample numbers. It's starting from 0!

ADD REPLYlink modified 4 months ago • written 4 months ago by finswimmer11k

This code works well but with a small change.

$ bcftools view -i 'GT[0-9]="alt" && GT[10-19]="alt" && GT[20-29]="alt" && GT[30-39]="alt" && GT[40-49]="alt"' Final.vcf|grep -v "^#"|wc -l
ADD REPLYlink modified 4 months ago • written 4 months ago by mostafarafiepour60
0
gravatar for Pierre Lindenbaum
4 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

using vcffilterjdk http://lindenb.github.io/jvarkit/VcfFilterJdk.html

an example with 3 populations

select.code:

final String POP1[]={"S1"};
final String POP2[]={"S2","S3"};
final String POP3[]={"S4","S5"};

final String ALL_POPS[][]={POP1,POP2,POP3};

for(String pop[]:ALL_POPS) {
    if(Arrays.stream(pop).
        map(SAMPLE->variant.getGenotype(SAMPLE)).
        noneMatch(G->G.isHet() || G.isHomVar())) return false;
    }
return true;

usage:

java -jar dist/vcffilterjdk.jar -f select.code input.vcf
ADD COMMENTlink modified 4 months ago • written 4 months ago by Pierre Lindenbaum119k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 810 users visited in the last hour