Question: finding specific SNPs in VCF files
Wilber0x wrote (7 days ago):

I have a list of SNPs from one organism (organism A). I have genome skims of hybrids between organism A and other species. What methods can I use to see if SNPs from organism A are present in the hybrids?

I have used bowtie2 to map genome skims to the same reference I used when finding SNPs for organism A.

Tags: alignment, snp, next-gen, genome, gene

2nelly wrote (7 days ago):

Hi Wilber0x,

To make a long story short (please correct me if I am wrong), you have:

1) a VCF file from the hybrids

2) a list of SNPs from organism A (in what format?)

You can annotate your VCF file with a tool such as SnpSift:

http://snpeff.sourceforge.net/SnpSift.html#annotate

 java -jar SnpSift.jar annotate dbSnp132.vcf variants.vcf > variants_annotated.vcf

The annotation source (dbSnp132.vcf in the example) does not need to be a full VCF file.

You can replace it with a text file containing at least the following columns (chromosome, position, ID, reference allele, alternate allele):

chr1    3000020    rs1133275841    T    A
chr1    3000023    rs1133585662    C    A
chr1    3000126    rs580370473     G    T
chr1    3000185    rs585444580     G    T
chr1    3000234    rs579469519     G    A
chr1    3000258    rs582985490     G    T
chr1    3000259    rs586234354     T    G
chr1    3000280    rs580430667     C    T
chr1    3000281    rs584188706     A    G
chr1    3000287    rs587313017     A    G
chr1    3000315    rs581375106     T    G
chr1    3000321    rs583231582     G    T

After annotation, the ID field of each record in your VCF file will contain the rs code if that SNP is present in the list.
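To make the idea concrete, here is a minimal Python sketch of what this kind of annotation amounts to: look up each hybrid variant by chromosome, position, REF and ALT, and fill in the ID column when it matches. This is not SnpSift's implementation; the function names and the hybrid records are hypothetical, and it assumes single-allele ALT fields and tab-delimited VCF lines.

```python
def load_known_snps(lines):
    """Build {(chrom, pos, ref, alt): id} from whitespace-delimited lines
    with the columns: chrom, pos, id, ref, alt (hypothetical helper)."""
    known = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, snp_id, ref, alt = line.split()[:5]
        known[(chrom, int(pos), ref, alt)] = snp_id
    return known


def annotate_ids(vcf_lines, known):
    """Yield VCF lines, replacing the ID column when the variant is known.
    Assumption: ALT holds a single allele (no comma-separated alternatives)."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            yield line  # header and blank lines pass through unchanged
            continue
        fields = line.split("\t")
        key = (fields[0], int(fields[1]), fields[3], fields[4])
        if key in known:
            fields[2] = known[key]
        yield "\t".join(fields)


# Two rows taken from the list above; the hybrid records are made up.
organism_a = [
    "chr1    3000020    rs1133275841    T    A",
    "chr1    3000126    rs580370473     G    T",
]
hybrid_vcf = [
    "##fileformat=VCFv4.2",
    "chr1\t3000020\t.\tT\tA\t50\tPASS\t.",
    "chr1\t3000999\t.\tC\tG\t50\tPASS\t.",
]
for record in annotate_ids(hybrid_vcf, load_known_snps(organism_a)):
    print(record)
```

Records whose ID is still "." after this step are the hybrid variants that were not found in the organism A list.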

The other option is to intersect the coordinates of the VCF with those of the list. However, I don't recommend this, since you may have to modify the files slightly first.

Good luck


Thank you. The SNPs from organism A are also in a VCF file.

Reply written 7 days ago by Wilber0x

That's fine, you can use it as it is.

Reply written 7 days ago by 2nelly

Ace wrote (7 days ago):

If you have a VCF file, the vcftools --positions-overlap option should work, specifically:

vcftools --vcf $vcf --positions-overlap $list --kept-sites --out $out

This would give you a file, ${out}.kept.sites, listing all the sites in your VCF that are also present in your list of SNPs. If you want to know how often each allele occurs at those sites, use --counts instead of --kept-sites; it writes the raw allele counts per site to ${out}.frq.count.
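For anyone who wants to see the logic behind that filter-then-count step, the following Python sketch restricts a multi-sample VCF to a set of listed positions and tallies the alleles called at each one. It is only an illustration under stated assumptions, not what vcftools does internally: the function name is hypothetical, and it assumes every record has a FORMAT column containing GT plus at least one sample column.

```python
from collections import Counter


def allele_counts_at(vcf_lines, positions):
    """Return {(chrom, pos): Counter(allele -> count)} for sites in `positions`.
    Assumption: each record has FORMAT (with GT) and sample columns."""
    result = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        site = (fields[0], int(fields[1]))
        if site not in positions:
            continue  # keep only sites that appear in the SNP list
        alleles = [fields[3]] + fields[4].split(",")  # index 0 is REF
        gt_index = fields[8].split(":").index("GT")
        counts = Counter()
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            for code in genotype.replace("|", "/").split("/"):
                if code != ".":  # skip missing calls
                    counts[alleles[int(code)]] += 1
        result[site] = counts
    return result


# Hypothetical records for three hybrid samples.
hybrids = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tH1\tH2\tH3",
    "chr1\t3000020\t.\tT\tA\t50\tPASS\t.\tGT\t0/1\t1/1\t./.",
    "chr1\t3000500\t.\tG\tC\t50\tPASS\t.\tGT\t0/0\t0/1\t0/0",
]
wanted = {("chr1", 3000020)}
for site, counts in allele_counts_at(hybrids, wanted).items():
    print(site, dict(counts))
```

Note that these are counts of allele copies, not of samples, so a homozygous diploid sample contributes two.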

Brice Sarver wrote (7 days ago):

bedtools and bcftools are the standard toolkits for this kind of question. bedtools intersect will give you what you want, as will bcftools isec.
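Conceptually, an intersection of two call sets splits the variants into those private to each file and those shared by both. The Python sketch below shows that split on plain VCF lines, loosely mirroring the private/shared view an intersection tool reports. It is an illustration only: the function names and records are hypothetical, variants are matched on chromosome, position, REF and ALT, and multi-allelic ALT fields are simply split into one key per allele, which may not match how either tool treats them.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF data lines.
    Assumption: multi-allelic ALT fields are split into one key per allele."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def partition(a_lines, b_lines):
    """Split variants into those private to A, private to B, and shared."""
    a, b = variant_keys(a_lines), variant_keys(b_lines)
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}


# Hypothetical records: organism A SNPs versus one hybrid call set.
organism_a = [
    "chr1\t3000020\trs1133275841\tT\tA",
    "chr1\t3000126\trs580370473\tG\tT",
]
hybrid = [
    "chr1\t3000020\t.\tT\tA",
    "chr1\t3000999\t.\tC\tG",
]
sets = partition(organism_a, hybrid)
for name in ("only_a", "only_b", "shared"):
    print(name, sorted(sets[name]))
```

The "shared" set is the answer to the original question: organism A SNPs that also turn up in the hybrid.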
