Finding specific SNPs in VCF files
3.1 years ago
Wilber0x ▴ 50

I have a list of SNPs from one organism (organism A). I have genome skims of hybrids between organism A and other species. What methods can I use to see if SNPs from organism A are present in the hybrids?

I have used bowtie2 to map genome skims to the same reference I used when finding SNPs for organism A.

snp alignment gene next-gen genome • 1.9k views
3.1 years ago
2nelly ▴ 310

Hi Wilber0x,

To make a long story short (please correct me if I am wrong), you have:

1) a VCF file from the hybrids

2) a list of SNPs from organism A (in what format?)

You can annotate your vcf file using tools like:

http://snpeff.sourceforge.net/SnpSift.html#annotate

 java -jar SnpSift.jar annotate dbSnp132.vcf variants.vcf > variants_annotated.vcf

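The annotation file (dbSnp132.vcf in the example) does not have to be a dbSNP download. Your own SNP list can take its place, as long as it provides at least the columns below in VCF order (chromosome, position, ID, reference allele, alternate allele). SnpSift may still expect VCF header lines at the top of that file, so check this on a small test before running the full set.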

chr1    3000020 rs1133275841    T   A
chr1    3000023 rs1133585662    C   A
chr1    3000126 rs580370473 G   T
chr1    3000185 rs585444580 G   T
chr1    3000234 rs579469519 G   A
chr1    3000258 rs582985490 G   T
chr1    3000259 rs586234354 T   G
chr1    3000280 rs580430667 C   T
chr1    3000281 rs584188706 A   G
chr1    3000287 rs587313017 A   G
chr1    3000315 rs581375106 T   G
chr1    3000321 rs583231582 G   T

For every SNP that is present in the list, the ID field of your annotated VCF file will be filled with the corresponding rs code.

The other option is to intersect the coordinates of the VCF with those of the list. However, I don't recommend this, since you may have to modify the files slightly first.
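A minimal sketch of the coordinate intersection mentioned above, assuming both inputs are uncompressed VCFs that use the same chromosome names. The file names and the toy records are hypothetical, made up only so the example runs on its own. It keeps the hybrid records whose chromosome, position, REF and ALT all match a record in the organism A file.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Toy organism A SNP list (hypothetical records, first five VCF columns).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n' > "$workdir/organismA.vcf"
printf 'chr1\t100\t.\tT\tA\nchr1\t250\t.\tG\tT\n' >> "$workdir/organismA.vcf"

# Toy hybrid calls (hypothetical records).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n' > "$workdir/hybrids.vcf"
printf 'chr1\t100\t.\tT\tA\nchr1\t180\t.\tC\tG\nchr1\t250\t.\tG\tC\n' >> "$workdir/hybrids.vcf"

# First file: remember each CHROM/POS/REF/ALT key.
# Second file: print only the records whose key was seen in the first file.
awk -F'\t' '
    /^#/ { next }                                   # skip header lines in both files
    NR == FNR { seen[$1 FS $2 FS $4 FS $5] = 1; next }
    ($1 FS $2 FS $4 FS $5) in seen
' "$workdir/organismA.vcf" "$workdir/hybrids.vcf" > "$workdir/shared.tsv"

cat "$workdir/shared.tsv"
```

This compares the ALT column as a plain string, so multi-allelic records (for example A,G) only match when they are written identically in both files. Splitting them into one allele per line beforehand is one of the small file modifications this approach tends to need.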

Good luck

Thank you. SNPs from organism A are also in a vcf file.

That's fine, you can use it as it is.

3.1 years ago
Ace ▴ 90

If you have a VCF file, vcftools' positions overlap function should work, specifically:

vcftools --vcf $vcf --positions-overlap $list --kept-sites --out $out

would give you a file ${out}.kept.sites listing all the sites in your VCF that are also present in your list of SNPs. You could use the --counts option instead of --kept-sites if you want the allele counts at each of those sites, i.e. how often one allele was called versus the other across your samples.

3.1 years ago
Brice Sarver ★ 3.7k

bedtools and bcftools are the standard toolkits for this kind of question. bedtools intersect will give you what you want, as will bcftools isec.
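A rough sketch of the bcftools isec route, assuming bcftools and bgzip are installed and both VCFs are coordinate-sorted. The function name shared_snps and all file names are hypothetical. As far as I know, isec by default treats two records as the same only when position and alleles both match, so check the collapse options in the manual if you want to match on position alone.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write the hybrid records that also occur in the organism A VCF.
# Usage: shared_snps hybrids.vcf organismA.vcf outdir
shared_snps() {
    local hybrids=$1 organism_a=$2 outdir=$3
    mkdir -p "$outdir"

    # bcftools isec needs bgzip-compressed, indexed inputs.
    bgzip -c "$hybrids" > "$outdir/hybrids.vcf.gz"
    bgzip -c "$organism_a" > "$outdir/organismA.vcf.gz"
    bcftools index -t "$outdir/hybrids.vcf.gz"
    bcftools index -t "$outdir/organismA.vcf.gz"

    # -n=2 keeps sites present in both files; -w1 writes records from the first file only.
    bcftools isec -n=2 -w1 -p "$outdir/isec" \
        "$outdir/hybrids.vcf.gz" "$outdir/organismA.vcf.gz"
}

# Run only when all three arguments are given, so the file can also be sourced.
if [ "$#" -eq 3 ]; then
    shared_snps "$@"
fi
```

The shared records end up in the isec subdirectory of the output directory, together with a short README that bcftools writes to describe the files it produced.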
