Extracting specific SNPs from vcf file
5
0
Entering edit mode
7.5 years ago

Hello everyone, Please, I am new with bioinfo tools and I ask your help. I did GBS on two plant varieties on which I work. I have the vcf files and I want to identify SNPs that are specific to each of my varieties. Could someone point me to a program or procedure to follow? thank you so much

SNP sequence • 8.4k views
ADD COMMENT
0
Entering edit mode

Thank you all. I will try and keep you informed!

ADD REPLY
0
Entering edit mode

Hello everyone, Thanks for your advice but I still have trouble sorry. Could someone help please? I tried different commands that you advised me "vcf contrast" of vcftools and "subtract" from bedtools but it does not work. Maybe I poorly explained what I want to do. I'll try to explain better. In fact I have the GBS results of my two "varieties" of plants in a single vcf file containing the SNP position on the contigs (the reference sequence is partial), genotypes and other sequencing information for all the samples. What I want is to know if some SNPs are specific to either of my two "varieties". If so, I want to identify these SNPs and extract them from the vcf file. Thanks a lot for the help.

ADD REPLY
0
Entering edit mode

If you are able to program in Python there's a package called PyVCF, which can parse the VCF for you and giving you easy access to the genotypes for each sample. Then it would be a matter of simply filtering out those positions that are the same across all samples while keeping those that are different, and doing whatever downstream analysis you want on those.

ADD REPLY
1
Entering edit mode
ADD COMMENT
1
Entering edit mode
7.5 years ago

Lots of filtering options are available using GATK SelectVariants:

ADD COMMENT
1
Entering edit mode
7.5 years ago
sacha ★ 2.4k

I use variant-tools which is greate if you want to perform set operation. for instance : "Select all SNP which is not in A but in B" .

ADD COMMENT
0
Entering edit mode

Hey! Can you please post an example command?

ADD REPLY
1
Entering edit mode
7.5 years ago

I would also go for bedtools subtract, but if you have a single multisample vcf you first have to split it and then find the private sites:

bcftools view -m2 --samples sample1 multisample.vcf > sample1.vcf
bcftools view -m2 --samples sample2 multisample.vcf > sample2.vcf
bedtools subtract -a sample1.vcf -b sample2.vcf > sample1.private.vcf
bedtools subtract -a sample2.vcf -b sample1.vcf > sample2.private.vcf
ADD COMMENT
0
Entering edit mode

Thanks Sorry I'm not able to program in Python. I have already split my vcf file with "the commande cut" before using bedtools "subtract" but he return the following error:
commande typed: bedtools subtract -a egusi.vcf -b cal.vcf * ERROR: too many digits/characters for integer conversion in string . Exiting...

may be it is because file was split with "cut"? I will done as advised Amigo and let you know.

Thanks you so very much!

ADD REPLY
0
Entering edit mode

You are making this thread very confusing by replying to the wrong answer with this comment, or addressing two answers simultaneously...

ADD REPLY
0
Entering edit mode
7.5 years ago

Bedtools 'subtract' is another option.

ADD COMMENT

Login before adding your answer.

Traffic: 2758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6