Question: Extracting specific SNPs from vcf file
2.4 years ago, konanamenanjacky wrote:

Hello everyone. I am new to bioinformatics tools and would appreciate your help. I ran GBS (genotyping-by-sequencing) on the two plant varieties I work on. I have the VCF files and want to identify the SNPs that are specific to each variety. Could someone point me to a program or procedure to follow? Thank you so much.

snp sequence • 1.6k views
modified 2.4 years ago by Jorge Amigo11k • written 2.4 years ago by konanamenanjacky0

Thank you all. I will try and keep you informed!

written 2.4 years ago by konanamenanjacky0

Hello everyone. Thanks for your advice, but I am still having trouble, sorry. Could someone help, please? I tried the commands you suggested, "vcf-contrast" from vcftools and "subtract" from bedtools, but they do not work. Maybe I explained poorly what I want to do, so I will try to explain better. I have the GBS results for my two "varieties" of plants in a single VCF file, which contains the SNP positions on the contigs (the reference sequence is partial), the genotypes, and other sequencing information for all the samples. I want to know whether some SNPs are specific to either of my two "varieties". If so, I want to identify those SNPs and extract them from the VCF file. Thanks a lot for the help.

written 2.4 years ago by konanamenanjacky0

If you are able to program in Python, there is a package called PyVCF that can parse the VCF for you and give you easy access to the genotypes of each sample. It would then be a matter of filtering out the positions that are the same across all samples, keeping those that differ, and doing whatever downstream analysis you want on those.
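As a rough illustration of that filtering idea, here is a minimal sketch in plain Python (standard library only, so it does not use PyVCF's API). It assumes biallelic sites, GT as the first FORMAT key (which the VCF specification requires when GT is present), and that you can list which sample names belong to each variety. The sample names (egusi_1, cal_1, ...) and the tiny inline VCF are made up for the demo.

```python
def carries_alt(sample_field):
    """True if a sample column such as '0/1:12' contains a non-reference allele."""
    genotype = sample_field.split(":")[0]          # GT is the first FORMAT key
    alleles = genotype.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def private_sites(vcf_lines, variety_a, variety_b):
    """Yield (contig, position, 'A' or 'B') for sites whose ALT allele
    is carried by samples of only one variety.

    Caveat: missing genotypes ('./.') are treated like reference calls,
    so poorly covered sites can look private when they are not.
    """
    column_of = None
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue                               # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map each column name (including sample names) to its index.
            column_of = {name: idx for idx, name in enumerate(fields)}
            continue
        in_a = any(carries_alt(fields[column_of[s]]) for s in variety_a)
        in_b = any(carries_alt(fields[column_of[s]]) for s in variety_b)
        if in_a != in_b:
            yield fields[0], int(fields[1]), "A" if in_a else "B"


# Hypothetical four-sample VCF, built with explicit tabs.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "egusi_1", "egusi_2", "cal_1", "cal_2"]
rows = [
    ["contig1", "101", ".", "A", "G", "50", "PASS", ".", "GT",
     "1/1", "0/1", "0/0", "0/0"],
    ["contig1", "250", ".", "C", "T", "50", "PASS", ".", "GT",
     "0/1", "0/0", "0/1", "0/0"],
    ["contig2", "77", ".", "G", "A", "50", "PASS", ".", "GT:DP",
     "0/0:9", "./.:0", "1|1:7", "0/1:8"],
]
demo_vcf = ["##fileformat=VCFv4.2"] + ["\t".join(r) for r in [header] + rows]

result = list(private_sites(demo_vcf,
                            ["egusi_1", "egusi_2"],
                            ["cal_1", "cal_2"]))
print(result)
```

For a real file you would pass an open file handle instead of demo_vcf, since the function only needs an iterable of lines. Whether "private" should mean "seen in at least one sample of the variety" (as here) or "fixed in every sample of the variety" is a choice you would have to make for your own data.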

written 2.4 years ago by erikfas20

2.4 years ago, Jenez (Sweden) wrote:

vcftools would probably fulfil your needs.

written 2.4 years ago by Jenez520

2.4 years ago, WouterDeCoster (Belgium) wrote:

Lots of filtering options are available using GATK SelectVariants.

modified 2.4 years ago • written 2.4 years ago by WouterDeCoster37k

2.4 years ago, sacha (France) wrote:

I use variant-tools, which is great if you want to perform set operations, for instance: "select all SNPs that are not in A but are in B".
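To make the set-operation idea concrete, here is a small sketch in plain Python. It is not variant-tools syntax, only the same logic written out: each variant is reduced to a (contig, position, REF, ALT) key, and ordinary set difference and intersection do the rest. It assumes each input already lists only the variants actually carried by that variety; the two inline inputs are made up for the demo.

```python
def variant_keys(vcf_lines):
    """Collect one (contig, position, ref, alt) key per ALT allele in a VCF."""
    keys = set()
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue                      # skip header and blank lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):     # multi-allelic sites give several keys
            keys.add((chrom, int(pos), ref, allele))
    return keys


# Hypothetical per-variety inputs (first five VCF columns only).
lines_a = ["\t".join(f) for f in [
    ["contig1", "101", ".", "A", "G"],
    ["contig1", "250", ".", "C", "T"],
]]
lines_b = ["\t".join(f) for f in [
    ["contig1", "250", ".", "C", "T"],
    ["contig2", "77", ".", "G", "A,C"],
]]

in_a = variant_keys(lines_a)
in_b = variant_keys(lines_b)

only_in_b = in_b - in_a     # "not in A but in B"
only_in_a = in_a - in_b     # "not in B but in A"
shared = in_a & in_b        # present in both

print(sorted(only_in_b))
print(sorted(only_in_a))
print(sorted(shared))
```

Keying on REF and ALT as well as position means two varieties with different alleles at the same position are not counted as sharing a variant.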

written 2.4 years ago by sacha1.7k

Hey! Can you please post an example command?

written 5 months ago by shubhra.bhattacharya120

2.4 years ago, Jorge Amigo (Santiago de Compostela, Spain) wrote:

I would also go for bedtools subtract, but if you have a single multisample vcf you first have to split it and then find the private sites:

bcftools view -m2 --samples sample1 multisample.vcf > sample1.vcf
bcftools view -m2 --samples sample2 multisample.vcf > sample2.vcf
bedtools subtract -a sample1.vcf -b sample2.vcf > sample1.private.vcf
bedtools subtract -a sample2.vcf -b sample1.vcf > sample2.private.vcf

written 2.4 years ago by Jorge Amigo11k

Thanks. Sorry, I am not able to program in Python. I had already split my VCF file with the "cut" command before using bedtools "subtract", but it returns the following error:

Command typed: bedtools subtract -a egusi.vcf -b cal.vcf
* ERROR: too many digits/characters for integer conversion in string . Exiting...

Maybe it is because the file was split with "cut"? I will do as Amigo advised and let you know.

Thank you so very much!

written 2.4 years ago by konanamenanjacky0

You are making this thread very confusing by replying to the wrong answer with this comment, or addressing two answers simultaneously...

written 2.4 years ago by WouterDeCoster37k

2.4 years ago, harold.smith.tarheel (United States) wrote:

Bedtools 'subtract' is another option.

modified 2.4 years ago • written 2.4 years ago by harold.smith.tarheel4.3k