Question: Remove a few samples and their related information from dbSNP
zengtony743 (Canada) wrote, 2.6 years ago:

Hi, I have a dbSNP file (VCF format) that I use to filter my own variants. However, I do not need the SNPs from some of the samples in it, so I want to remove those samples from the file. Is there a tool that can do this? I tried GATK SelectVariants, but it did not work.

1) Header of the dbSNP VCF file

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129P2 129S1 129S5 AJ AKRJ BALBcJ C3HHeJ C57BL6NJ CASTEiJ CBAJ DBA2J FVBNJ LPJ NODShiLtJ NZOHlLtJ PWKPhJ SPRETEiJ WSBEiJ
chr10 3100945 . C G 252.17 PASS AC1=0;AC=2;AF1=0;AN=36;DP4=127,322,1,9;DP=474;MDV=0;MQ=35;MSD=0;PV0=0.37;PV1=1;PV2=0.25;PV3=0.068;PV4=0.37,1,0.25,0.068;QD=0.0133;SB=0.3611;VDB=0.0253 GT:GQ:DP:SP:PL:FI 0/0:.:16:0:0,.,.:1 0/0:.:36:0:0,.,.:1 0/0:.:8:0:0,.,.:1 0/0:.:17:0:0,.,.:1 0/0:.:26:0:0,.,.:1 0/0:.:27:0:0,.,.:1 0/0:.:41:0:0,.,.:1 0/0:.:24:0:0,.,.:1 0/0:.:29:0:0,.,.:1 0/0:.:26:0:0,.,.:1 0/0:.:32:0:0,.,.:1 0/0:.:33:0:0,.,.:1 0/0:.:31:0:0,.,.:1 0/0:.:25:0:0,.,.

2) I need to keep all the samples except 129S1, 129S5 and C57BL6NJ.

That is, I want my final dbSNP file to look like this:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AJ AKRJ BALBcJ C3HHeJ CASTEiJ CBAJ DBA2J FVBNJ LPJ NODShiLtJ NZOHlLtJ PWKPhJ SPRETEiJ WSBEiJ
chr10 3100945 . C G 252.17 PASS AC1=0;AC=2;AF1=0;AN=36;DP4=127,322,1,9;DP=474;MDV=0;MQ=35;MSD=0;PV0=0.37;PV1=1;PV2=0.25;PV3=0.068;PV4=0.37,1,0.25,0.068;QD=0.0133;SB=0.3611;VDB=0.0253 GT:GQ:DP:SP:PL:FI 0/0:.:16:0:0,.,.:1 0/0:.:36:0:0,.,.:1 0/0:.:8:0:0,.,.:1 0/0:.:17:0:0,.,.:1 0/0:.:26:0:0,.,.:1 0/0:.:27:0:0,.,.:1 0/0:.:41:0:0,.,.:1 0/0:.:24:0:0,.,.:1 0/0:.:29:0:0,.,.:1 0/0:.:26:0:0,.,.:1 0/0:.:32:0:0,.,.:1 0/0:.:33:0:0,.,.:1 0/0:.:31:0:0,.,.:1 0/0:.:25:0:0,.,.

3) I tried to select every sample except 129S1, 129S5 and C57BL6NJ from the dbSNP VCF file with GATK SelectVariants. It does not work on the dbSNP VCF file, although SelectVariants works on my own VCF file. The command I used is:

$ java -jar GenomeAnalysisTK.jar -R genome.fa -T SelectVariants --variant dbSNP.vcf -o final_dbSNP.vcf -sn AJ -sn AKRJ -sn BALBcJ -sn C3HHeJ -sn CASTEiJ -sn CBAJ -sn DBA2J -sn FVBNJ -sn LPJ -sn NODShiLtJ -sn NZOHlLtJ -sn PWKPhJ -sn SPRETEiJ -sn WSBEiJ &
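
One way to avoid typing the keep list by hand is to derive it from the header line. The following is a minimal sketch in plain Python, not part of GATK. The function name `samples_to_keep` is hypothetical, and it assumes the header columns are whitespace-separated with the sample names starting right after FORMAT.

```python
def samples_to_keep(header_line, excluded):
    """Return the sample names in header_line that are not in excluded."""
    columns = header_line.lstrip("#").split()
    # Sample names follow the nine fixed VCF columns (CHROM ... FORMAT).
    samples = columns[9:]
    # Fail early if an excluded name is misspelled or absent from the header.
    missing = sorted(name for name in excluded if name not in samples)
    if missing:
        raise ValueError(f"not in header: {missing}")
    return [name for name in samples if name not in excluded]


# Header line copied from the post above.
header = ("CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129P2 129S1 129S5 AJ "
          "AKRJ BALBcJ C3HHeJ C57BL6NJ CASTEiJ CBAJ DBA2J FVBNJ LPJ NODShiLtJ "
          "NZOHlLtJ PWKPhJ SPRETEiJ WSBEiJ")
keep = samples_to_keep(header, {"129S1", "129S5", "C57BL6NJ"})

# Print the names as a run of -sn arguments for SelectVariants.
print(" ".join(f"-sn {name}" for name in keep))
```

For the header in this post, the derived list also contains 129P2, which is absent from the -sn list in the command above.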

Zaag (Amsterdam) wrote, 2.6 years ago:

This should remove the samples you want removed:

java -jar GenomeAnalysisTK.jar -R genome.fa -T SelectVariants --variant dbSNP.vcf -o final_dbSNP.vcf -xl_sn 129S1 -xl_sn 129S5  -xl_sn C57BL6NJ
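
For readers without GATK at hand, the same idea (exclude by name rather than list everything to keep) can be sketched in plain Python by dropping columns. This is only an illustration and not what SelectVariants does internally. The function name `drop_samples` is hypothetical, the input is assumed to be a tab-separated VCF with a `#CHROM` header line, and the sketch leaves INFO fields such as AC and AN untouched, so they may no longer match the remaining genotypes. The three-sample input is a toy example, not the real file.

```python
def drop_samples(vcf_lines, excluded):
    """Yield VCF lines with the columns of the excluded samples removed."""
    keep_idx = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Keep the nine fixed columns plus every sample not excluded.
            keep_idx = [i for i, name in enumerate(fields)
                        if i < 9 or name not in excluded]
        elif keep_idx is None:
            raise ValueError("data line found before the #CHROM header")
        # Header and data lines are both cut down to the kept columns.
        yield "\t".join(fields[i] for i in keep_idx)


# Toy input (assumption): three samples and a shortened INFO/FORMAT.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t129P2\t129S1\tAJ",
    "chr10\t3100945\t.\tC\tG\t252.17\tPASS\tAC=2;AN=36\tGT\t0/0\t0/0\t0/0",
]
result = list(drop_samples(example, {"129S1"}))
print("\n".join(result))
```

Because the INFO counts are not recomputed here, a dedicated tool remains the safer choice for a file that will be used in downstream filtering.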

zengtony743 replied, 2.6 years ago:

Thank you very much, Zaag. That works!