How to safely split GT into its own column in a VCF file using ':' as the delimiter?
3 months ago
Vanish007

Hi,

I have a multi-sample VCF with a few thousand samples, containing records like the following:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO                                        FORMAT                                                  S001                                            S002
  1    808922  .        G       A     20035.8   PASS    ExcessHet=3.0103;FS=0;MQ=39.81;QD=31.14     GT:AD:DP:GQ:PL:SDP:RD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:0,147:147:99:5958,454,0:.:.:.:.:.:.:.:.:.:. 1/1:0,163:163:99:7381,503,0:.:.:.:.:.:.:.:.:.:. 

I think the fields that follow GT are throwing off some downstream analysis tools (especially AD, which contains a comma). Is there a safe way to separate GT from the other fields? I could probably work something out in R with the stringr package or base R's strsplit(), but I was hoping for a faster and safer method using "bcftools view" or another command-line tool.

Thanks!

3 months ago

I think the information that follows GT is throwing off some downstream bioinformatics analysis tools (especially AD with the comma)

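An AD value containing a comma is valid VCF: AD holds one read depth per allele (REF first, then each ALT), so its values are comma-separated by design. You should look at those downstream tools instead.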

Is there a safe way to separate GT from other information?

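bcftools annotate -x '^FORMAT/GT' in.vcf

The leading ^ means "remove everything except": all FORMAT fields other than GT are dropped, while INFO and the other columns are left as they are. The result is written to stdout; add -o out.vcf to write it to a file.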
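If bcftools is not available, or you want to see exactly what happens to each column, a similar GT-only reduction can be sketched with awk. This is a minimal sketch under stated assumptions, not what bcftools does internally: it assumes an uncompressed, tab-delimited VCF on stdin (the aligned spaces in the example in the question are only for display), and the function name gt_only is hypothetical. It does not touch the ## header lines, so the ##FORMAT definitions for AD, DP, etc. remain in the output (declared but unused, which is still valid VCF).

```shell
# Hypothetical helper (not part of bcftools): keep only the GT subfield of
# every sample column. Assumes a plain-text, tab-delimited VCF on stdin;
# all header lines (## and #CHROM) are passed through unchanged.
gt_only() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }              # header lines: copy as-is
    {
      # Find the position of GT in the colon-separated FORMAT column ($9).
      n = split($9, keys, ":")
      gt = 0
      for (i = 1; i <= n; i++) if (keys[i] == "GT") { gt = i; break }
      if (gt == 0) { print; next }    # record has no GT: leave it untouched
      $9 = "GT"
      # Sample columns start at $10; keep only their GT subfield.
      for (s = 10; s <= NF; s++) {
        split($s, vals, ":")
        $s = (gt in vals) ? vals[gt] : "."
      }
      print                           # fields are re-joined with tabs (OFS)
    }'
}
```

Usage, for a plain or a bgzipped file:

gt_only < in.vcf > gt.vcf
gzip -dc in.vcf.gz | gt_only > gt.vcf

If what you actually need is a plain table with one GT column per sample rather than a VCF, bcftools query can produce it directly:

bcftools query -H -f '%CHROM\t%POS[\t%GT]\n' in.vcf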
