How can I convert a tab-delimited file to VCF format using the vcf-annotate tool?
8.9 years ago
ivivek_ngs ★ 5.2k

I have a tab-delimited variant file that I created with fpfilter, which filters false positives out of a VarScan output VCF file. The result is no longer in VCF format, so I tried to use the vcf-annotate tool to turn the fpfilter output back into a VCF file, but I have not been able to. The file I want to convert back to VCF looks like this:

chrM    152    T    C    258    0.7829    0.2171    PASS    .
chrM    13811    G    A    397    0.8237    0.1763    PASS    .
chr1    1669566    C    T    61    0.7705    0.2295    PASS    .
chr1    1685929    A    C    177    0.8136    0.1864    PASS    .
chr1    2629228    T    G    10    0.6000    0.4000    PASS    .

According to a comment in the fpfilter GitHub repository, this output file can be converted to VCF format with vcf-annotate, but I could not find any example of a non-VCF file being converted to VCF with vcf-annotate. Can anyone tell me how to do that? I want to create a VCF file that keeps only the variants with FILTER=PASS, using vcf-annotate. Any assistance would be appreciated.


vcf vcftools snp tcga • 6.9k views
8.9 years ago

The fpfilter comment says "The output file is tab-delimited and contains a column with a filter tag that can be written back into a VCF using vcf-annotate." This means that you can use that file as a source of annotation to write back into the original VCF files that you started off with. See the documentation for vcf-annotate, and you can come up with a command that looks something like this:

cat S_313_T_soma_snvs.vcf | vcf-annotate -a S_313_T_soma_snvs.fpfilter.gz -d ... -c ... > S_313_T_soma_snvs.fpfilter.vcf

I am having trouble with this annotation. I did not need it earlier, but now I am trying to annotate the fpfilter output back into the original VCF file, and it is not being written properly.

My fpfilter output looks like this:

chr1    1267350    C    A    69    0.9420    0.0580    PASS    .
chr1    1289256    C    A    85    0.9529    0.0471    Strandedness    Ref=0.83,Var=1.00,MinMax=[0.01,0.99]
chr1    1424646    C    A    42    0.9048    0.0952    Strandedness    Ref=0.82,Var=1.00,MinMax=[0.01,0.99]
chr1    1886772    C    A    43    0.9302    0.0698    Strandedness    Ref=0.85,Var=1.00,MinMax=[0.01,0.99]
chr1    3044509    G    T    63    0.9048    0.0952    Strandedness    Ref=0.91,Var=1.00,MinMax=[0.01,0.99]

I used the commands below to compress and index it:

bgzip S_313_T_mutect_snvs.fpfilter
tabix -s 1 -b 2 -e 2 S_313_T_mutect_snvs.fpfilter.gz

The original VCF file looks like this (I am omitting the first 124 header lines):

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    IPS_S7996    N_S8981
chr1    1563396    .    C    A    .    PASS    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:94,4:35:98:0.041:2    0:124,0:.:124:0.00:0
chr1    1686753    .    C    A    .    PASS    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:157,6:36:164:0.037:2    0:329,0:.:329:0.00:0
chr1    3712349    .    C    A    .    PASS    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:34,4:35:38:0.105:2    0:44,0:.:44:0.00:0
chr1    7309616    .    G    T    .    PASS    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:145,6:35:151:0.040:2    0:294,0:.:294:0.00:0
chr1    8390429    .    G    T    .    PASS    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:89,4:34:93:0.043:2    0:158,0:.:158:0.00:0

I'm using the command

cat mutect_S_313soma_t_ex_flt.vcf | \
  vcf-annotate -a S_313_T_mutect_snvs.fpfilter.gz \
  -d key=INFO,ID=ANN,Number=1,Type=String,Description='FP filter annotation' \
  -c FILTER,INFO/ANN > S_313_T_mutect_snvs.fpfilter.vcf

But this does not give me what I want. As I understand it, -c should list the columns of the compressed annotation file, but the value is not written to the desired column, and I am getting confused. Am I doing something wrong in the tabix command? Since the fpfilter output has no TO column, the tabix settings should be fine: begin and end both point at the POS column. So where am I going wrong? I want to write the FILTER column of the fpfilter output (e.g. "PASS") into the INFO field of the original VCF under the ID "ANN". How should I modify the command? I have tried different ways, but to no avail. This may be naive, but I cannot figure it out. Any help would be appreciated.

8.6 years ago
ivivek_ngs ★ 5.2k

I found a way to do this with the fpfilter output, although not directly with the file that fpfilter writes. I selected just the chromosome, position and filter columns from the tab-delimited file, bgzipped the result, and indexed it. I then ran vcf-annotate so that the FILTER column of the original VCF receives the flags from the filter column of the tab-delimited file. The steps are below. It is not perfect, since the description I pass seems wrong to me, but it works for now.

awk -v OFS='\t' '{print $1, $2, $8}' S_313_T_mutect_snvs.fpfilter > S_313_T_mutect_snvs.fpfilter_tab


chr1    1267350    PASS
chr1    1289256    Strandedness
chr1    1424646    Strandedness
chr1    1886772    Strandedness
chr1    3044509    Strandedness
chr1    3424505    VarFrac
chr1    5927342    Strandedness
chr1    5948588    Strandedness
chr1    7847477    PASS
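tabix expects its input to be grouped by chromosome and sorted by position, so a quick check before compressing can save a confusing indexing error. The following is only a sketch: the sample rows and the helper name check_sorted are made up for illustration and are not part of fpfilter or tabix.

```shell
# Hypothetical sample standing in for S_313_T_mutect_snvs.fpfilter_tab
tab_file=$(mktemp)
printf 'chr1\t1267350\tPASS\nchr1\t1289256\tStrandedness\nchr1\t7847477\tPASS\n' > "$tab_file"

# Print "sorted" if positions never decrease within a chromosome block and
# no chromosome reappears after a different one has started; else "unsorted".
check_sorted() {
  awk -F'\t' '
    $1 != chrom {
      if ($1 in seen) { bad = 1 }      # chromosome block is split up
      seen[$1] = 1; chrom = $1; last = 0
    }
    $2 + 0 < last { bad = 1 }          # position went backwards
    { last = $2 + 0 }
    END { print (bad ? "unsorted" : "sorted") }
  ' "$1"
}

sort_status=$(check_sorted "$tab_file")
echo "$sort_status"
```

If the check reports "unsorted", running the file through sort -k1,1 -k2,2n before bgzip is one common way to fix the order.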

Now I compress the above file and index it

bgzip S_313_T_mutect_snvs.fpfilter_tab
tabix -s 1 -b 2 -e 2 S_313_T_mutect_snvs.fpfilter_tab.gz

Now running the vcf-annotate on the original vcf file


cat /scratch/GT/vdas/pietro/exome_seq/results/mutect/exonic_call/mutect_S_313soma_t_ex_flt.vcf \
    | /scratch/GT/softwares/vcftools_0.1.12b/bin/vcf-annotate -a S_313_T_mutect_snvs.fpfilter_tab.gz \
            -d key=INFO,ID=ANN,Number=1,Type=String,Description='FP filter annotation' \
            -c CHROM,POS,FILTER > S_313_T_mutect_snvs.fpfilter.vcf


##INFO=<ID=ANN,Number=1,Type=String,Description="FP filter annotation">
##source_20150127.1=vcf-annotate(r731) -a S_313_T_mutect_snvs.fpfilter_tab.gz -d key=INFO,ID=ANN,Number=1,Type=String,Description=FP filter annotation -c CHROM,POS,FILTER
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    T_S7998    N_S8980
chr1    1267350    .    C    A    .    PASS    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:66,4:34:70:0.057:2    0:31,0:.:31:0.00:0
chr1    1289256    .    C    A    .    Strandedness    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:81,4:34:85:0.047:2    0:82,0:.:82:0.00:0
chr1    1424646    .    C    A    .    Strandedness    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:37,4:34:41:0.098:2    0:20,0:.:20:0.00:0
chr1    1886772    .    C    A    .    Strandedness    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:39,3:33:42:0.071:2    0:32,0:.:32:0.00:0
chr1    3044509    .    G    T    .    Strandedness    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:57,6:36:63:0.095:2    0:73,0:.:73:0.00:0
chr1    3424505    .    G    T    .    VarFrac    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:91,4:35:95:0.042:2    0:108,0:.:108:0.00:0
chr1    5927342    .    C    A    .    Strandedness    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:81,4:35:85:0.047:2    0:77,0:.:78:0.00:0
chr1    5948588    .    C    A    .    Strandedness    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:60,4:35:64:0.063:2    0:64,0:.:64:0.00:0
chr1    7847477    .    G    A    .    PASS    SOMATIC;VT=SNP    GT:AD:BQ:DP:FA:SS    0/1:60,4:37:64:0.063:2    0:166,0:.:166:0.00:0
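
Once the FILTER column carries the fpfilter flags, keeping only the PASS records (the goal stated in the question) can be done with plain awk. This is a minimal sketch on a made-up two-record VCF; the helper name keep_pass is hypothetical, and it assumes FILTER is the seventh tab-separated column, as in the listing above.

```shell
# Hypothetical two-record VCF standing in for S_313_T_mutect_snvs.fpfilter.vcf
vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"
printf 'chr1\t1267350\t.\tC\tA\t.\tPASS\tSOMATIC;VT=SNP\n' >> "$vcf"
printf 'chr1\t1289256\t.\tC\tA\t.\tStrandedness\tSOMATIC;VT=SNP\n' >> "$vcf"

# Keep every header line (starting with '#') plus the records whose
# seventh column (FILTER) is exactly PASS.
keep_pass() {
  awk -F'\t' '/^#/ || $7 == "PASS"' "$1"
}

pass_vcf=$(keep_pass "$vcf")
printf '%s\n' "$pass_vcf"
```

On the real file this would be called as keep_pass S_313_T_mutect_snvs.fpfilter.vcf, redirecting the output to a new VCF.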

However, I am not convinced that the vcf-annotate command I used is correct: the -d description I provide is for an INFO field, yet the values from the tab-delimited file are written into the FILTER column of the original VCF. Still, it works.
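
On that doubt: as far as I understand vcf-annotate, -d only adds a header line, while -c decides where each column of the annotation file ends up. With -c CHROM,POS,FILTER the third column goes to FILTER, which matches the output above. To land the flag in INFO as ANN instead, the third entry would be INFO/ANN. The sketch below wraps that in a hypothetical helper, annotate_info, and only runs when vcf-annotate and the files from this thread are present; the output file name is also made up.

```shell
# One entry per column of the three-column annotation file:
# chromosome, position, then the fpfilter flag sent to INFO/ANN.
columns='CHROM,POS,INFO/ANN'
description="key=INFO,ID=ANN,Number=1,Type=String,Description='FP filter annotation'"

# annotate_info VCF ANNOTATION.gz : writes the annotated VCF to stdout.
# Assumes vcf-annotate is on PATH and ANNOTATION.gz is bgzipped and tabix-indexed.
annotate_info() {
  vcf-annotate -a "$2" -d "$description" -c "$columns" < "$1"
}

# Run only when the tool and the input files actually exist here.
if command -v vcf-annotate >/dev/null 2>&1 \
   && [ -f mutect_S_313soma_t_ex_flt.vcf ] \
   && [ -f S_313_T_mutect_snvs.fpfilter_tab.gz ]; then
  annotate_info mutect_S_313soma_t_ex_flt.vcf S_313_T_mutect_snvs.fpfilter_tab.gz \
    > S_313_T_mutect_snvs.fpfilter_info.vcf
fi
```

For the full nine-column fpfilter file, the same idea should work without the awk step by skipping unused columns with a dash, i.e. -c CHROM,POS,-,-,-,-,-,INFO/ANN,- (one entry per column), though I have not tested that on this data.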

