bcftools split-vep -- how to split the INFO column up and also assign the ensembl vep headers
19 months ago
amy__ ▴ 160

Hello,

I have seen a few answers to this but none seem to do what I would like.

I have an annotated vcf file which has the ensembl headers like this:

##VEP="v107" time="2022-09-12 19:16:50" cache="/home/c.c21087028/.vep/homo_sapiens/107_GRCh38" ensembl-io=107.a473894 ensembl-funcgen=107.0fbd7d5 ensembl=107.5f39899 ensembl-variation=107.db634f2 1000genomes="phase3" COSMIC="95" ClinVar="202201" HGMD-PUBLIC="20204" assembly="GRCh38.p13" dbSNP="154" gencode="GENCODE 41" genebuild="2014-07" gnomADe="r2.1.1" gnomADg="v3.1.2" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|MANE_SELECT|MANE_PLUS_CLINICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|UNIPROT_ISOFORM|GENE_PHENO|SIFT|PolyPhen|DOMAINS|miRNA|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|gnomADe_AF|gnomADe_AFR_AF|gnomADe_AMR_AF|gnomADe_ASJ_AF|gnomADe_EAS_AF|gnomADe_FIN_AF|gnomADe_NFE_AF|gnomADe_OTH_AF|gnomADe_SAS_AF|gnomADg_AF|gnomADg_AFR_AF|gnomADg_AMI_AF|gnomADg_AMR_AF|gnomADg_ASJ_AF|gnomADg_EAS_AF|gnomADg_FIN_AF|gnomADg_MID_AF|gnomADg_NFE_AF|gnomADg_OTH_AF|gnomADg_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|TRANSCRIPTION_FACTORS|CADD_PHRED|CADD_RAW">
##CADD_PHRED=PHRED-like scaled CADD score
##CADD_RAW=Raw CADD score

I was wondering if it is possible to split these in the INFO field and also to assign the above header to the correct column.

I have tried this:

echo -e "CHROM\tPOS\tREF\tALT\t$(bcftools +split-vep -l input.vcf | cut -f 2 | tr '\n' '\t' | sed 's/\t$//')" > output.tsv
bcftools +split-vep -f '%CHROM\t%POS\t%REF\t%ALT\t%CSQ\n' -d -A tab input.vcf >> output.tsv

but it does not add the header line, and some of the fields listed above are missing from the output.
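One way to narrow down where fields go missing is to compare the number of tab-separated columns in each data row against the header row. The sketch below is only an illustration: it runs on a made-up three-line table (the file and its contents are hypothetical), and on real data `"$tsv"` would be replaced by the path to `output.tsv`.

```shell
set -eu

# Toy table standing in for output.tsv (hypothetical contents).
tsv=$(mktemp)
printf 'CHROM\tPOS\tREF\tALT\tAllele\tConsequence\n' > "$tsv"
printf 'chr1\t100\tA\tG\tG\tmissense_variant\n' >> "$tsv"
printf 'chr1\t200\tC\tT\tT\n' >> "$tsv"

# Remember the header's field count, then report every row that differs.
mismatches=$(awk -F'\t' '
  NR == 1 { want = NF; next }
  NF != want { print NR ": " NF " fields, expected " want }
' "$tsv")

printf '%s\n' "$mismatches"
rm -f "$tsv"
```

If nothing is printed, the header and the rows have the same width, and the problem is more likely in how the file is being viewed or imported.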

Thanks, I hope this makes sense. Amy

bcftools ensembl-vep vep • 1.7k views
19 months ago

It would help to show an example of the expected output. If it helps, recently I used this command to convert a vep-annotated vcf to TSV:

bcftools +split-vep -d -f '%CHROM %POS %ID %REF %ALT %QUAL %TYPE [%AD{0}] [%AD{1}] [%ALT_AF] [%SUM_ALT_AF] %SYMBOL %Gene %Feature %BIOTYPE %Consequence %IMPACT %Amino_acids %Codons\n' input.vcf > out.tsv

The full command, which also adds the header line and produces the output in "long" format, is here.


That worked great thanks!

I thought I'd also add another answer I found that worked too:

bcftools +split-vep input.vcf -f '%ID\t%CHROM\t%POS\t%REF\t%ALT\t%CSQ\n' -d -A tab > output.tsv

This one didn't add the headers, though, so I might just do that as a second step with another bash command.
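A minimal sketch of that second step, assuming the CSQ subfield names can be read from the "Format: ..." part of the ##INFO=<ID=CSQ line. It uses a shortened toy header (the temp file and its four-field list are made up for illustration); with a real file the same `sed` could be fed from `bcftools view -h input.vcf`. Recent bcftools releases also list a `-H`/`--print-header` option in the split-vep usage, so it is worth checking `bcftools +split-vep -h` on your install first.

```shell
set -eu

# Toy VCF header standing in for the real one (CSQ field list shortened; hypothetical file).
hdr=$(mktemp)
cat > "$hdr" <<'EOF'
##fileformat=VCFv4.2
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">
EOF

# Capture the text between "Format: " and the closing quote, then turn "|" into tabs.
csq_fields=$(sed -n 's/^##INFO=<ID=CSQ,.*Format: \([^"]*\)".*/\1/p' "$hdr" | tr '|' '\t')

# Fixed columns first, in the same order as '%ID\t%CHROM\t%POS\t%REF\t%ALT\t%CSQ\n'.
header_line=$(printf 'ID\tCHROM\tPOS\tREF\tALT\t%s' "$csq_fields")

printf '%s\n' "$header_line"
rm -f "$hdr"
```

The printed line can be written to a file first and the split-vep output appended after it, or the two can be joined afterwards with `cat`.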

Thanks!! Amy


If anyone wants to know how to keep the FORMAT column and also split that into columns you can use:

bcftools +split-vep input.vcf -f '\t%CHROM\t%ID\t%POS\t%REF\t%ALT\t%CSQ[\t%GT][\t%GQ][\t%DP][\t%MIN_DP][\t%AD][\t%VAF][\t%PL][\t%MED_DP]\n' -d -A tab > output.tsv
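Each bracketed expression such as [\t%GT] is repeated for every sample, so with more than one sample the per-sample columns come out grouped by tag: all GT values, then all GQ values, and so on. Below is a rough sketch of building matching column names from the #CHROM line. The two sample names, the shortened tag list, and the `sample:TAG` naming are made up for illustration; on a real file the sample names could also be listed with `bcftools query -l input.vcf`.

```shell
set -eu

# Toy header with two hypothetical samples.
vcf_header=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf_header"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\n' >> "$vcf_header"

# FORMAT tags in the same order as the bracketed expressions (shortened list).
tags="GT GQ DP"

# Sample names start at column 10; emit one name per sample for each tag, tag-major.
format_header=$(awk -F'\t' -v tags="$tags" '
  /^#CHROM/ {
    n = split(tags, t, " ")
    out = ""
    for (i = 1; i <= n; i++)
      for (s = 10; s <= NF; s++)
        out = out (out == "" ? "" : "\t") $s ":" t[i]
    print out
  }
' "$vcf_header")

printf '%s\n' "$format_header"
rm -f "$vcf_header"
```

The result can be appended to the fixed and CSQ column names to make a complete header row.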