Adding custom annotation to VEP output VCF
0
0
Entering edit mode
7 months ago
avelarbio46 ▴ 30

Hello everyone! I have a multisample VCF, anottated with VEP I wanted to know how many samples had each type of genotype. For that, I used:

paste <(bcftools view "$MYVCF" |\
    awk -F"\t" 'BEGIN {print "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"} \
      !/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}') \
    \
  <(bcftools query -f '[\t%SAMPLE=%GT]\n' "$MYVCF" |\
    awk 'BEGIN {print "nHet"} {print gsub(/0\|1|1\|0|0\/1|1\/0/, "")}') \
    \
  <(bcftools query -f '[\t%SAMPLE=%GT]\n' "$MYVCF" |\
    awk 'BEGIN {print "nHomAlt"} {print gsub(/1\|1|1\/1/, "")}') \
    \
  <(bcftools query -f '[\t%SAMPLE=%GT]\n' "$MYVCF" |\
    awk 'BEGIN {print "nHomRef"} {print gsub(/0\|0|0\/0/, "")}') \
    \
  <(bcftools view "$MYVCF" | awk -F"\t" '/^#CHROM/ {split($0, header, "\t"); print "HetSamples"} \
    !/^#CHROM/ {for (i=10; i<=NF; i++) {if (gsub(/0\|1|1\|0|0\/1|1\/0/, "", $(i))==1) {printf header[i]","}; if (i==NF) {printf "\n"}}}') \
    \
  <(bcftools view "$MYVCF" | awk -F"\t" '/^#CHROM/ {split($0, header, "\t"); print "HomSamplesAlt"} \
    !/^#CHROM/ {for (i=10; i<=NF; i++) {if (gsub(/1\|1|1\/1/, "", $(i))==1) {printf header[i]","}; if (i==NF) {printf "\n"}}}') \
    \
  <(bcftools view "$MYVCF" | awk -F"\t" '/^#CHROM/ {split($0, header, "\t"); print "HomSamplesRef"} \
    !/^#CHROM/ {for (i=10; i<=NF; i++) {if (gsub(/0\|0|0\/0/,"", $(i))==1) {printf header[i]","}; if (i==NF) {printf "\n"}}}') \
    \
  | sed 's/,\t/\t/g' | sed 's/,$//g' >> output.vcf

This makes a tab file like:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO  nHet  nHomAlt nHomRef HetSamples  HomSamplesAlt   HomSamplesRef

But then, I need to expand the CSQ field. I was using, for the annotated vcf, this snippet:

echo -e "CHROM\tPOS\tREF\tALT\t$(bcftools +split-vep -l "$MYVCF" f | cut -f 2 | tr '\n' '\t' | sed 's/\t$//')" >output.tsv

bcftools +split-vep -f '%CHROM\t%POS\t%REF\t%ALT\t%CSQ\n' -d -A tab $MYVCF" >> output.tsv

But then, using the first output.vcf (with genotyping counts) and trying to expand to a tab format does not work.

I was thinking that maybe the easiest way would be to include in the original vep annotated vcf the genotype information I just created. Is it possible?

bcftools VEP vcf • 307 views
ADD COMMENT

Login before adding your answer.

Traffic: 1644 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6