Entering edit mode
11 months ago
avelarbio46
▴
30
Hello everyone! I have a multisample VCF, anottated with VEP I wanted to know how many samples had each type of genotype. For that, I used:
paste <(bcftools view "$MYVCF" |\
awk -F"\t" 'BEGIN {print "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"} \
!/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}') \
\
<(bcftools query -f '[\t%SAMPLE=%GT]\n' "$MYVCF" |\
awk 'BEGIN {print "nHet"} {print gsub(/0\|1|1\|0|0\/1|1\/0/, "")}') \
\
<(bcftools query -f '[\t%SAMPLE=%GT]\n' "$MYVCF" |\
awk 'BEGIN {print "nHomAlt"} {print gsub(/1\|1|1\/1/, "")}') \
\
<(bcftools query -f '[\t%SAMPLE=%GT]\n' "$MYVCF" |\
awk 'BEGIN {print "nHomRef"} {print gsub(/0\|0|0\/0/, "")}') \
\
<(bcftools view "$MYVCF" | awk -F"\t" '/^#CHROM/ {split($0, header, "\t"); print "HetSamples"} \
!/^#CHROM/ {for (i=10; i<=NF; i++) {if (gsub(/0\|1|1\|0|0\/1|1\/0/, "", $(i))==1) {printf header[i]","}; if (i==NF) {printf "\n"}}}') \
\
<(bcftools view "$MYVCF" | awk -F"\t" '/^#CHROM/ {split($0, header, "\t"); print "HomSamplesAlt"} \
!/^#CHROM/ {for (i=10; i<=NF; i++) {if (gsub(/1\|1|1\/1/, "", $(i))==1) {printf header[i]","}; if (i==NF) {printf "\n"}}}') \
\
<(bcftools view "$MYVCF" | awk -F"\t" '/^#CHROM/ {split($0, header, "\t"); print "HomSamplesRef"} \
!/^#CHROM/ {for (i=10; i<=NF; i++) {if (gsub(/0\|0|0\/0/,"", $(i))==1) {printf header[i]","}; if (i==NF) {printf "\n"}}}') \
\
| sed 's/,\t/\t/g' | sed 's/,$//g' >> output.vcf
This makes a tab file like:
#CHROM POS ID REF ALT QUAL FILTER INFO nHet nHomAlt nHomRef HetSamples HomSamplesAlt HomSamplesRef
But then, I need to expand the CSQ field. I was using, for the annotated vcf, this snippet:
echo -e "CHROM\tPOS\tREF\tALT\t$(bcftools +split-vep -l "$MYVCF" f | cut -f 2 | tr '\n' '\t' | sed 's/\t$//')" >output.tsv
bcftools +split-vep -f '%CHROM\t%POS\t%REF\t%ALT\t%CSQ\n' -d -A tab $MYVCF" >> output.tsv
But then, using the first output.vcf (with genotyping counts) and trying to expand to a tab format does not work.
I was thinking that maybe the easiest way would be to include in the original vep annotated vcf the genotype information I just created. Is it possible?