gatk CombineGVCFs output contains only one column of genotypes
0
0
Entering edit mode
2.4 years ago

I'm trying to get a workflow up and running for sequencing a small 500 bp region on chromosome 5. I've got the initial steps of alignment and qc, but according to gatk best practices I need to combine my gvcf files before genotyping, normalising, indexing and recalibrating variants. the step I'm really stuck on is combining the gvcf files into one. The output I get contains only one column of genotypes, but I'm expecting a column for each sample.

Here's the combining gvcf step:

# List paths to GVCFs to combine
# https://www.biostars.org/p/9501554/#9501607
find "$DIRECTORY"/temp_gvcf_2 -type f -name "*.vcf.gz" > "$DIRECTORY"/temp_gvcf_2/input.list

# Combine GVCFs
gatk CombineGVCFs \
   --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' \
   -R "$GKREF"/Homo_sapiens_assembly38.fasta \
   --variant "$DIRECTORY"/temp_gvcf_2/input.list \
   -O "$DIRECTORY"/temp_gvcf_2/cohort.g.vcf

And here are the columns I get in the resulting cohort.g.vcf file:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  M00819_262_000000000-BP6CG_1_
chr1    12046   .   G   <NON_REF>   .   .   END=12410   GT:DP:GQ:MIN_DP:PL  ./.:0:0:0:0,0,0
chr1    12496   .   G   <NON_REF>   .   .   END=12860   GT:DP:GQ:MIN_DP:PL  ./.:0:0:0:0,0,0
chr1    13316   .   A   <NON_REF>   .   .   END=13767   GT:DP:GQ:MIN_DP:PL  ./.:0:0:0:0,0,0

Anyone know where I'm going wrong??

combingvcfs gatk • 1.5k views
ADD COMMENT
0
Entering edit mode

what is the output of wc -l "$DIRECTORY"/temp_gvcf_2/input.list?

(BTW sounds weird for me , I would write wc -l "${DIRECTORY}/temp_gvcf_2/input.list"

ADD REPLY
0
Entering edit mode

The result of your command is:

(bioinfo) skgtmdf@rds-gw-008:/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13$ wc -l "${DIRECTORY}/temp_gvcf_2/input.list"
3 /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/MF_MSH3_MiSeq_2018.03.23/temp_gvcf_2/input.list

And the contents of the file are:

/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/MF_MSH3_MiSeq_2018.03.23/temp_gvcf_2/152-270-051_S67_L001_g.vcf.gz
/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/MF_MSH3_MiSeq_2018.03.23/temp_gvcf_2/135-074-011_S114_L001_g.vcf.gz
/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/MF_MSH3_MiSeq_2018.03.23/temp_gvcf_2/045-416-487_S119_L001_g.vcf.gz
ADD REPLY
0
Entering edit mode

and what is the output of

cat "${DIRECTORY}/temp_gvcf_2/input.list" | while read V; do bcftools query -l "${V}" ; done | sort | uniq

?

ADD REPLY
0
Entering edit mode

The result of this command is this:

(bioinfo) skgtmdf@rds-gw-008:/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13$ cat "${DIRECTORY}/temp_gvcf_2/input.list" | while read V; do bcftools query -l "${V}" ; done | sort | uniq
bcftools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
bcftools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
bcftools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory

I take it this could be the problem? Not sure what the issue is though?

ADD REPLY
0
Entering edit mode

your bcftools is not correctly installed. But it is another problem. You can try:

cat "${DIRECTORY}/temp_gvcf_2/input.list" | while read V; do gunzip -c "${V}" | grep -m1 "^#CHROM" | cut -f10- | tr "\t" "\n" ; done | sort | uniq
ADD REPLY

Login before adding your answer.

Traffic: 2604 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6