Question

Cannot call haplotypes on bam

0

Entering edit mode

21 months ago

dec986 ▴ 370

I have used STAR to align some whole exome sequencing files:

STAR --readFilesCommand zcat --runThreadN 4 --genomeDir /usr/local/share/hg38 --outFileNamePrefix HGFLY --readFilesIn AGKD01_CKDN220004842-1A_HGFLYDSX3_L1_1.fq.gz AGKD01_CKDN220004842-1A_HGFLYDSX3_L1_2.fq.gz --outSAMtype BAM SortedByCoordinate

but when I run haplotypeCaller in the next step:

~/gatk-4.2.6.0/gatk --java-options "-Xmx4g" HaplotypeCaller --input HGFLYAligned.sortedByCoord.out.bam --reference /usr/local/share/hg38/hg38.fa -O HGFLY.g.vcf.gz -ERC GVCF

I get an error:

A USER ERROR has occurred: Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.

but when I try to view the samples samtools view -H HGFLYAligned.sortedByCoord.out.bam | grep RG I get none.

I have found similar errors reported, https://gatk.broadinstitute.org/hc/en-us/community/posts/360075892511-emit-ref-confidence-error but I don't see a solution to my problem there.

It seems that my alignment produced no sample names, and I don't see how STAR can output sample names.

A sample view of my bam file:

A00564:479:HGFLYDSX3:1:2420:11686:10708 163 chr1    10001   255 6S108M36S   =   10191   
170948  TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC
TAACCCTAACCCTAACCCGAAACCCAAAACCCAAAACCCGAAACCCAAAACCCA  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF
FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFF,F::FFFF:F:F,FFFFF,,FF,:,,,FFF:F,,F::,:F,F,F,:F,  
NH:i:1  HI:i:1  AS:i:200    nM:i:10
A00564:479:HGFLYDSX3:1:2573:2889:29669  163 chr1    10001   255 1S113M36S   =   10005   
104 CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
CTAACCCTAACCCAAACCCTAACCCTAACCCTAAACCTAACCCTCACCCTAACC  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF
FFFFFF::FFFFFFFFFFF,FFFFFFF,FFF,FFFF,,::FFF,FFFFF,,FFFF,FFFFF,:FFF,,F:FF,:F,FFF,,F,FF,FFFFF,,,F,F,,,,F  
NH:i:1  HI:i:1  AS:i:207    nM:i:2
A00564:479:HGFLYDSX3:1:2140:5656:8844   163 chr1    10004   3   106M44S =   10004   110 
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
CAGCGCCAGAGCCTAGGCGGGGGCCTACGGCTGGGCCGGAGGGTGG  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFF:FFFFFFFF:F::FF,FF,FFFF:,,F,::,,,,,,,,:,,,,:,,,,,,,,,,,,,:,,:,,,,,,,,,,,,,,:,,  NH:i:2  
HI:i:1  AS:i:206    nM:i:4

How can I call haplotypes/variants?

gatk STAR • 1.3k views

ADD COMMENT • link updated 20 months ago by Shred ★ 1.4k • written 21 months ago by dec986 ▴ 370

0

Entering edit mode

Please post also a brief subset of reads. Did you try to add a readgroup using AddOrReplaceReadGroups ?

ADD REPLY • link 20 months ago by Shred ★ 1.4k

0

Entering edit mode

@Shred I've placed a small view of the aligned bam file. AddOrReplaceReadGroups didn't change anything, unfortunately

ADD REPLY • link 20 months ago by dec986 ▴ 370

0

Entering edit mode

What do you mean by

AddOrReplaceReadGroups didn't change anything

You're in the situation where your BAM file has no RG information, so the caller doesn't know how your reads are actually stored. An alternative could also be samtools addreplacerg as stated here.

Closing, why are you using STAR, designed for RNAseq, to align exome sequencing data?

ADD REPLY • link 20 months ago by Shred ★ 1.4k

0

Entering edit mode

Is STAR inappropriate for Whole Exome Sequencing?

I'll check out addreplacerg

ADD REPLY • link 20 months ago by dec986 ▴ 370

0

Entering edit mode

Sure. Use bwa instead

ADD REPLY • link 20 months ago by Shred ★ 1.4k