Entering edit mode
10 months ago
dec986 ▴ 360
I have used STAR to align some whole exome sequencing files:
STAR --readFilesCommand zcat --runThreadN 4 --genomeDir /usr/local/share/hg38 --outFileNamePrefix HGFLY --readFilesIn AGKD01_CKDN220004842-1A_HGFLYDSX3_L1_1.fq.gz AGKD01_CKDN220004842-1A_HGFLYDSX3_L1_2.fq.gz --outSAMtype BAM SortedByCoordinate
but when I run haplotypeCaller in the next step:
~/gatk-22.214.171.124/gatk --java-options "-Xmx4g" HaplotypeCaller --input HGFLYAligned.sortedByCoord.out.bam --reference /usr/local/share/hg38/hg38.fa -O HGFLY.g.vcf.gz -ERC GVCF
I get an error:
A USER ERROR has occurred: Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.
but when I try to view the samples
samtools view -H HGFLYAligned.sortedByCoord.out.bam | grep RG I get none.
I have found similar errors reported, https://gatk.broadinstitute.org/hc/en-us/community/posts/360075892511-emit-ref-confidence-error but I don't see a solution to my problem there.
It seems that my alignment produced no sample names, and I don't see how STAR can output sample names.
A sample view of my bam file:
A00564:479:HGFLYDSX3:1:2420:11686:10708 163 chr1 10001 255 6S108M36S = 10191 170948 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC TAACCCTAACCCTAACCCGAAACCCAAAACCCAAAACCCGAAACCCAAAACCCA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFF,F::FFFF:F:F,FFFFF,,FF,:,,,FFF:F,,F::,:F,F,F,:F, NH:i:1 HI:i:1 AS:i:200 nM:i:10 A00564:479:HGFLYDSX3:1:2573:2889:29669 163 chr1 10001 255 1S113M36S = 10005 104 CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC CTAACCCTAACCCAAACCCTAACCCTAACCCTAAACCTAACCCTCACCCTAACC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF FFFFFF::FFFFFFFFFFF,FFFFFFF,FFF,FFFF,,::FFF,FFFFF,,FFFF,FFFFF,:FFF,,F:FF,:F,FFF,,F,FF,FFFFF,,,F,F,,,,F NH:i:1 HI:i:1 AS:i:207 nM:i:2 A00564:479:HGFLYDSX3:1:2140:5656:8844 163 chr1 10004 3 106M44S = 10004 110 CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC CAGCGCCAGAGCCTAGGCGGGGGCCTACGGCTGGGCCGGAGGGTGG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF:FFFFFFFF:F::FF,FF,FFFF:,,F,::,,,,,,,,:,,,,:,,,,,,,,,,,,,:,,:,,,,,,,,,,,,,,:,, NH:i:2 HI:i:1 AS:i:206 nM:i:4
How can I call haplotypes/variants?
Please post also a brief subset of reads. Did you try to add a readgroup using AddOrReplaceReadGroups ?
@Shred I've placed a small view of the aligned bam file. AddOrReplaceReadGroups didn't change anything, unfortunately
What do you mean by
You're in the situation where your BAM file has no RG information, so the caller doesn't know how your reads are actually stored. An alternative could also be
samtools addreplacergas stated here.
Closing, why are you using STAR, designed for RNAseq, to align exome sequencing data?
Is STAR inappropriate for Whole Exome Sequencing?
I'll check out
Sure. Use bwa instead