GATK4 HaplotypeCaller - read is malformed
20 months ago
helen ▴ 60

Hi,

I used RNA-Seq data (one control and one treatment sample) as input for variant calling with GATK4. At the HaplotypeCaller step the engine shut down after a few minutes and returned the following error:

A USER ERROR has occurred: Read A00355:100:HJCKMDRXX:1:1154:5367:30765 chr1:43621182-43621257 is malformed: read ends with deletion. Cigar: 58H52M2D2M1D3M1I5M4I2M5I3M2D3M2I1D. Although the SAM spec technically permits such reads, this is often indicative of malformed files.
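To see why this read is flagged, it helps to split the CIGAR string from the error message into its operations. The following is a minimal sketch in plain Python, not GATK's own validation code; the helper names `parse_cigar` and `ends_with_deletion` are made up for illustration, and the check looks only at the final operator, which is a simplification.

```python
import re

# One CIGAR operation: a length followed by an operator from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operator) tuples."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Sanity check: the parsed pieces must reassemble into the input string.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"unparseable CIGAR: {cigar!r}")
    return ops


def ends_with_deletion(cigar):
    """Return True if the last CIGAR operation is a deletion (D)."""
    ops = parse_cigar(cigar)
    return bool(ops) and ops[-1][1] == "D"


# CIGAR copied from the error message above.
cigar = "58H52M2D2M1D3M1I5M4I2M5I3M2D3M2I1D"
print(parse_cigar(cigar)[-1])
print(ends_with_deletion(cigar))
```

The last operation here is `1D`, a deletion with no aligned bases after it, which is what the error message complains about.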

And here is my code:

gatk --java-options "-Xmx20G -Djava.io.tmpdir=./" HaplotypeCaller -ERC GVCF -R hg38.fa -I Control_recal.bam --dbsnp dbsnp_146.hg38.vcf.gz -O Control_g.vcf

The command for the treatment sample is the same except for the file prefix.

Does anyone know how to fix this problem? Thanks

RNA-Seq • 890 views

Hello helen,

The problem is caused by a malformed or unusual BAM file. How was it created?

fin swimmer


Hi fin,

I used STAR 2-pass for alignment, and this step generated the SAM files.

Then I used Picard to add read groups, convert SAM to BAM, sort, and mark duplicates.

Then I used GATK for Split'N'Trim, base quality recalibration, applying BQSR, and variant calling.

I am not sure at which step the malformed BAM file was created, though...
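One way to narrow down the step is to scan the output of each stage for reads whose CIGAR ends with a deletion and note where they first appear. Below is a minimal sketch that works on SAM text lines; the function name `reads_ending_with_deletion` and the example records are hypothetical, and it assumes the standard tab-separated SAM layout with the CIGAR in the sixth column.

```python
import re

# A CIGAR whose final operation is a deletion, e.g. "8M1D".
TRAILING_DELETION = re.compile(r"\d+D$")


def reads_ending_with_deletion(sam_lines):
    """Return (name, contig, position, cigar) for reads whose CIGAR ends in D."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, contig, position, cigar = fields[0], fields[2], fields[3], fields[5]
        if TRAILING_DELETION.search(cigar):
            hits.append((name, contig, position, cigar))
    return hits


# Made-up records for illustration only.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "readA\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "readB\t0\tchr1\t200\t60\t8M1D\t*\t0\t0\tACGTACGT\tIIIIIIII",
]
hits = reads_ending_with_deletion(example)
print(hits)
```

The function accepts any iterable of lines, so an open SAM file can be passed directly. For BAM files, the records would first need to be converted to text, for example with `samtools view`. Running this on the STAR output, the Picard output, and each GATK intermediate file in turn should show which step first produces such reads.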


Please post on the GATK forum.
