Before asking my question, I should point out that I'm working with data that's not my own (it is publicly available), in order to learn and establish a proper workflow for when real data arrive in the laboratory.
I'm dealing with some exome data from an Ion Torrent 318 chip and I'm trying to run the GATK RealignerTargetCreator on it to perform recalibration later on. The problem is that some reads have a deletion at the end:
read ends with deletion. Cigar: 179S54M1D5M1I9M1D
As a result, GATK cannot process them. How should I handle this case? Is the workflow I used (outlined below) to blame for this?
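To make the problem concrete, here is a minimal sketch of how such reads could be spotted from the CIGAR string alone. The function names are my own (hypothetical), and I am assuming that soft and hard clips should be ignored when deciding whether a read "ends with" a deletion; I have not confirmed that this matches GATK's own check.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) tuples."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]

def has_terminal_deletion(cigar):
    """Return True if the first or last non-clipping operation is a deletion.

    Assumption: soft (S) and hard (H) clips are skipped before checking.
    """
    ops = [op for _, op in parse_cigar(cigar) if op not in "SH"]
    return bool(ops) and (ops[0] == "D" or ops[-1] == "D")

print(has_terminal_deletion("179S54M1D5M1I9M1D"))
```

Running this over the CIGAR column of the SAM file would at least show how many reads are affected before deciding whether to drop or fix them.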
Steps I did:
First, QC: keep reads with a Phred score of at least 20 in at least 80% of the bases (Python script modeled on the FASTX-Toolkit).
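The filter is roughly equivalent to the sketch below. This is not the actual script: the function names are illustrative, and it assumes Phred+33 quality encoding and strict four-line FASTQ records.

```python
def passes_quality(qual, min_phred=20, min_fraction=0.8, offset=33):
    """Return True if at least min_fraction of the bases have Phred >= min_phred.

    Assumption: qualities are Phred+33 encoded (offset=33).
    """
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_phred)
    return good / len(qual) >= min_fraction

def filter_fastq(lines):
    """Yield (header, seq, plus, qual) for records that pass the filter.

    Assumption: every record occupies exactly four lines.
    """
    it = iter(lines)
    # zip over the same iterator four times groups the lines into records.
    for header, seq, plus, qual in zip(it, it, it, it):
        if passes_quality(qual.rstrip("\n")):
            yield header, seq, plus, qual
```

It could be used as `filter_fastq(open("reads.fastq"))`, where `reads.fastq` is a placeholder file name, writing the yielded lines back out to the filtered file.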
Then, realignment with bwa bwasw (bear in mind that Ion Torrent reads can be up to 250 bp long):
bwa bwasw -t 8 hg19.fa C30-101.filtered.fastq > C30-101.sam
This was followed by conversion to BAM, addition of read groups (RG), sorting, and indexing (pysamtools).
Then GATK was invoked as
gatk -T RealignerTargetCreator -R hg19.fa -o input.bam.list -I C30-101_RG.bam
(gatk is a small wrapper that merely hides the java -Xmx -jar ... stuff.)
 http://lifetech-it.hosted.jivesoftware.com/docs/DOC-2659 (registration may be required)