How To Handle Reads Ending With Deletions In GATK?
10.4 years ago
Luca Beltrame ▴ 240


Before asking my question, I should point out that I'm working with data that is not my own (it is publicly available), in order to learn and establish a proper workflow before real data arrives in the laboratory.

I'm dealing with some exome data[1] from an Ion Torrent 318 chip and I'm trying to run the GATK RealignerTargetCreator on it to perform recalibration later on. The problem is that some reads have a deletion at the end:

read ends with deletion. Cigar: 179S54M1D5M1I9M1D

GATK therefore cannot process them. How should I handle this case? Is the workflow I used (outlined below) to blame?
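To make the problem concrete, here is a minimal Python sketch that detects CIGAR strings whose aligned portion ends with a deletion and drops that trailing `D` operation. The function names are hypothetical, and treating trailing clips (`S`/`H`) as "outside" the alignment is an assumption of this sketch, not documented GATK behaviour. Dropping a trailing `D` leaves the read sequence length unchanged, because a deletion consumes reference bases only; whether this is the right fix for a given pipeline is a separate question.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, op) tuples."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings that the pattern matched only partially.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def ends_with_deletion(cigar):
    """True if the last operation, ignoring trailing clips, is a deletion."""
    ops = parse_cigar(cigar)
    while ops and ops[-1][1] in "SH":
        ops.pop()
    return bool(ops) and ops[-1][1] == "D"


def drop_trailing_deletion(cigar):
    """Return the CIGAR with any deletion at the end of the alignment removed.

    Trailing soft/hard clips are preserved. Assumption: removing the D is
    acceptable because it consumes no query bases.
    """
    ops = parse_cigar(cigar)
    clips = []
    while ops and ops[-1][1] in "SH":
        clips.append(ops.pop())
    while ops and ops[-1][1] == "D":
        ops.pop()
    ops.extend(reversed(clips))
    return "".join(f"{n}{op}" for n, op in ops)
```

For the read reported above, `ends_with_deletion("179S54M1D5M1I9M1D")` is true, and `drop_trailing_deletion` returns `179S54M1D5M1I9M`.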

Steps I did:

First, QC: keep reads with a Phred score of at least 20 in at least 80% of the bases (Python script modeled on the FASTX-Toolkit).
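The original script is not shown, so the following is only a sketch of what such a filter might look like. It assumes Phred+33 quality encoding and plain four-line FASTQ records; `passes_quality` and `filter_fastq` are hypothetical names.

```python
def passes_quality(qual, min_phred=20, min_fraction=0.8, offset=33):
    """True if at least min_fraction of the bases have Phred >= min_phred.

    `qual` is the FASTQ quality string; `offset` is the ASCII offset
    (33 is assumed here).
    """
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_phred)
    return good / len(qual) >= min_fraction


def filter_fastq(lines, **kwargs):
    """Yield (header, seq, plus, qual) records that pass the quality filter.

    `lines` is any iterable of FASTQ lines, e.g. an open file handle.
    An incomplete record at the end of the input is silently skipped.
    """
    it = iter(lines)
    # zip over the same iterator four times groups lines into records.
    for header, seq, plus, qual in zip(it, it, it, it):
        if passes_quality(qual.rstrip("\r\n"), **kwargs):
            yield header, seq, plus, qual
```

To write a filtered file, iterate over `filter_fastq(open("in.fastq"))` and write the four lines of each yielded record to the output handle.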

Then, realignment with bwa bwasw (consider that reads by Ion Torrent can go up to 250 bp):

bwa bwasw -t 8 hg19.fa C30-101.filtered.fastq > C30-101.sam

This was followed by conversion to BAM, addition of read groups (RG), sorting, and indexing (pysamtools).

Then GATK was invoked as

 gatk -T RealignerTargetCreator -R hg19.fa -o input.bam.list -I C30-101_RG.bam

(gatk is a small wrapper that merely hides the java -Xmx -jar ... stuff.)

[1] (registration may be required)

gatk indel analysis alignment ion-torrent • 2.4k views

Replace "179S54M1D5M1I9M1D" with "179S54M1D5M1I9M1S" (change the last D to S). Sorry.

