Question: NGS preprocessing pipeline for Ion Torrent data
Noushin N (Baltimore, MD) wrote 5.1 years ago:

I have just received targeted deep-sequencing FASTQ and BAM files that were generated on the Ion Torrent platform. I am familiar with the GATK Best Practices pipeline and have processed numerous whole exomes and genomes. I would like to know whether there are any key factors or pipeline differences I need to be aware of, given the different platform of this new batch.

An exploratory look at one of my samples indicates a very high rate of duplicate reads (> 90%). I expected a high duplicate rate given the PCR-based nature of the experiment, but was still surprised by this figure.

Currently, my pipeline performs the following steps in sequence:

1) indexing the bam file
2) reordering and sorting
3) fixing read mate information
4) de-duplication of reads [I wonder if this step should still be there]
5) generating realignment intervals, and realigning reads around indels
6) base quality score recalibration [I have doubts about this step, as Ion Torrent is not explicitly listed as a supported platform on the GATK page]

The purpose of my analysis is to validate an initial set of mutation calls and look at variant allele frequency distribution.
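
For reference, the variant allele frequency mentioned here is commonly taken as the fraction of reads at a site that support the alternate allele. A minimal sketch, assuming per-site reference and alternate read counts are already available (the function name and the example counts are hypothetical, not taken from my data):

```python
def variant_allele_frequency(ref_count, alt_count):
    """Return alt / (ref + alt), or None when the site has no coverage."""
    depth = ref_count + alt_count
    if depth == 0:
        return None  # VAF is undefined without any reads
    return alt_count / depth


# Hypothetical site: 970 reference reads and 30 alternate reads.
vaf = variant_allele_frequency(970, 30)
print(f"VAF = {vaf:.3f}")  # prints "VAF = 0.030"
```

Since the estimate depends directly on read counts, any step that discards reads (such as de-duplication) changes both the depth and the resolution of the VAF values.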

I apologize if this post is a duplicate of an existing thread that I could not locate.

Thank you!


What sort of targeted sequencing was performed? Whole exome, disease biomarker, cancer panel? Any of these methods is expected to produce large numbers of exact duplicates, so I wouldn't worry too much about that.
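
To illustrate why amplicon-style data looks so duplicated, here is a rough sketch of position-based duplicate counting. It assumes a duplicate is any extra read sharing the same chromosome, start and strand, which is a simplification of what real duplicate-marking tools do; the read tuples are made up for the example.

```python
from collections import Counter


def positional_duplicate_rate(reads):
    """Fraction of reads that would be flagged as duplicates if all reads
    sharing (chrom, start, strand) were collapsed to one representative."""
    if not reads:
        return 0.0
    groups = Counter(reads)
    # Every read beyond the first in a group counts as a duplicate.
    duplicates = sum(n - 1 for n in groups.values())
    return duplicates / len(reads)


# Hypothetical panel: 2 amplicons, 500 reads each, all starting at the primer site.
amplicon_reads = [("chr1", 1000, "+")] * 500 + [("chr2", 2000, "+")] * 500
rate = positional_duplicate_rate(amplicon_reads)
print(f"{rate:.3f}")  # prints "0.998"
```

With reads that all start at the same primer position, nearly every read is a "duplicate" by this definition, so a rate above 90% is not surprising in itself.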

You can also remove the "fix read mate" step, since Ion Torrent reads won't be paired-end.
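
As a sketch of how the step list from the question might be adjusted, the function below drops the mate-fixing step for single-end data. Skipping de-duplication for amplicon data is my assumption here, not something settled in this thread, and base quality score recalibration is left in only because its validity for this platform is still an open question. The function and step names are hypothetical labels, not tool commands.

```python
def preprocessing_steps(paired_end, amplicon):
    """Return an ordered list of preprocessing step labels."""
    steps = ["index BAM", "reorder and sort"]
    if paired_end:
        # Mate information only exists for paired-end reads.
        steps.append("fix mate information")
    if not amplicon:
        # Assumption: amplicon reads share start sites by design,
        # so positional de-duplication is skipped for them.
        steps.append("mark/remove duplicates")
    steps.append("indel realignment")
    steps.append("base quality score recalibration")
    return steps


# Single-end amplicon data, as described in the question.
for number, step in enumerate(preprocessing_steps(paired_end=False, amplicon=True), 1):
    print(f"{number}) {step}")
```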

Reply written 5.1 years ago by ciclistadan (modified 14 months ago by Ram)

Thank you. The data were generated on an Ion Torrent PGM. Do you know whether base quality score recalibration is applicable/valid here?

Reply written 5.1 years ago by Noushin N

This is an old post, but I would be curious to know how the analysis was carried out in the end. I have the same doubts regarding duplicates. I also have Ion Torrent data, generated with an AmpliSeq panel, and I have seen that removing duplicates greatly reduces the number of variants found. I am not sure that removing duplicates is a step I should take.

Reply written 12 weeks ago by sarastrafella.ss