Question: Ion Torrent Mapping
1
7.2 years ago by
Ian5.5k
University of Manchester, UK
Ian5.5k wrote:

I am new to Ion Torrent mapping, but have come to the conclusion that TMAP is the mapper of choice at the moment. Would anyone disagree with this statement?

I have been looking at my Ion Torrent reads with FASTQC and have noticed an odd nucleotide distribution in the first nine bases. It almost looks like primer/linker, but is different for each sample. Has anyone else experienced this? Should the first N bases be removed from Ion Torrent reads?

UPDATE: A suggestion was made to use the --nogroup flag to avoid grouping together values of individual positions when reads are >50bp. However, this did not change the "odd" profile I see. I have now included a snapshot (truncated by me at 54bp).

[Image: FASTQC snapshot of the reads, truncated at 54bp]

ion-torrent mapping • 6.7k views
modified 5.4 years ago by Biostar ♦♦ 20 • written 7.2 years ago by Ian5.5k

Just to double check, try running FASTQC with the --nogroup flag and see if the problem is still in the first 9 bases. By default it shows the first 9 positions ungrouped, and the remaining positions are grouped, so you only get an average nucleotide content for each group.
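
To look at ungrouped positions independently of FASTQC, per-position base fractions can be tallied directly from the FASTQ file. The following is only a minimal sketch: the function names are made up for illustration, it assumes plain 4-line FASTQ records (no wrapped sequence lines), and the two reads are invented example data.

```python
from collections import Counter
from io import StringIO


def read_fastq_sequences(handle):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def per_position_composition(sequences, max_pos=25):
    """Return one dict per read position with the fraction of A/C/G/T/N."""
    counts = [Counter() for _ in range(max_pos)]
    for seq in sequences:
        for pos, base in enumerate(seq[:max_pos]):
            counts[pos][base.upper()] += 1

    fractions = []
    for counter in counts:
        total = sum(counter.values())
        if total == 0:  # no read is long enough to reach this position
            break
        fractions.append({b: counter[b] / total for b in "ACGTN"})
    return fractions


# Invented two-read example; replace with open("reads.fastq") for real data.
example_fastq = StringIO(
    "@r1\nACGTAA\n+\nIIIIII\n"
    "@r2\nACGTCC\n+\nIIIIII\n"
)
composition = per_position_composition(
    read_fastq_sequences(example_fastq), max_pos=6
)
for pos, frac in enumerate(composition, start=1):
    print(pos, frac)
```

Positions where one base sits near 1.0 across all reads are the ones that are unlikely to be genomic.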

written 7.2 years ago by John St. John1.1k

Thanks John, I tried your suggestion but the same odd distribution is still seen. I will edit my question to include a snapshot.

written 7.2 years ago by Ian5.5k

wow, when you say it looks odd you really mean it.

written 7.2 years ago by John St. John1.1k

Did you not get a mapping file with barcodes and primers with your data? What files did they provide you with?

written 7.2 years ago by caseyr5470

I would not overthink it and would just remove the first 23 bases. You cannot go wrong with this approach: the worst thing that can happen is that you lose a small piece of data, and you can easily live with that :-)

written 6.3 years ago by Biomonika (Noolean)3.1k
4
7.2 years ago by
John St. John1.1k
San Francisco, CA, Cancer Therapeutics Innovation Group
John St. John1.1k wrote:

Given your nucleotide distribution, I do not see how the beginnings of these reads could be genomic. Perhaps your samples were multiplexed, and that is the barcode you are seeing? That would explain why the sequences are different in your different samples. At the very least I highly doubt that sequence is genomic, unless the reads all start at a very specific N-mer in the genome that is different for each sample (that seems like a very improbable explanation). Although I have never worked with Ion Torrent data before, I would definitely recommend getting rid of that part of those reads. It is just too weird.

Even the first 22 or 23 bases look fishy in terms of biases away from certain nucleotide calls. Quite a few programs out there work under the assumption that the beginnings of the reads are of the highest quality. Perhaps the Ion Torrent software is built knowing that these kinds of oddities can happen? I would probably just strip off the first 23 bases, and then use a read mapper that can handle indels, like bowtie2 (not bowtie) or bwa. I might lean toward bowtie2 or bwa bwasw (rather than bwa aln followed by bwa sampe) for these, since they are on the longer side. I really don't know anything about TMAP. Is there literature stating that it is better for Ion Torrent reads than something like bwa or bowtie2, and showing a performance comparison?
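
Stripping a fixed number of leading bases is simple enough to do without a dedicated tool. A rough sketch follows, assuming plain 4-line FASTQ records; `trim_fastq_5prime` and its parameters are hypothetical names, and the two reads are invented. The quality string has to be trimmed together with the sequence so the two stay the same length.

```python
from io import StringIO


def trim_fastq_5prime(in_handle, out_handle, n_trim=23, min_len=1):
    """Remove the first n_trim bases (and qualities) from every read.

    Reads shorter than min_len after trimming are dropped.
    Returns the number of reads written. Assumes 4 lines per record.
    """
    kept = 0
    while True:
        header = in_handle.readline()
        if not header:  # end of file
            break
        seq = in_handle.readline().rstrip("\n")
        in_handle.readline()  # the '+' separator line, rewritten below
        qual = in_handle.readline().rstrip("\n")

        trimmed_seq = seq[n_trim:]
        trimmed_qual = qual[n_trim:]
        if len(trimmed_seq) < min_len:
            continue

        out_handle.write(header.rstrip("\n") + "\n")
        out_handle.write(trimmed_seq + "\n+\n" + trimmed_qual + "\n")
        kept += 1
    return kept


# Invented example: one 30 bp read and one 20 bp read.
example_in = StringIO(
    "@r1\n" + "A" * 23 + "CGTACGT\n+\n" + "I" * 30 + "\n"
    "@r2\n" + "A" * 20 + "\n+\n" + "I" * 20 + "\n"
)
example_out = StringIO()
n_kept = trim_fastq_5prime(example_in, example_out, n_trim=23)
print(n_kept)
print(example_out.getvalue(), end="")
```

The 20 bp read disappears entirely here, which is the "lose a small piece of data" cost of a fixed-length trim.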

Also, what do you want to do with this data? If you are doing variant calling, then you want a really clean dataset, so err on the side of caution. Having strong position-specific read biases like this can bias your variant calls, which is always embarrassing if you think you found something exciting when it is just data noise. After mapping your reads, I would feed the alignment through a pipeline like the raw data processing step that comes before the UnifiedGenotyper in Broad's variant calling pipeline (in the Genome Analysis Toolkit). This alignment processing pipeline has stages that attempt to identify these kinds of position-specific biases in reads, and then re-adjust quality scores accordingly.

Anyways, good luck with this dataset!

modified 7.2 years ago • written 7.2 years ago by John St. John1.1k
4
7.2 years ago by
Nick Loman610
United Kingdom
Nick Loman610 wrote:

Almost certainly these are barcode sequences; have a chat with whoever ran the instrument for you to confirm.
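
While waiting for confirmation, a quick sanity check is to count the most frequent read prefixes in each sample: if one prefix (or a handful) accounts for nearly all reads, that is consistent with a barcode or adapter rather than genomic sequence. This is only an illustrative sketch; `top_prefixes` is a made-up helper, the prefix length of 9 is taken from the question, and the reads are invented.

```python
from collections import Counter


def top_prefixes(sequences, prefix_len=9, top_n=5):
    """Return (prefix, count, fraction) for the most common read starts."""
    counts = Counter(
        seq[:prefix_len].upper()
        for seq in sequences
        if len(seq) >= prefix_len  # skip reads too short to compare
    )
    total = sum(counts.values())
    return [(p, c, c / total) for p, c in counts.most_common(top_n)]


# Invented reads: three share the same 9 bp start, one does not.
reads = [
    "ACGTACGTAGGG",
    "ACGTACGTATTT",
    "ACGTACGTACCC",
    "TTTTTTTTTAAA",
]
prefix_table = top_prefixes(reads, prefix_len=9)
for prefix, count, fraction in prefix_table:
    print(prefix, count, round(fraction, 2))
```

Running this per sample and comparing the dominant prefix against the barcode list from the sequencing run would settle the question either way.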

written 7.2 years ago by Nick Loman610
2
6.3 years ago by
pablo.riesgo140
pablo.riesgo140 wrote:

Sorry for reviving this discussion, but looking at the per base sequence content diagram, I see that the first 13 bases are almost exactly the same for every read. At each of these first 13 positions, a single base reaches almost 100% occurrence. This is not specific to Ion Torrent; I have seen the same thing before with primers in 454 data. I would say it is the primers that we are seeing, and removing them would be the solution.
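
The "single base at almost 100%" observation can be turned into a rough estimate of how long the constant stretch is. The sketch below walks along the read positions and stops at the first one where the dominant base falls under a threshold. The function name, the 0.95 threshold, and the example reads are all assumptions made for illustration, not values taken from the data in this thread.

```python
from collections import Counter


def constant_prefix(sequences, threshold=0.95, max_pos=50):
    """Return the consensus of the leading positions where one base dominates.

    Stops at the first position where the most common base accounts for
    less than `threshold` of the reads that are long enough to cover it.
    """
    seqs = list(sequences)
    consensus = []
    for pos in range(max_pos):
        column = Counter(s[pos].upper() for s in seqs if len(s) > pos)
        total = sum(column.values())
        if total == 0:  # no read reaches this position
            break
        base, count = column.most_common(1)[0]
        if count / total < threshold:
            break
        consensus.append(base)
    return "".join(consensus)


# Invented reads: the first 4 bases are identical, the 5th varies.
reads = ["ACGTTGCA", "ACGTAGCA", "ACGTCGCA", "ACGTGGCA"]
shared_start = constant_prefix(reads)
print(shared_start, len(shared_start))
```

The length of the returned string gives a data-driven number of bases to remove, and the string itself can be compared against known primer or adapter sequences.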

Cheers, Pablo.

written 6.3 years ago by pablo.riesgo140

Hi pablo.riesgo,

I have the same issue as you.

I don't know what to do.

Bernardo

written 6.3 years ago by biotech540