Converting ION torrent .BAM to Fastq
6.2 years ago
alok.helix ▴ 80

Hello Colleagues,

I have a set of files generated by an Ion Torrent server. It seems the server delivers the reads as a BAM file rather than a FASTQ file. This surprised me, because I have been working with an Illumina dataset, which comes as FASTQ. The header of the PGM BAM (samtools view -H ion.bam) also looks quite different from that of an Illumina BAM file. Any suggestions on how to generate a FASTQ from this BAM?

Thank you

sequence Assembly genome Ion torrent PGM • 9.2k views

Dear Ashutosh, please see the file shown above!! It looks nothing like a BAM to me... Where are the CIGAR and all the other relevant fields??


Can you please post a few example lines of the BAM data?


@HD    VN:1.4    SO:coordinate
@RG    ID:1F9VX.IonXpress_010    PL:IONTORRENT    PU:Unspecified/314R/IonXpress_010    FO:TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGA    DS:Running 1 DMD 1 Lung and Colon 1 E coli and 1 BCRABL Sample    DT:2015-04-16T19:57:17+0530    SM:E_Coli MDR    KS:TCAGCTGACCGAACGAT    CN:TorrentServer/sn11c061316
@PG    ID:bc    PN:BaseCaller    VN:4.2-18/6b3fd1b    CL:BaseCaller --barcode-filter 0.01 --barcode-filter-minreads 20 --calibration-file basecaller_results/recalibration/hpTable.txt --phase-estimation-file basecaller_results/recalibration/BaseCaller.json --model-file basecaller_results/recalibration/hpModel.txt --input-dir=sigproc_results --librarykey=TCAG --tfkey=ATCG --run-id=1F9VX --output-dir=basecaller_results --block-col-offset 0 --block-row-offset 0 --datasets=basecaller_results/datasets_pipeline.json --trim-adapter ATCACCGACTGCCCATAGAGAGGCTGAGAC
@CO    {"1F9VX.block_X0_Y0":{"flowEnd":499,"flowSpan":250,"flowStart":0,"max_hp_calibrated":12,"modelParameters":[{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":0,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":0.9923999905586243,"paramB":0.0,"refHP":1,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":0.7135999798774719,"paramB":0.4217000007629395,"refHP":2,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":0.8190000057220459,"paramB":0.3619999885559082,"refHP":3,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":1.236700057983398,"paramB":-1.080600023269653,"refHP":4,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":5,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":6,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":7,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":8,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":9,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":10,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":65,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":11,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":0,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":0.9915999770164490,"paramB":0.0,"refHP":1,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":0.9466999769210815,"paramB":0.0,"
refHP":2,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":0.7113000154495239,"paramB":0.6338999867439270,"refHP":3,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":4,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":5,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":6,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":7,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":8,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":9,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":10,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":67,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":11,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":71,"flowEnd":249,"flowStart":0,"paramA":1.0,"paramB":0.0,"refHP":0,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":71,"flowEnd":249,"flowStart":0,"paramA":0.9603000283241272,"paramB":0.0,"refHP":1,"xMax":639,"xMin":0,"yMax":575,"yMin":0},{"flowBase":71,"flowEnd":249,"flowStar

It goes on and on like this, and ends as shown below:

.0,"paramB":0.0,"refHP":11,"xMax":1279,"xMin":640,"yMax":1151,"yMin":576}],"xMax":1279,"xMin":0,"xSpan":640,"yMax":1151,"yMin":0,"ySpan":576},"MagicCode":"6d5b9d29ede5f176a4711d415d769108","MasterCol":0,"MasterKey":"1F9VX.block_X0_Y0","MasterRow":0}
@CO    {"BeadAdapters":{"Adapter_0":{"adapter_sequence":"ATCACCGACTGCCCATAGAGAGGCTGAGAC"}}}

According to http://mendel.iontorrent.com/ion-docs/Technical-Note---Transition-from-SFF-to-BAM-format_37421247.html it seems there is no more FASTQ!!!


Dear alok.helix,

I'm also an IonTorrent user, and I'm fairly sure you CAN download FASTQ from inside IonReporter, if I remember correctly...


It is phased out!! :(

6.2 years ago
Kowen ▴ 40

Ion Torrent uses BAM files (unaligned or aligned) to store sequencing-machine, flow-signal, base-caller and aligner information. The flow-signal information in particular is an important input to the Ion Torrent variant-calling workflow, and the FASTQ format cannot store it. If you are going to call variants from Ion Torrent data, you are better off using the BAM files output directly by Torrent Suite together with the Torrent Variant Caller. If you are going to assemble the MDR E. coli genome, you can follow PoGibas's suggestion and convert the BAM to FASTQ with bamToFastq from bedtools.
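
To make the point about lost information concrete, here is a minimal Python sketch (not part of any Ion Torrent tool) that splits one text-format SAM record, as printed by samtools view, into the three fields a FASTQ record can hold and the optional tags it cannot. The function name `split_record` and the example record are hypothetical. `FZ` is the tag the SAM optional-fields specification reserves for flow signal intensities; whether a given Torrent Suite version writes exactly that tag is an assumption here.

```python
def split_record(sam_line):
    """Split one SAM alignment line into FASTQ-representable fields and optional tags.

    Returns (kept, dropped): `kept` holds the read name, bases and qualities,
    `dropped` maps each optional tag to its (type, value) pair.
    """
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1, 10 and 11 of a SAM record are all that FASTQ can carry.
    kept = {"name": fields[0], "seq": fields[9], "qual": fields[10]}
    dropped = {}
    # Everything from column 12 onward is an optional TAG:TYPE:VALUE field.
    for tag_field in fields[11:]:
        tag, tag_type, value = tag_field.split(":", 2)
        dropped[tag] = (tag_type, value)
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical unaligned record (flag 4) with a read group and flow signals.
    example = (
        "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII"
        "\tRG:Z:1F9VX.IonXpress_010\tFZ:B:S,100,0,98"
    )
    kept, dropped = split_record(example)
    print("kept in FASTQ:", kept)
    print("lost in FASTQ:", sorted(dropped))
```

Anything that ends up in `dropped` is gone after conversion, which is why a flow-signal-aware variant caller needs the original BAM.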

6.1 years ago
biocyberman ▴ 830

If you can use bamToFastq as others have suggested, that is fine. Alternatively, you can install the FileExporter plugin for Torrent Suite and let it export a FASTQ file automatically whenever a run finishes on the Ion Torrent. You can also run the FileExporter plugin on an existing analysis at any time. See: http://mendel.iontorrent.com/ion-docs/FileExporter-Plugin.html


I have a question. Once we have FASTQ files from Torrent Suite, I want to align them. Which alignment program should I use: BWA or bowtie2? Please let me know.

6.2 years ago
PoGibas 4.9k

I sometimes work with Ion data too. This is an example of the header of my original "bam" files generated by the BaseCaller:

@HD     VN:1.4  GO:none SO:coordinate
@RG     ID:GJPTV.IonXpress_Name  PL:IONTORRENT   PU:Unspecified/P1.1.17/IonXpress_001    FO:TACGT...........TGAGCA      DT:2014-01-22T21:36:11+0200     SM:seq_tissue_1 KS:TCAGCTAAGGTAACGAT     CN:TorrentServer/Proton1
...
@PG     ID:bc.Z PN:BaseCaller   VN:4.2-18/6b3fd1b       CL:BaseCaller --barcode-filter 0.01 --barcode-filter-minreads 10 --keypass-filter on --phasing-residual-filter=2.0 --num-unfiltered 1000 --barcode-filter-postpone 1 --input-dir=sigproc_results --librarykey=TCAG --tfkey=ATCG --run-id=GJPTV --output-dir=basecaller_results --block-col-offset 3864 --block-row-offset 6660 --datasets=basecaller_results/datasets_pipeline.json --trim-adapter ATCACCGACTGCCCATAGAGAGGCTGAGAC
...
@CO     {"BeadAdapters":{"Adapter_0":{"adapter_sequence":"ATCACCGACTGCCCATAGAGAGGCTGAGAC"}}}
...

And this is all I get (~200 lines, using samtools view -H), even though the files are ~10Gb in size. I convert them to ordinary FASTQ using bamToFastq from bedtools.
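
For readers who want to see what such a conversion does, here is a rough Python sketch. It is not bamToFastq itself: it assumes the records arrive as SAM text on standard input (for example piped from samtools view), and the script and function names are hypothetical. It skips header lines, secondary and supplementary alignments, and records without stored bases, and it reverse-complements reads flagged as reverse-strand so the FASTQ holds the read as sequenced.

```python
import sys

# Complement table used for reads stored on the reverse strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Return a 4-line FASTQ record for one SAM line, or None if it is skipped."""
    # Header lines start with '@'; read names never do, per the SAM specification.
    if not line.strip() or line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    # 0x100 = secondary, 0x800 = supplementary: skip to avoid duplicate reads.
    if flag & 0x900:
        return None
    # '*' means bases or qualities were not stored; FASTQ cannot represent that.
    if seq == "*" or qual == "*":
        return None
    # 0x10 = reverse strand: restore the original read orientation.
    if flag & 0x10:
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


def convert(lines, out):
    """Write FASTQ for every convertible SAM line; return the number written."""
    written = 0
    for line in lines:
        record = sam_line_to_fastq(line)
        if record is not None:
            out.write(record)
            written += 1
    return written


if __name__ == "__main__":
    convert(sys.stdin, sys.stdout)
```

Assuming the script is saved as sam_to_fastq.py (a hypothetical name), it could be run as: samtools view ion.bam | python sam_to_fastq.py > ion.fastq. For unaligned BaseCaller BAMs like the ones above, every read should have flag 4, so the reverse-strand branch only matters for aligned files.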


Thanks a lot. I eventually realized this after a number of trials: my reads were not missing and the data was fine. I also converted them to FASTQ, and now I would like to annotate my data. Is it necessary to prepare a contigs.fa from your reads if you have already aligned the data against a reference genome and obtained a BAM of mapped reads?
