Decipering .BAM file from Ion Torrent reads
1
0
Entering edit mode
5.7 years ago
newbio17 ▴ 360

I have been given .BAM files that were generated a while back (4 years ago) and were tasked with unmapping & remapping the reads.

I attempted my usual approach of extracting paired-end reads from .BAM using bedtools' bamToFastq which generated two empty fastq files. I went ahead and attempted to use other available tools (samtools, Picard). With samtools' bam2fq, I was able to extract all of the reads into a single fastq file. However, I noticed that header information of the reads were missing labels that differentiates forward from reverse reads (e.g. /1, /2).

Below is what one of the lines in .BAM file looks like before I attempt to unmap the reads:

@RG     ID:ZFXSC.IonXpress_008.1        PL:IONTORRENT   PU:PGM/P1.1.17/IonXpress_008    LB:hg19/IonXpress_008   FO:TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATG
TACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTAC
GTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACG     DT:2014-01-29T15:52:12-0500     SM:978R PG:tmap KS:TCAGTTCCGATAACGAT
    CN:TorrentServer/GTproton

And below is one of the reads from unmapped reads using samtools:

@ZFXSC:08727:06569
CGTTTAAACTACATGCAGGAACAGCGAA
+
::=:5:?5:555:<::990/)//4,,,)

From what I was able to gather, the current unmapped reads contain following information in the header:

Run ID: Every TS analysis gets a run ID, a 5-character string consisting of upper case letters and numbers, assigned. A reanalysis of a specific run will get a different run ID assigned. Example: 0JU8V.

SAM record (read) names: Read names are a combination of the run ID and the chip coordinates of the well that produced the read. The coordinate values are 5-digit numbers and are given in the order row and the column, separated by a colon. Example: 0JU8V:01308:00107.

However, they seem to be missing read group ID, defined as:

Read Group ID: For non-barcoded runs the read group ID is equal to the run ID. For barcoded runs it is a combination of the run ID and the barcode name, separated by a dot. Example: 0JU8V.IonXpress_001

I assumed that I should be able to determine which read belongs to one of the two paired-end read fastq files by looking at the read group ID, but the .BAM file has more than 2 read groups (95 if I counted correctly).

ZFXSC.IonXpress_008.[1,..., 29, 1A,..., 1Z, 2A,..., 2M, A,..., X ].

I attempted using samtools view -c -f 1 on the .BAM file and noticed that it returned 0, which I believe means that the .BAM file was generated from single-end reads.

However, I was told by the person that have given me the .BAM files that he/she believe .BAM files to be generated using paired-end reads. Is there anything else I could do to figure out what exactly is going on here?

next-gen sequencing alignment • 3.1k views
ADD COMMENT
1
Entering edit mode

Most commonly Ion Torrent reads are single-end. What is the output of:

samtools view file.bam | cut -f2 | sort | uniq -c

And:

samtools view -h file.bam | grep "@PG"
ADD REPLY
0
Entering edit mode

Hi h.mon,

For 1st command line prompt:

36932740 0
36837958 16
 968314 4

It looks like the distribution between 0 and 16 bitwise flags are roughly even with some unmapped reads (4). Can I say now that the reads are paired-end based on 0 and 16 (reverse complement)?

For 2nd command line prompt:

@PG     ID:bc   PN:BaseCaller   VN:4.0-6/76303  CL:BaseCaller --keypass-filter on --phasing-residual-filter=2.0 --trim-qual-cutoff 15 --trim-qual-window-size 30 --trim-adapter-cutoff 16 --num-unfiltered 1000 --calibration-file basecaller
_results/recalibration/flowQVtable.txt --phase-estimation-file basecaller_results/recalibration/BaseCaller.json --model-file basecaller_results/recalibration/hpModel.txt --input-dir=sigproc_results --librarykey=TCAG --tfkey=ATCG --run-id
=ZFXSC --output-dir=basecaller_results --block-col-offset 9016 --block-row-offset 9324 --datasets=basecaller_results/datasets_pipeline.json --trim-adapter ATCACCGACTGCCCATAGAGAGGCTGAGAC --barcode-filter 0.01 --barcode-filter-minreads 0

I didn't see any information regarding the aligner and the command used to perform the alignment, but I believe .BAM file was generated using tmap.

Thank you.

ADD REPLY
3
Entering edit mode
5.7 years ago
h.mon 35k

Your reads are unpaired, or, if they are paired, they have been mapped as unpaired. The 0 and 16 flags means mapped and mapped, reverse strand - if this were properly mapped, paired data, you would expect a lot of 99 and 147.

I thought the @PG field could help solve the question, but I am not familiar with Ion Torrent tools, and don't see anything there useful.

If you are certain these are paired reads, sort the bam file by name, and then inspect it with samtools view.

samtools sort -n -o file.sorted.bam file.bam
samtools view file.sorted.bam | head -n 20
ADD COMMENT

Login before adding your answer.

Traffic: 1997 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6