Question: Split A Bam File Into Smaller Files By Tile Number
7.1 years ago by
gaelgarcia05210 wrote:

Hi all,

I would like to split a very big BAM file into smaller files so that I can annotate it in parallel. Someone suggested splitting it by tile number, which is a good idea because it guarantees that all the alignments for a given read end up in the same file.

However, I am stuck on how to phrase the awk command for this. The tile number is contained within the read ID string in the first field of the alignment, where it is separated from the other information in the string by ":", while the read ID field itself is separated from the other fields by "\t".


Tile number (encrypted) = 1101 (5th field)

How could I use awk to get each line put into its new corresponding file based on its tile number?
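The extraction step described above can be sketched as follows. This is only a minimal illustration in Python rather than awk, assuming the read ID is the first tab-separated column and the tile is its 5th ":"-separated field, as stated in the question. The function name `tile_of` and the example alignment line are made up for illustration.

```python
def tile_of(sam_line: str, tile_field: int = 5) -> str:
    """Return the tile number from the read ID of one SAM alignment line.

    Assumes the read ID is the first tab-separated column and that the
    tile is its `tile_field`-th ':'-separated field (1-based).
    """
    read_id = sam_line.split("\t", 1)[0]  # first column only
    parts = read_id.split(":")
    if len(parts) < tile_field:
        raise ValueError(
            f"read ID has fewer than {tile_field} ':' fields: {read_id!r}"
        )
    return parts[tile_field - 1]


# Hypothetical alignment line, invented for this example.
example = "INSTR:42:FLOWCELL:1:1101:15589:1332\t99\tchr1\t100\t50\t4M\t=\t200\t104\tACGT\tIIII"
print(tile_of(example))  # prints 1101
```

Splitting on "\t" first and on ":" second keeps the two separators from interfering with each other, which is the part that is awkward to express with a single awk field separator.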

Thanks, Carmen

tophat samtools • 2.2k views

I think I may have a Perl solution to this, but I don't know exactly how to phrase the output. Can anybody help me out? :)

I have made a hash of hashes, where every line of the file is sorted under a key of the "master" hash according to the value of its 5th field.

%Tiles has n keys, where each key is a different $Tile_Number.

Each $Tile_Number points to a new hash that contains all the lines whose tile number matches that key. The value of each of these inner keys (the lines) is just 1.

$Tiles{$Tile_Number}{$Line}=1 , where $Tiles{$Tile_Number} has many $Line=1 entries.

I want to print each $Tiles{$Tile_Number} hash to a separate file, preferably creating the file when the $Tile_Number key is created and printing each new $Tiles{$Tile_Number}{$Line}=1 entry as it is added, to save memory. Ideally the final value (1) would not be printed, but I can live with it if it is.

How can I tell Perl to open a new file for each key in the "master" hash and print all of its keys?
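One way to get the "open the file when the key first appears, then print as you go" behaviour is to store open file handles in the hash instead of the lines themselves. The sketch below shows that idea in Python rather than Perl. It assumes SAM text as input (for example from `samtools view -h`), with header lines before the alignments, and the function name `split_sam_by_tile` and the `tile_<N>.sam` file naming are hypothetical choices made for this example.

```python
from pathlib import Path


def split_sam_by_tile(lines, out_dir, tile_field=5):
    """Stream SAM lines into <out_dir>/tile_<N>.sam, one file per tile.

    Assumes header lines ('@...') come before the alignments and that the
    tile is the `tile_field`-th ':'-separated field of the read ID.
    Returns the sorted list of tile numbers seen.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    header = []   # header lines, copied into every output file
    handles = {}  # tile number -> open file handle
    try:
        for line in lines:
            if line.startswith("@"):
                header.append(line)
                continue
            read_id = line.split("\t", 1)[0]
            tile = read_id.split(":")[tile_field - 1]
            fh = handles.get(tile)
            if fh is None:
                # First time this tile is seen: create its file now.
                fh = open(out_dir / f"tile_{tile}.sam", "w")
                fh.writelines(header)
                handles[tile] = fh
            fh.write(line)  # written immediately, nothing kept in memory
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(handles)
```

Calling `split_sam_by_tile(sys.stdin, "tiles")` in a script fed by `samtools view -h my.bam` would be one way to use it. Only one handle per tile stays open, so memory use does not grow with the size of the BAM, although a run with very many tiles could hit the operating system's limit on open files.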

Thank you, Carmen

7.1 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum128k wrote:

I just wrote a Java program to split a BAM by tile. It uses the Picard library to parse the BAM.


cd src/main/java
javac -cp path/to/picard.jar:path/to/sam.jar com/github/lindenb/jvarkit/tools/splitbytitle/


java  -cp path/to/picard.jar:path/to/sam.jar \ \
I=my.bam O=tmp/TILE__TILE__/jeter.__TILE__.bam CREATE_INDEX=true

WOW, cool! Let me check it out, Pierre!
