Question: Split A Bam File Into Smaller Files By Tile Number
gaelgarcia05 (UK) wrote, 6.2 years ago:

Hi all,

I would like to split a very big BAM file into smaller files so that I can annotate it in parallel. Someone suggested splitting it by tile number, which is a good idea since that guarantees that all the alignments for a given read end up in the same file.

However, I am stuck on how to phrase the awk command, since the tile number is contained within the read ID string in the first field of the alignment, separated from the other information in that string by ":", while this field is separated from the other fields by "\t".

HWI-ST975:104:C0W47ACXX:8:1101:8269:91631

Tile number (encoded) = 1101 (5th field of the read ID).

How could I use awk to write each line to its corresponding new file based on its tile number?

Thanks, Carmen
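
One way to phrase the awk step asked about above, as a minimal sketch rather than a tested pipeline: split each record on tabs, then split the first field on ":" and use the fifth token as the tile. The function name split_by_tile, the tile_<N>.sam file naming, and the output directory argument are illustrative assumptions, not something from this thread.

```shell
# Split SAM records read from stdin into one file per tile.
# Assumes read names like HWI-ST975:104:C0W47ACXX:8:1101:8269:91631,
# where the tile is the 5th ':'-separated token of column 1.
# split_by_tile and the tile_<N>.sam naming are hypothetical choices.
split_by_tile() {
    outdir="$1"
    mkdir -p "$outdir"
    awk -F'\t' -v outdir="$outdir" '
        /^@/ { next }                  # skip SAM header lines, if any
        {
            split($1, id, ":")         # id[5] now holds the tile number
            out = outdir "/tile_" id[5] ".sam"
            print > out                # file is created on first use, then appended to
        }
    '
}

# Hypothetical usage (requires samtools; file names are placeholders):
#   samtools view my.bam | split_by_tile tiles
# The per-tile files carry no header. To turn one back into a BAM:
#   { samtools view -H my.bam; cat tiles/tile_1101.sam; } \
#       | samtools view -b -o tile_1101.bam -
```

This keeps one output file open per tile for the whole run. If your awk limits the number of open files, appending with ">>" and calling close(out) after each write should work at the cost of speed. If the BAM mixes several lanes, the lane (4th token) may need to go into the file name as well.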


I think I may have a Perl solution to this, but I don't know exactly how to phrase the output. Can anybody help me out? :)

I have made a hash of hashes, where all the lines of a file are sorted into a key of the "master" hash depending on the value of their 5th field.

%Tiles has n keys, where each key is a different $Tile_Number.

Each $Tile_Number key points to a new hash that contains all lines whose tile number matches that key. The value of each of these inner keys (lines) is just 1.

$Tiles{$Tile_Number}{$Line}=1, where $Tiles{$Tile_Number} has many $Line=1 entries.

I want to print each $Tiles{$Tile_Number} hash to a separate file. Preferably, the file would be created when the $Tile_Number key is created, and each new $Tiles{$Tile_Number}{$Line}=1 entry would be printed as it is added, to save memory. The best would be to not print the final value (1), but I can do away with this, I guess.

How can I tell Perl to open a new file for each key in the "master" hash and print all of its keys?

Thank you, Carmen

(comment by gaelgarcia05, 6.2 years ago)
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 6.2 years ago:

I just wrote a Java program to split a BAM by tile:

https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/splitbytitle/SplitByTile.java

It uses the Picard library to parse the BAM.

Compilation:

cd src/main/java
javac -cp path/to/picard.jar:path/to/sam.jar com/github/lindenb/jvarkit/tools/splitbytitle/SplitByTile.java

Execution:

java -cp path/to/picard.jar:path/to/sam.jar \
com.github.lindenb.jvarkit.tools.splitbytitle.SplitByTile \
I=my.bam O=tmp/TILE__TILE__/jeter.__TILE__.bam CREATE_INDEX=true

WOW, cool! Let me check it out, Pierre!

(comment by gaelgarcia05, 6.2 years ago)