Question: Split a multisample bam using RG tag information
5.8 years ago by
ttom wrote:

Hello All,

I have BAM files that each contain multiple samples, and I could not figure out a way to split them into individual per-sample BAMs. Below are the RG tags from my BAM header.

According to the SM tags, the sample IDs are T9C, T9B and T9A. Can the samtools view command do this? I tried samtools view, but was not successful.

@RG     ID:1    PL:ILLUMINA     PU:D09B5ACXX.8-BG1.G    LB:JV21 DS:capture_id:IS0007,seq_library_id:LID46212,seq_run_id:RD2189A DT:2011-11-21T00:00:00-0500  SM:T9C      CN:Center
@RG     ID:1.1  PL:ILLUMINA     PU:D09B5ACXX.8-BG1.F    LB:JV20 DS:capture_id:IS0007,seq_library_id:LID46212,seq_run_id:RD2189A DT:2011-11-21T00:00:00-0500 SM:T9B      CN:Center
@RG     ID:1.1.1        PL:ILLUMINA     PU:D0BK3ACXX.6-BG1.G    LB:JV21 DS:capture_id:IS0005-0007,seq_library_id:LID46914,seq_run_id:RD2200B    DT:2011-12-02T00:00:00-0500   SM:T9C     CN:Center
@RG     ID:1.1.2        PL:ILLUMINA     PU:D0BK3ACXX.6-BG1.F    LB:JV20 DS:capture_id:IS0005-0007,seq_library_id:LID46914,seq_run_id:RD2200B    DT:2011-12-02T00:00:00-0500   SM:T9B     CN:Center
@RG     ID:1.1.3        PL:ILLUMINA     PU:D0BK3ACXX.6-BG1.E    LB:JV19 DS:capture_id:IS0005-0007,seq_library_id:LID46914,seq_run_id:RD2200B    DT:2011-12-02T00:00:00-0500   SM:T9A     CN:Center
@RG     ID:1.2  PL:ILLUMINA     PU:D09B5ACXX.8-BG1.E    LB:JV19 DS:capture_id:IS0007,seq_library_id:LID46212,seq_run_id:RD2189A DT:2011-11-21T00:00:00-0500 SM:T9A      CN:Center


Help appreciated,




Can you print the first two non-header lines from your BAM file?

written 5.8 years ago by Ashutosh Pandey

Here are the first two non header lines from the bam

HWI-ST985:72:D0BK3ACXX:6:2208:8968:43903        99      1       9992    15      36S7M2D38M12S   =       10040   140     AACCCTAACCCTAACCCTCTATCCTAACCCTAACCCTCTATCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAA @EEGFEDEDFFEDEDFFFFC;>C?9@BDEBAC>CGD@D<B@EED1:BF4=7=C=BBCDBDDD;4BE:B<AB9@>;=>+>=>DF?47*=C@?9:   MD:Z:3G3^GA38 ZF:Z:109;352    RG:Z:1.1.1      XG:i:2  AM:i:15 NM:i:3  SM:i:15 XM:i:1  XN:i:9  XO:i:1  ZO:Z:CCATTGGT;@@CFFFFD;ATCTAGCT;CCCFFFFF        MQ:i:15 OQ:Z:FHHHHJJJJJJJJJJJJJJC@?F@9CDHICAH?GHG@G>FBDGE48CF267=C;=@ECAEAA;3@B6;;>A5=;;;=(99?BB935(8A<899    ZR:Z:CCATTGGT;49;BG1.G;ATCTAGCT;49;BG1.G        XT:A:M
D74RYQN1:213:D09B5ACXX:8:1101:12423:58755       163     1       10001   17      27S31M35S       =       10032   123     CCCTAACCCTAAAAGATTCCCGAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCT :(.58ABCCCB(0(%'&&'((%5@#-8=BDACB9;=(49;;@9;:EBDA;>D<A=9BD2A=<<?8A>7>CAD727C@7<=B?:DBCC2;B@@#   MD:Z:31       ZF:Z:109;150    RG:Z:1.1        XG:i:0  AM:i:17 NM:i:0  SM:i:17 XM:i:0  XO:i:0  ZO:Z:TAGATCCT;?<BDFFDD;CAACACCT;BBB7+=+=        MQ:i:17 OQ:Z:D+2AFHJJJJJ+<++++++**1?G):@DHIIIICFH2=CAGGFGGIGHHAHFCED?DE@@AABBAAB?ABCC@59A?<A<AB9AB?A19<?A#    ZR:Z:TAGATCCT;52;BG1.F;CAACACCT;51;BG1.F        XT:A:M


written 5.8 years ago by ttom
5.8 years ago by
Ashutosh Pandey wrote:

samtools view has an option "-R" that outputs only the reads belonging to the read groups listed in a text file. You can provide a text file containing all the read group IDs for a particular sample of interest, one ID per line. Make sure that the RG IDs are unique across samples, i.e. that no ID is shared by two different samples.

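You can create multiple text files, each containing the RG IDs of the read groups for one sample. With your header, for example, the file for sample T9A would contain 1.1.3 and 1.2. Then you can write a loop that takes each of these files in turn and extracts the reads belonging to those read groups.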

samtools view -bhR readids_for_sample_A.txt File.bam > File_A.bam
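The loop described above could be sketched roughly as follows. This is only an illustration, not a tested pipeline: the function names rg_ids_by_sample and split_bam_by_sample and the output file naming are made up for this example, and it assumes the @RG header lines are tab-separated (as the SAM format requires) and that the SM values are safe to use in file names.

```shell
# Read SAM header lines on stdin and print "SAMPLE<TAB>RG_ID",
# one line per @RG entry that has both an ID and an SM field.
rg_ids_by_sample() {
  awk -F'\t' '/^@RG/ {
    id = ""; sm = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^ID:/) id = substr($i, 4)
      if ($i ~ /^SM:/) sm = substr($i, 4)
    }
    if (id != "" && sm != "") print sm "\t" id
  }'
}

# Hypothetical helper: split one BAM into one BAM per SM value.
# Output names (PREFIX.SAMPLE.bam) are an assumption of this sketch.
split_bam_by_sample() {
  bam=$1
  prefix=${bam%.bam}
  map="$prefix.rg_map.tsv"

  # Build the sample -> read group ID table from the BAM header.
  samtools view -H "$bam" | rg_ids_by_sample > "$map"

  # For each distinct sample, write its RG IDs to a file and extract.
  cut -f1 "$map" | sort -u | while read -r sample; do
    ids="$prefix.$sample.rgids.txt"
    awk -F'\t' -v s="$sample" '$1 == s { print $2 }' "$map" > "$ids"
    samtools view -bhR "$ids" "$bam" > "$prefix.$sample.bam"
  done
}
```

Calling split_bam_by_sample File.bam would then be expected to write one BAM per sample (File.T9A.bam, File.T9B.bam and File.T9C.bam for the header shown in the question), along with the intermediate ID lists so that you can check which read groups went into each file.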


Yes, it is working.

Thank you for the quick response

written 5.8 years ago by ttom

No problem. Glad I could help.  

written 5.8 years ago by Ashutosh Pandey