Split a multisample bam using RG tag information
9.7 years ago
ttom ▴ 220

Hello All,

I have BAM files that each contain multiple samples, and I could not figure out how to split them into individual per-sample BAMs. Below are the RG tags of my BAM.

According to the SM tag, the sample IDs are T9C, T9B, and T9A. Can the samtools view command do this? I tried samtools view, but was not successful.

@RG     ID:1    PL:ILLUMINA     PU:D09B5ACXX.8-BG1.G    LB:JV21 DS:capture_id:IS0007,seq_library_id:LID46212,seq_run_id:RD2189A DT:2011-11-21T00:00:00-0500  SM:T9C      CN:Center
@RG     ID:1.1  PL:ILLUMINA     PU:D09B5ACXX.8-BG1.F    LB:JV20 DS:capture_id:IS0007,seq_library_id:LID46212,seq_run_id:RD2189A DT:2011-11-21T00:00:00-0500  SM:T9B      CN:Center
@RG     ID:1.1.1        PL:ILLUMINA     PU:D0BK3ACXX.6-BG1.G    LB:JV21 DS:capture_id:IS0005-0007,seq_library_id:LID46914,seq_run_id:RD2200B    DT:2011-12-02T00:00:00-0500   SM:T9C     CN:Center
@RG     ID:1.1.2        PL:ILLUMINA     PU:D0BK3ACXX.6-BG1.F    LB:JV20 DS:capture_id:IS0005-0007,seq_library_id:LID46914,seq_run_id:RD2200B    DT:2011-12-02T00:00:00-0500   SM:T9B     CN:Center
@RG     ID:1.1.3        PL:ILLUMINA     PU:D0BK3ACXX.6-BG1.E    LB:JV19 DS:capture_id:IS0005-0007,seq_library_id:LID46914,seq_run_id:RD2200B    DT:2011-12-02T00:00:00-0500   SM:T9A     CN:Center
@RG     ID:1.2  PL:ILLUMINA     PU:D09B5ACXX.8-BG1.E    LB:JV19 DS:capture_id:IS0007,seq_library_id:LID46212,seq_run_id:RD2189A DT:2011-11-21T00:00:00-0500  SM:T9A      CN:Center

Help appreciated,

Thanks,
Tinu

samtools bam • 9.8k views

Can you print the first two non-header lines from your BAM file?

Here are the first two non-header lines from the BAM:

HWI-ST985:72:D0BK3ACXX:6:2208:8968:43903        99      1       9992    15      36S7M2D38M12S   =       10040   140     AACCCTAACCCTAACCCTCTATCCTAACCCTAACCCTCTATCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAA @EEGFEDEDFFEDEDFFFFC;>C?9@BDEBAC>CGD@D<B@EED1:BF4=7=C=BBCDBDDD;4BE:B<AB9@>;=>+>=>DF?47*=C@?9:   MD:Z:3G3^GA38 ZF:Z:109;352    RG:Z:1.1.1      XG:i:2  AM:i:15 NM:i:3  SM:i:15 XM:i:1  XN:i:9  XO:i:1  ZO:Z:CCATTGGT;@@CFFFFD;ATCTAGCT;CCCFFFFF        MQ:i:15 OQ:Z:FHHHHJJJJJJJJJJJJJJC@?F@9CDHICAH?GHG@G>FBDGE48CF267=C;=@ECAEAA;3@B6;;>A5=;;;=(99?BB935(8A<899    ZR:Z:CCATTGGT;49;BG1.G;ATCTAGCT;49;BG1.G        XT:A:M
D74RYQN1:213:D09B5ACXX:8:1101:12423:58755       163     1       10001   17      27S31M35S       =       10032   123     CCCTAACCCTAAAAGATTCCCGAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCT :(.58ABCCCB(0(%'&&'((%5@#-8=BDACB9;=(49;;@9;:EBDA;>D<A=9BD2A=<<?8A>7>CAD727C@7<=B?:DBCC2;B@@#   MD:Z:31       ZF:Z:109;150    RG:Z:1.1        XG:i:0  AM:i:17 NM:i:0  SM:i:17 XM:i:0  XO:i:0  ZO:Z:TAGATCCT;?<BDFFDD;CAACACCT;BBB7+=+=        MQ:i:17 OQ:Z:D+2AFHJJJJJ+<++++++**1?G):@DHIIIICFH2=CAGGFGGIGHHAHFCED?DE@@AABBAAB?ABCC@59A?<A<AB9AB?A19<?A#    ZR:Z:TAGATCCT;52;BG1.F;CAACACCT;51;BG1.F        XT:A:M
9.7 years ago

Samtools view has an option, -R, that outputs only the reads belonging to the read groups listed in a text file. You can provide a text file listing all the read group IDs (one per line) for a particular sample of interest. Make sure the RG IDs are unique across samples, i.e. no RG ID is shared between two different samples.

You can create multiple text files, each containing the RG IDs of the read groups for one sample. Then you can write a loop that takes each of these files and extracts the reads using those read group IDs.

samtools view -bhR readids_for_sample_A.txt File.bam > File_A.bam
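
If you do not want to write the ID files by hand, the loop described above could look something like the sketch below. It assumes the @RG header lines are tab-delimited (as they are in a real SAM/BAM header) and that every @RG line carries both an ID and an SM field. The function names rg_ids_by_sample and split_bam_by_sample, and the output file naming, are made up for this illustration and are not part of samtools.

```shell
# Read SAM header text on stdin and write one file of read group IDs per
# sample, named <prefix><SM>.txt (the naming scheme is an assumption).
rg_ids_by_sample() {
    prefix="$1"
    awk -F '\t' -v prefix="$prefix" '
        /^@RG/ {
            id = ""; sm = ""
            # Pick the ID and SM fields out of the @RG line, in any order.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ID:/) id = substr($i, 4)
                if ($i ~ /^SM:/) sm = substr($i, 4)
            }
            if (id != "" && sm != "") print id > (prefix sm ".txt")
        }'
}

# Split one BAM into per-sample BAMs using the ID lists built above.
# Hypothetical wrapper: needs samtools on the PATH.
split_bam_by_sample() {
    bam="$1"
    base="${bam%.bam}"
    samtools view -H "$bam" | rg_ids_by_sample "${base}.rgids."
    for list in "${base}".rgids.*.txt; do
        sample="${list#"${base}".rgids.}"
        sample="${sample%.txt}"
        # -R keeps only reads whose RG tag is listed in the file.
        samtools view -b -R "$list" -o "${base}_${sample}.bam" "$bam"
    done
}
```

With the header from the question, calling split_bam_by_sample File.bam should leave three ID lists (for T9A, T9B and T9C) and one BAM per sample next to the input file.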

Yes, it is working.

Thank you for the quick response

No problem. Glad I could help.
