How to split a coordinate-sorted BAM file by read group
2.8 years ago

Hi, I have a coordinate-sorted BAM file, and the read group information in its header is:

I want to extract only the reads in read group ID:0. I tried the following commands:

samtools view -b bam -r '@RG\tID:0\tPL:ILLUMINA\tSM:COLO829_Normal_Tgen\tPU:H0CGCADXX:1:none' ~/mixted.bam > rg_0.bam


and

samtools split -f '@RG\tID:0\tPL:ILLUMINA\tSM:COLO829_Normal_Tgen\tPU:H0CGCADXX:1:none' ~/mixted.bam > rg_0.bam


The output BAM file contained only the header.

Can someone help me with this? Thank you!


Hello jing.mengrabbit,

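you are using the command incorrectly. The -r option of samtools view expects the ID value of the read group (here 0), not the whole @RG header line. Also, -b takes no argument, so the extra "bam" after it is not needed: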

$ samtools view -b -r 0 ~/mixted.bam > rg_0.bam
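Why the full @RG line gives you only the header: samtools view -r compares its argument with the value of each read's RG:Z: tag, and that tag stores only the read group ID (here 0). Nothing equals the whole header line, so every read is dropped and only the header is written. Below is a minimal awk sketch of that selection on SAM text, assuming the optional tags are tab-separated from column 12 onwards as the SAM format specifies. keep_rg is a hypothetical name for illustration, not part of samtools; for real work use -r itself.

```shell
# keep_rg ID: hypothetical helper that mimics the read selection done by
# `samtools view -r ID`, applied to SAM text read from stdin.
#   - header lines (starting with '@') are always printed
#   - an alignment line is printed only if one of its optional fields
#     (column 12 onwards) is exactly "RG:Z:<ID>"
keep_rg() {
  awk -F'\t' -v id="$1" '
    /^@/ { print; next }
    {
      for (i = 12; i <= NF; i++)
        if ($i == ("RG:Z:" id)) { print; break }
    }'
}
```

For example, samtools view -h ~/mixted.bam | keep_rg 0 should print the header plus the reads tagged RG:Z:0, while passing the whole '@RG\tID:0\t...' string prints the header and nothing else, which matches what you saw.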


samtools split has no option to extract a single read group; it writes a separate file for every read group it finds. The -f option you used sets the output filename format, not the read group to select:

samtools split [options] merged.sam|merged.bam|merged.cram

Splits a file by read group.

Options:

-u FILE1
Put reads with no RG tag or an unrecognised RG tag into FILE1

-u FILE1:FILE2
As above, but assigns an RG tag as given in the header of FILE2

-f STRING
Output filename format string (see below) ["%*_%#.%."]

-v
Verbose output
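If you want one file per read group but with predictable names, you can list the read group IDs from the header and run the corrected samtools view -r command once per ID. This is a minimal sketch assuming bash, awk and samtools are on the PATH; rg_ids and split_by_rg are hypothetical helper names, and only rg_ids works without samtools.

```shell
# rg_ids: hypothetical helper that prints the ID of every @RG line in the
# SAM header text read from stdin (e.g. the output of `samtools view -H`).
rg_ids() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^ID:/) { print substr($i, 4); break }
  }'
}

# split_by_rg BAM: hypothetical helper that writes one BAM per read group,
# named rg_<ID>.bam in the current directory, by running
# `samtools view -b -r <ID> -o rg_<ID>.bam BAM` once per ID in the header.
split_by_rg() {
  local bam="$1"
  samtools view -H "$bam" | rg_ids | while IFS= read -r id; do
    samtools view -b -r "$id" -o "rg_${id}.bam" "$bam"
  done
}
```

split_by_rg ~/mixted.bam would leave the ID 0 reads in rg_0.bam. Note that this re-reads the input once per read group, whereas samtools split makes a single pass. If you stay with samtools split, the samtools manual documents %! as expanding to the @RG ID in the -f format string, so something like samtools split -f '%*_%!.%.' ~/mixted.bam should name each output after its read group, and you can keep only the file for ID 0.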


fin swimmer
