How to split a coordinate-sorted BAM file by read group
2.8 years ago

Hi, I have a coordinate-sorted bam file, and its read group information in the header is:

I want to extract only the reads with ID:0, and I tried these commands:

samtools view -b bam -r '@RG\tID:0\tPL:ILLUMINA\tSM:COLO829_Normal_Tgen\tPU:H0CGCADXX:1:none' ~/mixted.bam > rg_0.bam


and

samtools split -f '@RG\tID:0\tPL:ILLUMINA\tSM:COLO829_Normal_Tgen\tPU:H0CGCADXX:1:none' ~/mixted.bam > rg_0.bam


I just got the header information in the output bam file.

Can someone help me with this? Thank you!

2.8 years ago

Hello jing.mengrabbit,

You are using the command incorrectly. The -r option of samtools view expects only the ID value of the RG line, not the whole @RG header line.

$ samtools view -b -r 0 ~/mixted.bam > rg_0.bam
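If you are not sure which ID values exist in the file, you can read them from the header first. The following is only a small sketch: `rg_ids` is a hypothetical helper name (not part of samtools), and it assumes the @RG lines are tab-separated as in a standard SAM header.

```shell
# Hypothetical helper: print the ID value of every @RG line read from stdin.
# Input is expected to be SAM header text (tab-separated fields).
rg_ids() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^ID:/) print substr($i, 4)   # drop the leading "ID:"
  }'
}

# Typical use (assumes samtools is installed and the BAM file exists):
#   samtools view -H ~/mixted.bam | rg_ids
```

Each value printed this way is what you would pass to `samtools view -r`.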


samtools split has no option to extract just one read group. It creates a new file for each read group it finds. Its -f option sets the format of the output filenames; it does not select a read group.

samtools split [options] merged.sam|merged.bam|merged.cram

Splits a file by read group.

Options:

-u FILE1
Put reads with no RG tag or an unrecognised RG tag into FILE1

-u FILE1:FILE2
As above, but assigns an RG tag as given in the header of FILE2

-f STRING
Output filename format string (see below) ["%*_%#.%."]

-v
Verbose output
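If you do want one file per read group, a call along these lines should work. This is a sketch, not a tested recipe: `split_by_rg` is a hypothetical wrapper name, and the `%!` (read group ID) and `%.` (extension of the output format) expansions are taken from the samtools-split manual, so please check them against your installed version.

```shell
# Hypothetical wrapper: write one output file per read group, with each file
# named after its read group ID instead of the default "%*_%#.%." pattern.
# Assumes samtools is on PATH; "$1" is the merged input BAM/SAM/CRAM.
split_by_rg() {
  samtools split -f '%!.%.' "$1"
}

# Example call (commented out so that pasting the block has no side effects):
#   split_by_rg ~/mixted.bam
```

Note that the output is controlled entirely by the format string, so there is no redirection with `>` as in the samtools view command.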


fin swimmer


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

fin swimmer