Question: samtools merge respecting original read groups
0
gravatar for abascalfederico
3.4 years ago by
abascalfederico1.1k
Spain
abascalfederico1.1k wrote:

Hi,

I'm struggling to merge many bam files respecting the original read groups with samtools. Some of the input files have the same read group, other files have others. If I don't use "-c" each "duplicate" RGs are assigned a random suffix (which I don't want to, they are the same RG). If I use "-c", RGs are not altered, but then only one RG line is printed in the header.

Does any one have a clue on how to solve this?

Ideally, with "-c" samtools should print any RG header line that is different from the others in the merged bam.

Many thanks Federico

samtools read groups • 1.6k views
ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by abascalfederico1.1k

Which version of samtools are you using? I recall that there were modifications to handle this relatively recently.

ADD REPLYlink written 3.4 years ago by Devon Ryan92k

Thanks for the feedback, Devon. I am using Version: 1.2 (using htslib 1.2.1). I have looked at the latest version (1.3) but don't see anything about this.

I've read in other post that for this kind of situation picard-tools is the right way to go. Wish samtools change this.

ADD REPLYlink written 3.4 years ago by abascalfederico1.1k
1
gravatar for abascalfederico
3.4 years ago by
abascalfederico1.1k
Spain
abascalfederico1.1k wrote:

In case it is useful for someone, I've written a perl routine to solve this problem.

sub create_header {
my $p_input_bams = $_[0];
my $header_file  = $_[1];
my $RG_string = "";
my %rgs;
foreach my $b ( @{$p_input_bams} ) {
    open(J,"samtools view -H $b|") || die "cñakjdñakj $b\n";
    while(<J>) {
        #print STDERR "$_";
        last unless(/^\@/);
        if(/^\@RG/) {
            $_ =~ s/\#[0-9]+//;
            $rgs{$_} = 1;
        }
    }
    close(J);
}
foreach my $rg ( keys %rgs ) {
    $RG_string .= $rg;
}
print STDERR "RG_string=\n$RG_string\n";
my $main_bam = $p_input_bams->[0];
open(IB, "samtools view -h $main_bam |") || die "Cannot open main: $main_bam\n";
open(OB, ">$header_file") || die "cannot write to header: $header_file\n";
while(<IB>) {
    last unless(/^\@/);
    if(!/^\@RG/) {
        print OB;
    } else {
        print STDERR "Printing RG string...\n";
        print OB $RG_string;
    }

}
close(OB);
close(IB);

}

To use it:

system("samtools merge -cf $lib.merge-changeHeader.bam @input_bams");
&create_header(\@input_bams,$header_file);
system("samtools reheader $header_file $lib.merge-changeHeader.bam > $lib.merge.bam");
ADD COMMENTlink written 3.4 years ago by abascalfederico1.1k

I assume this part is lost in unicode translation cñakjdñakj :-)

ADD REPLYlink written 3.4 years ago by genomax74k

hehehe my "die" error messages are usually random typings. It's fast and helps a lot debugging errors

ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by abascalfederico1.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1259 users visited in the last hour