Question

Why does this samtools command work?

0

Entering edit mode

5.9 years ago

c_u ▴ 520

Hi,

I am trying to map fastq files to bam format, and I am using the pipline mentioned here - https://github.com/ArimaGenomics/mapping_pipeline/blob/master/Arima_Mapping_UserGuide.pdf.

One of the commands in the pipeline is -

perl $COMBINER $FILT_DIR/$SRA\_1.bam $FILT_DIR/$SRA\_2.bam $SAMTOOLS $MAPQ_FILTER | $SAMTOOLS view -bS -t $FAIDX - | $SAMTOOLS sort -o $TMP_DIR/$SRA.bam -

Here, $COMBINER is a code that combines 2 bam files, $FILT_DIR, and $TEMP_DIR are folders, $SRA is the name of the file, $SAMTOOLS is the location of the samtools executable, $MAPQ_FITLER is 10, and $FAIDX is where the reference .fai file is.

My question is - how is $SAMTOOLS $MAPQ_FILTER working here? From what I have read, samtools mapq filter is supposed to work like samtools view -bq 10 , but $SAMTOOLS $MAPQ_FILTER is basically just samtools 10, so I am unable to understand why this works.

Any help would be great!

RNA-Seq samtools • 1.7k views

ADD COMMENT • link updated 5.9 years ago by h.mon 35k • written 5.9 years ago by c_u ▴ 520

score 3 · Accepted Answer · 2018-07-09

3

Entering edit mode

5.9 years ago

h.mon 35k

You are not following deep enough. The issue here is you are looking at a bash script, calling a perl script.

$SAMTOOLS $MAPQ_FILTER is not a command, it is not being executed as samtools 10. These are two bash variables, which are being passed as arguments to $COMBINER, which is another bash variable, but turns out to be the perl script two_read_bam_combiner.pl.

Inside $COMBINER, I mean, inside two_read_bam_combiner.pl, the value of the bash variable $MAPQ_FILTER is assigned to the perl variable $mq. Still, $mq is not used as argument to samtools, because samtools is called with:

open(FILE1,"$samtools view -h $read1_bam |");
open(FILE2,"$samtools view -h $read2_bam |");

$mq is used here:

my $trouble = 0;
if (($binary1[2] == 1) || ($mapq1 < $mq)) {
    $trouble = 1;
}
if (($binary2[2]== 1) || ($mapq2 < $mq)) {
    $trouble = 1;
}

Down the line, $trouble is used for filtering:

unless ($trouble == 1) {
    print(join("\t",$id1,$new_flag1,$chr_from1,$loc_from1,$mapq1,$cigar1,$chr_from2,$loc_from2,$dist1,$read1,$read_qual1,@rest1) . "\n");
    print(join("\t",$id2,$new_flag2,$chr_from2,$loc_from2,$mapq2,$cigar2,$chr_from1,$loc_from1,$dist2,$read2,$read_qual2,@rest2) . "\n");
}

ADD COMMENT • link 5.9 years ago by h.mon 35k

0

Entering edit mode

Got it, so basically the perl script is doing both - merging the 2 bam files, and simultaneously filtering based on map quality. So, if I had to get rid of the merging of the bam files, and still filter the 2 (R1 and R2) bam files, then I can bypass this perl script altogether and do the filtering using the regular samtools view -bq 10 on both, right?

ADD REPLY • link 5.9 years ago by c_u ▴ 520

1

Entering edit mode

I am not sure, because I don't know what you intend to do.

You have to be careful because the perl script is keeping the reads from both bam files "paired", i.e., the first read of the first bam file forms a pair with the first read of the second bam file, and so on. The perl script guarantees if a read from one of the bam files fails the filter, the corresponding read from the other bam will be filtered out too.

If you filter each bam independently with samtools, the bam files may become "unpaired".