Hi,
I am trying to map fastq files to bam format, and I am using the pipline mentioned here - https://github.com/ArimaGenomics/mapping_pipeline/blob/master/Arima_Mapping_UserGuide.pdf.
One of the commands in the pipeline is -
perl $COMBINER $FILT_DIR/$SRA\_1.bam $FILT_DIR/$SRA\_2.bam $SAMTOOLS $MAPQ_FILTER | $SAMTOOLS view -bS -t $FAIDX - | $SAMTOOLS sort -o $TMP_DIR/$SRA.bam -
Here, $COMBINER is a code that combines 2 bam files, $FILT_DIR, and $TEMP_DIR are folders, $SRA is the name of the file, $SAMTOOLS is the location of the samtools executable, $MAPQ_FITLER is 10, and $FAIDX is where the reference .fai file is.
My question is - how is $SAMTOOLS $MAPQ_FILTER
working here? From what I have read, samtools mapq filter is supposed to work like samtools view -bq 10
, but $SAMTOOLS $MAPQ_FILTER
is basically just samtools 10
, so I am unable to understand why this works.
Any help would be great!
Got it, so basically the perl script is doing both - merging the 2 bam files, and simultaneously filtering based on map quality. So, if I had to get rid of the merging of the bam files, and still filter the 2 (R1 and R2) bam files, then I can bypass this perl script altogether and do the filtering using the regular
samtools view -bq 10
on both, right?I am not sure, because I don't know what you intend to do.
You have to be careful because the perl script is keeping the reads from both bam files "paired", i.e., the first read of the first bam file forms a pair with the first read of the second bam file, and so on. The perl script guarantees if a read from one of the bam files fails the filter, the corresponding read from the other bam will be filtered out too.
If you filter each bam independently with samtools, the bam files may become "unpaired".
Thanks for that explanation, I did not know that before!