I am working with some BAM files generated from single-cell sequencing. As part of a Python script, I need to run samtools version 1.17, filtering for several tags, ideally in a single command line. So far I have:
cmd_final = f"samtools view -@ {cpus} -O BAM -e 'exists([CB]) && exists([UB]) && [CB]!=\"-\" && [UB]!=\"-\"' temp_filtered.bam > {te_bam}"
I have also tried:
cmd_final = f"samtools view -h -@ {cpus} -O BAM -e 'exists(CB) && exists(UB) && CB!=\"-\" && UB!=\"-\"' temp_filtered.bam > {output}"
but both result in the error:
ÄE::sam_passes_filterÅ Couldn't process filter expression
And I've tried many other command-line variations, but I just can't seem to get it right.
I have checked my input file temp_filtered.bam and it DOES contain reads with these tags and they are NOT malformed. It seems like there are some subtle and important differences in the many versions of samtools' commandlines, which complicates my task. Could anyone point out what I'm doing wrong and suggest a fix? I would greatly, GREATLY appreciate the help, as this problem has taken entirely too much of my time and I'm getting quite frustrated with it.
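One way to take shell quoting out of the picture is to pass the arguments as a list, so the filter expression reaches samtools exactly as written in Python. This is a minimal sketch, assuming the `-@`, `-b`, `-e` and `-o` options of `samtools view` behave as documented; `build_view_cmd` and `run_view` are hypothetical helper names, and nothing here is confirmed to resolve the error above.

```python
import shlex
import subprocess

# Filter expression from the first attempt above (tags written in brackets).
FILTER_EXPR = 'exists([CB]) && exists([UB]) && [CB] != "-" && [UB] != "-"'


def build_view_cmd(cpus, in_bam, out_bam, expr=FILTER_EXPR):
    """Return the samtools view call as an argument list (hypothetical helper)."""
    return [
        "samtools", "view",
        "-@", str(cpus),   # additional threads
        "-b",              # BAM output
        "-e", expr,        # filter expression, passed as one argument
        "-o", out_bam,     # output file instead of a shell redirect
        in_bam,
    ]


def run_view(cpus, in_bam, out_bam):
    """Print the exact command, then run it without going through a shell."""
    cmd = build_view_cmd(cpus, in_bam, out_bam)
    print(shlex.join(cmd))  # shell-quoted form, handy for pasting into a terminal
    subprocess.run(cmd, check=True)
```

Printing the command with `shlex.join` shows what samtools would actually receive, which makes it easier to compare against a command typed by hand.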
Despite your explanation, I still lean towards a data integrity issue (and/or an encoding issue; I've seen this ÄE pop up before but can't recall why or when).
Can you check the data integrity, or use a different file for testing?
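A quick integrity check could be scripted along these lines. This is only a sketch, assuming `samtools quickcheck` is on the PATH; the helper names are hypothetical. Note that quickcheck is a shallow test (it looks at the header and the end-of-file marker), so a pass does not prove that every record is well formed.

```python
import subprocess


def quickcheck_cmd(bam_path):
    """Argument list for a shallow integrity check (hypothetical helper)."""
    # -v makes quickcheck print the names of files that fail the check.
    return ["samtools", "quickcheck", "-v", bam_path]


def bam_looks_intact(bam_path):
    """Return True if samtools quickcheck exits with status 0."""
    result = subprocess.run(quickcheck_cmd(bam_path), capture_output=True, text=True)
    return result.returncode == 0
```

For example, `bam_looks_intact("temp_filtered.bam")` would return `False` if the file is truncated or lacks a valid header.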