Exclude reads soft-clipped > N bases in Freebayes
1
0
Entering edit mode
5.4 years ago
nanana ▴ 110

I'm running Freebayes as follows:

freebayes -f genome.fa --pooled-discrete --genotype-qualities --read-mismatch-limit 70 \
tumour.bam normal.bam -v tn_raw.vcf


I want to filter out reads that are soft clipped (aligned with bwa mem) > 70 bps, and I'm using the --read-mismatch-limit 70 option to achieve this. However, in the output I see SNPS being called from reads with > 80 bps mis-aligned (see linked screenshot):

Is this option doing what I think it is? Is this the correct option to exclude reads with more than N mismatches?

next-gen snp • 2.0k views
1
Entering edit mode
5.4 years ago

There is a terminology issue here. In NGS analysis parlance soft clipping is not the equivalent to mismatching bases.

The soft clipped region are not aligned for matching or mismatching regions - these are removed altogether as the alignment score is better when the region is not present. So their alignment is not reported. You could have matching bases within a soft clipped region.

Therefore in general, in my opinion, the soft clipped regions of reads should not be used to call SNPs at all.

Hence I think this flag refers to the actual mismatching bases and not the soft clipped ones.

-U --read-mismatch-limit N
Exclude reads with more than N mismatches where each mismatch
has base quality >= mismatch-base-quality-threshold.
default: ~unbounded

0
Entering edit mode

OK makes sense - do you have any suggestions for how to exclude soft-clipped reads (other than removing them from the BAM in the first place)?

1
Entering edit mode

If you want to exclude soft-clipped reads, then why not remove them from the bam?

0
Entering edit mode

Because I need them included in another analysis, and I'd prefer not to have to take up twice as much space by filtering them just for Freebayes.

0
Entering edit mode

can you please test if freebayes can read sub-bash file ? something like

freebayes -f genome.fa --pooled-discrete --genotype-qualities --read-mismatch-limit 70 \
<(samtools view -bu tumour.bam) <(samtools view -bu normal.bam) -v tn_raw.vcf


just to know if you can put an inline filter in your command

1
Entering edit mode

Nice idea, but this doesn't work. I get the error: Failed to load index for /dev/fd/62. Rebuild samtools index