Question: Exclude reads soft-clipped > N bases in Freebayes
0
gravatar for nr23
21 months ago by
nr2380
nr2380 wrote:

I'm running Freebayes as follows:

freebayes -f genome.fa --pooled-discrete --genotype-qualities --read-mismatch-limit 70 \
tumour.bam normal.bam -v tn_raw.vcf

I want to filter out reads that are soft clipped (aligned with bwa mem) > 70 bps, and I'm using the --read-mismatch-limit 70 option to achieve this. However, in the output I see SNPS being called from reads with > 80 bps mis-aligned (see linked screenshot):

Is this option doing what I think it is? Is this the correct option to exclude reads with more than N mismatches?

snp next-gen • 837 views
ADD COMMENTlink modified 21 months ago by Istvan Albert ♦♦ 80k • written 21 months ago by nr2380
1
gravatar for Istvan Albert
21 months ago by
Istvan Albert ♦♦ 80k
University Park, USA
Istvan Albert ♦♦ 80k wrote:

There is a terminology issue here. In NGS analysis parlance soft clipping is not the equivalent to mismatching bases.

The soft clipped region are not aligned for matching or mismatching regions - these are removed altogether as the alignment score is better when the region is not present. So their alignment is not reported. You could have matching bases within a soft clipped region.

Therefore in general, in my opinion, the soft clipped regions of reads should not be used to call SNPs at all.

Hence I think this flag refers to the actual mismatching bases and not the soft clipped ones.

-U --read-mismatch-limit N
         Exclude reads with more than N mismatches where each mismatch
         has base quality >= mismatch-base-quality-threshold.
         default: ~unbounded
ADD COMMENTlink modified 21 months ago • written 21 months ago by Istvan Albert ♦♦ 80k

OK makes sense - do you have any suggestions for how to exclude soft-clipped reads (other than removing them from the BAM in the first place)?

ADD REPLYlink written 21 months ago by nr2380
1

If you want to exclude soft-clipped reads, then why not remove them from the bam?

ADD REPLYlink modified 21 months ago • written 21 months ago by Brian Bushnell16k

Because I need them included in another analysis, and I'd prefer not to have to take up twice as much space by filtering them just for Freebayes.

ADD REPLYlink modified 20 months ago • written 21 months ago by nr2380

can you please test if freebayes can read sub-bash file ? something like

freebayes -f genome.fa --pooled-discrete --genotype-qualities --read-mismatch-limit 70 \
 <(samtools view -bu tumour.bam) <(samtools view -bu normal.bam) -v tn_raw.vcf

just to know if you can put an inline filter in your command

ADD REPLYlink written 21 months ago by Pierre Lindenbaum119k
1

Nice idea, but this doesn't work. I get the error: Failed to load index for /dev/fd/62. Rebuild samtools index

ADD REPLYlink modified 21 months ago • written 21 months ago by nr2380
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1120 users visited in the last hour