Soft-Clipping And Variant Calling
10.1 years ago
DoubleDecker ▴ 180

Having aligned my reads with BWA-MEM, I noticed that a lot of alignments are dodgy, to put it mildly, e.g. with both ends massively soft-clipped. I now wonder whether I need to filter out such soft-clipped alignments before performing variant calling, or (preferably!) whether variant callers such as GATK and freebayes already take soft-clipping information into account when deciding whether to call a variant.
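To get a feel for how many alignments are affected before deciding on any filtering, the soft-clipped fraction of each read can be computed from its CIGAR string. Below is a minimal sketch in plain Python; the function names and the 0.5 threshold are illustrative assumptions, not something BWA or any variant caller defines.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_summary(cigar):
    """Return (left_soft_clip, right_soft_clip, read_length) for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) lie outside any soft clips and are not present in SEQ.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return 0, 0, 0  # e.g. CIGAR "*" for an unmapped read
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    # Operations that consume read bases: M, I, S, = and X.
    read_len = sum(n for n, op in ops if op in "MIS=X")
    return left, right, read_len


def is_heavily_clipped(cigar, max_fraction=0.5):
    """True if both ends are soft-clipped and the clipped share exceeds max_fraction.

    max_fraction is an arbitrary illustrative threshold.
    """
    left, right, read_len = soft_clip_summary(cigar)
    if read_len == 0:
        return False
    return left > 0 and right > 0 and (left + right) / read_len > max_fraction
```

For example, `is_heavily_clipped("30S40M30S")` flags a 100-base read in which only the middle 40 bases align, while `is_heavily_clipped("5S95M")` does not.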

bwa gatk • 11k views
10.1 years ago

No, you don't need to filter: soft/hard-clipped bases are ignored by the callers.

EDIT: hard and soft clips can, however, be used to detect structural variations.

EDIT2: a quick test: I added 8 bases at the 5' and 3' ends of all reads, and only 5 SNPs were found.


That's really great; not having to prepare yet another set of BAM files will save me a lot of disk space.

EDIT: OK, I am scratching my head right now... so you got only 5 additional SNPs after adding an identical string to all the reads? But it seems to me you assigned low Phred quality values (2) to the added nucleotides, so the SNP caller might be biased against them for that reason?
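To make the quality point concrete, here is a rough sketch of how such a padding test could be set up so that the Phred score of the added bases is an explicit parameter. It is not the script used for the test above; the function name, the padding sequence and the defaults are hypothetical, and Phred+33 encoding is assumed (so Phred 2 is written as `#`).

```python
def pad_read(seq, qual, pad="ACGTACGT", pad_phred=2):
    """Add the same padding to both ends of a read and its quality string.

    pad and pad_phred are illustrative defaults; Phred+33 encoding is assumed.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    # Encode the chosen Phred score as one ASCII character per padded base.
    pad_qual = chr(pad_phred + 33) * len(pad)
    return pad + seq + pad, pad_qual + qual + pad_qual
```

Re-running the same test with a higher `pad_phred` (say 30) would show whether the low quality of the added bases, rather than the clipping itself, is what keeps the caller from using them.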

10.1 years ago
Mick ▴ 30

Forgive me if I have this wrong, but....

What happens in bwa mem is that one often gets the primary alignment plus lots of much shorter secondary alignments, many of which are hard/soft clipped. These secondary alignments are essentially local/split alignments with respect to the read, and are often not what SNP discovery projects are interested in.

GATK ignores everything except the primary alignment.

However, where reads do not have a good, long primary alignment, I am guessing it's possible that the short/clipped/local alignment is marked as the primary alignment and is taken into consideration by GATK when SNP calling.

For example, if one has PhiX in the reads, but not in the reference database, then one can get short/clipped/local alignments of the PhiX reads against the reference genome. These are clearly false alignments. However, they may be marked as primary alignments, and GATK may use them to call SNPs.
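Whether a given alignment line is primary can be read from the SAM FLAG field: in the SAM specification, bit 0x100 marks a secondary alignment, bit 0x800 a supplementary one, and bit 0x4 an unmapped read. The sketch below tallies alignment types from SAM text lines in plain Python; the function names are my own, and it says nothing about which types a particular caller actually uses.

```python
# FLAG bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def alignment_kind(flag):
    """Classify a SAM FLAG value as unmapped, secondary, supplementary or primary."""
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    if flag & FLAG_SECONDARY:
        return "secondary"
    if flag & FLAG_SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def count_alignment_kinds(sam_lines):
    """Count alignment kinds over an iterable of SAM text lines."""
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a valid alignment record
        kind = alignment_kind(int(fields[1]))  # FLAG is the second column
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Note that a line classified as "primary" here can still be short and heavily clipped, which is exactly the PhiX-type case described above; the flag alone does not tell you the alignment is a good one.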

I think that was the focus of the question....


Yes, that's what I meant. I did not use the option to output alternative alignments in BWA mem.


I'd go back and redo the alignments as per http://bio-bwa.sourceforge.net/, which says:

"With BWA-MEM/BWA-SW, my tools are complaining about multiple primary alignments. Is it a bug?

It is not. Multi-part alignments are possible in the presence of structural variations, gene fusion or reference misassembly. However, representing multi-part alignments in SAM has not been finalized. To make BWA work with your tools, please use option -M to flag extra hits as secondary."
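If the alignments are driven from a script, adding `-M` is a one-word change to the command. A small sketch, assuming `bwa` is on the PATH; the helper name and the file names in the usage note are placeholders, not anything from this thread.

```python
import shlex


def bwa_mem_command(reference, reads1, reads2=None, threads=1):
    """Build a `bwa mem` argument list with -M (mark shorter split hits as secondary)."""
    cmd = ["bwa", "mem", "-M", "-t", str(threads), reference, reads1]
    if reads2 is not None:
        cmd.append(reads2)  # second file for paired-end data
    return cmd


def as_shell_string(cmd):
    """Render the argument list as a shell-quoted string, e.g. for logging."""
    return shlex.join(cmd)
```

The list can be passed to `subprocess.run(cmd, stdout=sam_file, check=True)`, since `bwa mem` writes SAM to standard output; for instance `bwa_mem_command("ref.fa", "reads_1.fq", "reads_2.fq", threads=4)` with hypothetical file names.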
