Soft Clipping
10.2 years ago
Gregor Rot ▴ 540

I have short reads that end in a non-genomic sequence. Clipping the reads before mapping is possible but not ideal. The optimal thing for me is to use bowtie2 or STAR and allow soft clipping. This works, but the mapping is worse compared to mapping of pre-clipped reads.

I can't seem to find any option to allow "more" soft clipping. Is there any way I can make STAR/bowtie2 soft clip up to half the read (from the 3' or 5' end)? Or do the mappers already do that? Where can I read about how soft clipping works?

Thanks, Gregor

bowtie2 • 11k views

I think it is worth investigating what "the mapping is worse" means in your context. It is suspicious when cutting off the ends of reads leads to "worse" mapping, although, as I said, what the word "worse" means is essential.

The default expectation would be that after clipping more reads map overall, but the number of uniquely mapped reads decreases.


Exactly. Simply trimming the reads would increase the mapped %, because shorter reads are easier to align - however, more of them would be of low mapping quality, or just wrong.

Soft clipping is more sensitive than adapter removal; I think that has been shown pretty reliably.

10.2 years ago
Irsan ★ 7.8k

With STAR version 2.3.0 you can trim, for example, 10 bases from the 3' or 5' end with the options --clip3pNbases 10 and --clip5pNbases 10. Have a look at the manual, it's all there.

I don't know if bowtie2 also has a built-in read trimmer, but if not, there are tons of other tools available. One example is fastx-toolkit.


bowtie has it too:

-5/--trim5 <int>
       Trim <int> bases from high-quality (left) end of each read before alignment (default: 0).

-3/--trim3 <int>  
       Trim <int> bases from low-quality (right) end of each read before alignment (default: 0).
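
To make the effect of these fixed-length options concrete, here is a minimal Python sketch of what trimming a set number of bases from each end does to a single read. `hard_trim` is a hypothetical helper written for illustration only; it is not part of bowtie, bowtie2 or STAR, and it assumes the read and its quality string have the same length.

```python
def hard_trim(seq, qual, trim5=0, trim3=0):
    """Remove trim5 bases from the left (5') end and trim3 from the right (3') end.

    The same number of bases is removed from every read, whether or not
    those bases would have matched the genome.
    """
    if trim5 < 0 or trim3 < 0:
        raise ValueError("trim lengths must be non-negative")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have the same length")
    end = max(len(seq) - trim3, trim5)  # never let the slice bounds cross
    return seq[trim5:end], qual[trim5:end]


# Illustrative read: drop 2 bases on the left and 3 on the right.
print(hard_trim("ACGTACGTAC", "IIIIIHHHHH", trim5=2, trim3=3))
```

The trimmed bases are gone before alignment starts, so this only helps when the length of the non-genomic tail is known in advance.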

Have you tested whether this works properly for paired-end reads?


I would like to point out that this is hard clipping you're referring to.

By definition, soft clipping is not done at some defined length - rather, it's simply a modification of the scoring scheme that does not punish for mismatches at the ends of the read.
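
The scoring idea can be illustrated with a toy example. This is only a sketch under strong assumptions: the read is compared without gaps to a reference window of the same length, and the scores (match +2, mismatch -6) are chosen for illustration. `soft_clip_ungapped` is a hypothetical helper, not how bowtie2 or STAR are implemented; it shows the local-alignment principle (as used, for example, by bowtie2's `--local` mode) that the best-scoring stretch of the read is kept and whatever lies outside it is soft clipped.

```python
def soft_clip_ungapped(read, ref_window, match=2, mismatch=-6):
    """Return (clip5, clip3, best_score) for an ungapped read/reference pairing.

    The best-scoring contiguous stretch is treated as aligned; bases outside
    it count as soft clipped. Scores are illustrative assumptions only.
    """
    if len(read) != len(ref_window):
        raise ValueError("read and ref_window must have the same length")

    best_score, best_start, best_end = 0, 0, 0  # an empty alignment scores 0
    run_score, run_start = 0, 0
    for i, (r, g) in enumerate(zip(read, ref_window)):
        run_score += match if r == g else mismatch
        if run_score <= 0:
            # A stretch that has dropped to zero or below cannot improve
            # what follows, so restart after this base.
            run_score, run_start = 0, i + 1
        elif run_score > best_score:
            best_score, best_start, best_end = run_score, run_start, i + 1
    return best_start, len(read) - best_end, best_score


# 10 genomic bases followed by an 8-base non-genomic tail.
print(soft_clip_ungapped("ACGTACGTACTTTTTTTT", "ACGTACGTACGGGGGGGG"))
```

Under these toy scores the tail is clipped because including it would only lower the score. How much a real mapper is willing to clip depends on its own score thresholds, which usually scale with read length.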

10.2 years ago
Gregor Rot ▴ 540

Thanks for all the answers. If I clip reads before mapping (I remove a certain sequence from the 3' end), more reads map (I only consider uniquely mapped reads) than if I don't clip them. I presume this is because I sometimes have to remove as much as half the read from the 3' end (>40 nucleotides), and soft clipping doesn't consider more than a few nucleotides? I still can't find any documentation on how soft clipping is performed, for either bowtie2 or STAR.


The main problem is that I only see how much clipping is necessary once the reads are aligned to the genome :) so I want to use soft clipping to align the reads. But it seems that soft clipping works only for a few nucleotides? Of course I can pre-clip the unmapped reads nucleotide by nucleotide and then re-map, but I was thinking there might be a more elegant way...
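
One way to check how much soft clipping a mapper actually applied is to read it from the CIGAR field (column 6) of the SAM output, where soft-clipped bases appear as `S` operations at the ends. A minimal sketch, assuming standard SAM CIGAR strings; `soft_clip_lengths` is a hypothetical helper name, not a function from any existing library.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clipped lengths from a SAM CIGAR string.

    'left' and 'right' follow the order of the CIGAR, i.e. the read as
    stored in the SAM record (reverse-complemented for reverse-strand hits).
    """
    if cigar == "*":  # unmapped read, no CIGAR available
        return 0, 0
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside the soft clips; ignore them here.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return 0, 0
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


# Illustrative CIGAR: 5 bases clipped on the left, 45 on the right.
print(soft_clip_lengths("5S40M45S"))
```

Tallying these values over all mapped reads shows whether the mapper ever clips as much as half a read, or whether such reads end up unmapped instead.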
