Soft-clipping primer sequences in targeted sequencing: approaches and tools
4.1 years ago
lamteva.vera ▴ 210

Hi folks,

I'm analyzing paired-end TruSeq Custom Amplicon panel data, so I need to remove the primer sequences (ULSO and DLSO probes) before variant calling. As far as I know, the preferred way to do this is soft-clipping. Could you recommend suitable tools for this? Which ones do you prefer, and why?

I have found only two such tools, and they use different approaches: GATK's ClipReads and BamClipper. ClipReads soft-clips sequences by exact match, while BamClipper uses the genomic coordinates of the primer sequences. I suspect the second approach is more accurate because it is not affected by base-calling errors that could introduce wrong bases into the ULSO/DLSO sequences. Correct me if I'm wrong.
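To make the coordinate-based idea concrete, here is a minimal Python sketch of soft-clipping the left end of an aligned read up to a primer's end coordinate. It is only an illustration of the principle, not BamClipper's actual implementation; the function name `soft_clip_left` and its arguments are made up for this example, and the right end of the read would need the mirror-image logic.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clip_left(pos, cigar, clip_end):
    """Soft-clip read bases aligned to reference positions before clip_end.

    pos      -- 0-based leftmost aligned reference position of the read
    cigar    -- CIGAR string of the alignment
    clip_end -- 0-based exclusive end of the primer (hypothetical input)
    Returns (new_pos, new_cigar).
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    lead_h = [ops[0]] if ops and ops[0][1] == "H" else []
    trail_h = [ops[-1]] if len(ops) > 1 and ops[-1][1] == "H" else []
    ref, clipped = pos, 0
    for i, (n, op) in enumerate(ops):
        if op in "M=X":                      # consumes read and reference
            if ref >= clip_end:
                kept = ops[i:]               # past the primer: keep the rest
                break
            take = min(n, clip_end - ref)
            clipped += take
            ref += take
            if take < n:                     # primer ends inside this block
                kept = [(n - take, op)] + ops[i + 1:]
                break
        elif op in "IS":                     # read-only bases before the first kept match
            clipped += n
        elif op in "DN":                     # reference-only gap before the first kept match
            ref += n
        # H and P consume neither read nor reference and are skipped here
    else:
        kept = trail_h                       # whole read lies inside the primer
    new_ops = lead_h + ([(clipped, "S")] if clipped else []) + kept
    return ref, "".join(f"{n}{op}" for n, op in new_ops)

print(soft_clip_left(100, "50M", 120))
```

Because the clip boundary comes from the alignment position rather than from matching the primer sequence, a mismatch inside the primer does not change the result, which is the property discussed above.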

What do you think about those approaches for handling primer sequences when analyzing targeted panels?

So far, I've tried soft-clipping primer sequences with BamClipper, but it somehow introduced errors into the BAM files, and I'm still waiting for an answer from the developer on GitHub.

Thanks for your time!

UPDATE for those interested in the topic: I've found more tools: cutPrimers (for now it can be run on FASTQ files only); Katana, which uses the same principle as BamClipper; and PcrClipReads.

bamclipper soft-clipping PcrClipReads Katana • 2.4k views

Hi,

I am facing similar issues. Could you please share your latest updates and how you resolved this? Also, could you explain how to get a BEDPE file from the probes in the manifest file? I am not sure whether all the fields mentioned in the bedtools documentation are present ( http://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format ):

Target Region Name  Target Region ID    Target ID   Species Build ID    Chromosome  Start Position  End Position    Submitted Target Region Strand  ULSO Sequence   ULSO Genomic Hits   DLSO Sequence   DLSO Genomic Hits   Probe Strand    Designer    Design Score    Expected Amplifed Region Size   SNP Masking Labels
MPL1_2  MPL1_2.chr1.43815008.43815009   MPL1_2.chr1.43815008.43815009_tile_1    Homo sapiens    hg19    chr1    43815008    43815009    -   AAGTGGCGAAGCCGTAGGTGCGCACG  0   TCAGCAGCAGCAGGCCCAGGACGG    0   -   ILLUMINA    NA  183 TRUE    1
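As a rough sketch of the BEDPE side of this question: the first six BEDPE columns are chrom1, start1, end1, chrom2, start2, end2, with 0-based starts and exclusive ends. The columns in the row above give the target region and the probe sequences rather than per-probe coordinates, so this Python example assumes the genomic position of each probe has already been obtained some other way (for instance by aligning the ULSO/DLSO sequences to the reference). The function name `probe_pair_to_bedpe` and the positions used in the call are hypothetical, not taken from the manifest.

```python
def probe_pair_to_bedpe(chrom, left_pos, left_seq, right_pos, right_seq, name="."):
    """Build one BEDPE line for a pair of probes on the same chromosome.

    left_pos / right_pos -- 1-based leftmost genomic position of each probe
                            (assumed inputs, e.g. from aligning the probes)
    left_seq / right_seq -- probe sequences, used only for their lengths
    "left" and "right" refer to genomic order, not to ULSO/DLSO.
    """
    left_start = left_pos - 1                 # BEDPE starts are 0-based
    left_end = left_start + len(left_seq)     # BEDPE ends are exclusive
    right_start = right_pos - 1
    right_end = right_start + len(right_seq)
    fields = [chrom, left_start, left_end, chrom, right_start, right_end, name]
    return "\t".join(str(f) for f in fields)

# Hypothetical positions, for illustration only
print(probe_pair_to_bedpe("chr1", 101, "ACGT", 201, "ACGTAC", "amp1"))
```

Which probe ends up on the left depends on the probe strand, so that would need to be checked per amplicon before writing the file.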


I would really appreciate your inputs.

In case this is useful: I align with bwa using default options, use samtools and scripts for indel realignment, and call variants with VarScan.

Thanks and Regards, Pramod

4.1 years ago
Paul ★ 1.4k

Hi, I have a very similar question. Please check my thread, which has some good posts and comments. Some people are using the BBMap suite. I would be happy if you shared some of your experience as well.


Paul, have a look at this thread.


Paul, I'm now trying to choose between BamClipper, Katana and PcrClipReads. GATK's ClipReads is not an appropriate tool for soft-clipping primers; I've found the same opinion here. If I haven't mixed anything up, the aforementioned tools use the same approach: they soft-clip primer sequences based on their genomic coordinates. UPDATE: PcrClipReads is for non-overlapping amplicons only (this thread may be helpful), while Katana and BamClipper can handle overlapping ones.


Thank you for the information. I will probably check all of these programs too. We can at least share our experiences here; maybe it will help somebody else :)


How exactly would you use the BBMap suite for soft-clipping primer sequences? I haven't found such an option. P.S. I've read your post; my comment is at the bottom of the page. ;)


bbmap.sh should soft clip the primers when it aligns data. Have you tested alignments with it?


No, I have used BWA MEM for alignment. Good to know, I should give it a try.


Yeah, I see, thank you for the comment. I still haven't found the right solution for this task. I hope someone will share more experience in your thread. Have you already tested GATK ClipReads against BamClipper?