Question: Is it necessary to mark duplicates, realign the data, and recalibrate base quality scores (VarScan2)?
Raheleh wrote (10 months ago):

Hello,

I have WES data from tumor samples with matched normal samples (paired-end, Illumina). I trimmed the reads with Trimmomatic (LEADING:30, TRAILING:30, MINLEN:50) and aligned them against hg38 using bwa mem. I want to use VarScan2 to call somatic and germline variants. Is it necessary to mark duplicates, realign the data, and recalibrate base quality scores before using VarScan2?

From what I have read, marking duplicates and realigning around indels are not necessary. Is that right?

I’d really appreciate any help!

ATpoint (Germany) wrote (10 months ago):

As VarScan2 depends on samtools mpileup, which lets you specify minimum base and mapping quality, no strict filtering is IMHO required. Marking duplicates is a good and accepted option; removing them is unnecessary, as mpileup ignores them if they are flagged appropriately. I prefer samblaster for on-the-fly marking of duplicates. Duplicate marking may or may not save you from some false positives where PCR has over-amplified certain fragments. There is literature out there showing that the overall effect is minimal. My preferred pipeline is basically:

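# Align, add mate tags, mark duplicates, convert to BAM and sort in one stream.
# fixmate -m adds mate score tags; samblaster marks (does not remove) duplicates;
# sambamba view writes uncompressed BAM (-l 0) that is passed on to sambamba sort.
bwa mem ${BWA_IDX} in_1.fastq.gz in_2.fastq.gz | \
  samtools fixmate -m -O SAM - - | \
  samblaster --ignoreUnmated | \
  sambamba view -f bam -S -l 0 -o /dev/stdout /dev/stdin | \
  sambamba sort --tmpdir=./ -l 5 -o out_sorted /dev/stdin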

This gives you a sorted and duplicate-marked BAM file without any intermediate files.
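"Flagged appropriately" here means that SAM FLAG bit 0x400 (1024) is set on the duplicate reads. As a rough illustration only, the sketch below counts records with that bit in a few made-up SAM lines using plain awk; the read names, positions and the helper name count_dups are invented for the example. On a real file, something like samtools view -c -f 1024 on the sorted output should give the equivalent count.

```shell
# Toy SAM records; the names, positions and sequences are made up.
# FLAG (column 2): 99 and 147 are a normal proper pair,
# 1123 = 99 + 1024 and 1171 = 147 + 1024 carry the duplicate bit (0x400).
count_dups() {
  # Skip header lines (starting with @) and test bit 1024 of the FLAG field.
  awk -F '\t' '!/^@/ && int($2 / 1024) % 2 == 1 { n++ } END { print n + 0 }'
}

{
  printf '@HD\tVN:1.6\tSO:coordinate\n'
  printf 'r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n'
  printf 'r2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n'
  printf 'r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII\n'
  printf 'r2\t1171\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII\n'
} | count_dups   # prints 2
```

The marked reads stay in the file, so nothing is lost if you later decide to include them again.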

As for VarScan2, also see this VarScan2 publication (but mind that it is quite old and some options might be deprecated). Base recalibration and realignment are not explicitly recommended for VarScan and have not (to my knowledge) been shown to be truly beneficial, especially considering the computational expense. There is quite a bit of literature available on this. I do not personally use them.
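To make the mpileup step concrete, here is a minimal sketch of a normal/tumor run without recalibration or realignment. All file names, the -Q 20 and --min-coverage 8 thresholds, the VARSCAN_JAR variable and the helper names run and call_somatic are assumptions for illustration, not recommendations from this thread. It defaults to a dry run that only prints the commands, so check the options against your installed samtools and VarScan versions before running it for real.

```shell
#!/bin/sh
# Sketch only: file names, thresholds and VARSCAN_JAR are placeholders.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
VARSCAN_JAR="${VARSCAN_JAR:-VarScan.jar}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

call_somatic() {  # usage: call_somatic REF NORMAL_BAM TUMOR_BAM PREFIX
  ref="$1"; normal="$2"; tumor="$3"; prefix="$4"
  # -q: minimum mapping quality, -Q: minimum base quality, -B: disable BAQ.
  # Reads flagged as duplicates are skipped by mpileup by default.
  run samtools mpileup -f "$ref" -q 1 -Q 20 -B \
      -o "$prefix.mpileup" "$normal" "$tumor"
  # --mpileup 1 tells VarScan that normal and tumor share one mpileup file.
  run java -jar "$VARSCAN_JAR" somatic "$prefix.mpileup" "$prefix" \
      --mpileup 1 --min-coverage 8 --output-vcf 1
}

call_somatic hg38.fa normal_sorted.bam tumor_sorted.bam sample1
```

Setting DRY_RUN=0 in the environment executes the two commands instead of printing them.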


Note that I edited the post and removed the /dev/stdin from the line with the BWA command; sorry, it was there by mistake.

(Reply by ATpoint, written 10 months ago)

Thanks, ATpoint. I would really appreciate it if you could answer my question here as well.

(Reply by Raheleh, written 10 months ago)

finswimmer (Germany) wrote (10 months ago):

Hello R.A.,

you will find all kinds of opinions about these steps. Here are mine ;):

  • Trimming your data isn't necessary as long as your overall base qualities are fine. You will throw away too much usable data, especially if the paired reads overlap.
  • Marking duplicates is fine, even if the impact on the whole dataset might be small. There are always regions that are preferentially amplified.
  • Realigning the data and recalibrating base qualities are methods recommended in the GATK best practice guidelines. The programs used for that are optimized to work together with the other programs provided by GATK. That does not mean they are useful with other variant callers. In particular, the impact of base quality recalibration is very low unless you have low-complexity data.
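To get a feeling for how much end trimming at Q30 can cost, here is a small sketch in plain shell and awk. It is a simplified imitation of what LEADING and TRAILING do (clip bases below the threshold from each end), not Trimmomatic's actual code, and the helper name kept_length and the toy quality strings are made up. A read whose kept length drops below the MINLEN value (50 in the question) would then be discarded entirely.

```shell
# Simplified imitation of Trimmomatic's LEADING/TRAILING steps (not its actual code).
# Reads Phred+33 quality strings from stdin, one per line, and prints how many
# bases survive when low-quality bases are clipped from both ends.
kept_length() {  # usage: printf '%s\n' QUALS | kept_length THRESHOLD
  awk -v t="$1" '
    BEGIN { for (i = 33; i < 127; i++) phred[sprintf("%c", i)] = i - 33 }
    {
      n = length($0); s = 1; e = n
      while (s <= n && phred[substr($0, s, 1)] < t) s++   # clip from the start
      while (e >= s && phred[substr($0, e, 1)] < t) e--   # clip from the end
      print e - s + 1
    }'
}

# "#" is Q2, "5" is Q20, "I" is Q40 in Phred+33.
printf '%s\n' '##55IIIIII55' | kept_length 30   # prints 6
printf '%s\n' '##55IIIIII55' | kept_length 20   # prints 10
```

In this toy read, raising the threshold from 20 to 30 drops four Q20 bases that a variant caller with its own base quality cutoff could still have evaluated.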

fin swimmer


Many thanks, fin swimmer, for your suggestion and quick reply. I used this command to remove the duplicates:

  samtools rmdup sample.sorted.bam sample_rmdup.bam

Is that fine, or should I only mark them and not remove them?

(Reply by Raheleh, written 10 months ago)

You're welcome.

Whether to remove or mark duplicates is a matter of personal preference. I'm not a fan of removing anything from my alignment file, so I just mark them.

fin swimmer

(Reply by finswimmer, written 10 months ago)

Got it. Thanks!

Dear fin swimmer, could you please answer my question here? I really need help. Many thanks in advance!

(Reply by Raheleh, written 10 months ago)