Question: Is it necessary to mark duplicates, realign the data and recalibrate base quality scores before running VarScan2?
Raheleh wrote (16 months ago):

Hello,

I have WES data of tumor samples with matched normals (paired-end, Illumina). I trimmed them using Trimmomatic (LEADING:30, TRAILING:30, MINLEN:50) and aligned them against hg38 using bwa mem. I want to use VarScan2 to call somatic and germline variants. Is it necessary to mark duplicates, realign the data and recalibrate base quality scores before using VarScan2?
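The trimming described above roughly corresponds to a paired-end Trimmomatic call like the following minimal sketch, wrapped in a shell function. The jar location (TRIMMOMATIC_JAR), the -phred33 quality encoding, the function name and the output file names are assumptions for illustration, not the poster's actual command.

```shell
# Hypothetical helper: paired-end Trimmomatic run with the settings from the question
trim_pair() {
  local r1="$1" r2="$2" prefix="$3"
  local jar="${TRIMMOMATIC_JAR:-trimmomatic.jar}"  # assumed location of the Trimmomatic jar
  # LEADING:30 / TRAILING:30 cut bases with quality < 30 from the start / end of each read,
  # MINLEN:50 then discards reads shorter than 50 bp.
  # Outputs: *_1P/*_2P = both mates kept, *_1U/*_2U = the other mate was discarded
  java -jar "$jar" PE -phred33 "$r1" "$r2" \
    "${prefix}_1P.fastq.gz" "${prefix}_1U.fastq.gz" \
    "${prefix}_2P.fastq.gz" "${prefix}_2U.fastq.gz" \
    LEADING:30 TRAILING:30 MINLEN:50
}
```

For example, `trim_pair sample_1.fastq.gz sample_2.fastq.gz sample_trimmed` would write four files; only the two paired outputs (`_1P`, `_2P`) would normally go on to bwa mem.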

From what I have found, marking duplicates and indel realignment are not necessary. Is that right?

I’d really appreciate any help!

ATpoint (Germany) wrote (16 months ago):

As VarScan2 depends on samtools mpileup, which lets you set minimum base and mapping qualities, no strict filtering is IMHO required. Marking duplicates is a good and accepted option; removing them is unnecessary, as mpileup ignores duplicates if they are flagged appropriately. I prefer samblaster for on-the-fly marking of duplicates. It may or may not save you from some false positives where PCR has over-amplified certain fragments; there is literature out there showing that the overall effect is minimal. My preferred pipeline is basically:

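# BWA_IDX: prefix of the bwa index of hg38 (as created by "bwa index")
# samblaster flags duplicates on the read-name-grouped bwa output,
# sambamba view converts to uncompressed BAM (-l 0) for the pipe,
# sambamba sort writes the coordinate-sorted, compressed BAM
bwa mem "${BWA_IDX}" in_1.fastq.gz in_2.fastq.gz | \
  samtools fixmate -m -O SAM - - | \
  samblaster --ignoreUnmated | \
  sambamba view -f bam -S -l 0 -o /dev/stdout /dev/stdin | \
  sambamba sort --tmpdir=./ -l 5 -o out_sorted.bam /dev/stdin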

This gives you a sorted and duplicate-marked BAM file without any intermediate files.
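The mpileup points above can be sketched as two small shell functions: one counts the reads flagged as duplicates, the other builds a normal/tumor mpileup with minimum mapping (-q) and base (-Q) quality thresholds and passes it to VarScan2 somatic. The thresholds of 20, the VARSCAN_JAR variable and the function names are hypothetical, and the VarScan2 options (--mpileup 1, --output-vcf 1) should be checked against the usage text of your VarScan2 version.

```shell
# Hypothetical helper: number of reads flagged as duplicates (SAM flag 0x400 = 1024)
count_duplicates() {
  samtools view -c -f 1024 "$1"
}

# Hypothetical helper: VarScan2 somatic calling on a normal/tumor BAM pair
varscan_somatic() {
  local ref="$1" normal="$2" tumor="$3" prefix="$4"
  local varscan_jar="${VARSCAN_JAR:-VarScan.jar}"  # assumed location of the VarScan2 jar
  # -q 20: skip alignments with mapping quality < 20
  # -Q 20: skip bases with base quality < 20
  # By default mpileup also skips reads flagged UNMAP, SECONDARY, QCFAIL or DUP,
  # so marked duplicates are ignored without being removed from the BAM.
  samtools mpileup -f "$ref" -q 20 -Q 20 "$normal" "$tumor" > "${prefix}.mpileup" || return 1
  # --mpileup 1: the single input file holds the normal, then the tumor columns
  # --output-vcf 1: write VCF instead of VarScan's native text format
  java -jar "$varscan_jar" somatic "${prefix}.mpileup" "$prefix" --mpileup 1 --output-vcf 1
}
```

For example, `count_duplicates out_sorted.bam` followed by `varscan_somatic hg38.fa normal.bam tumor.bam patient1` would write patient1.mpileup and then VarScan2 output files named after the prefix patient1.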

As for VarScan2, also see the VarScan2 publication (but mind that it is quite old and some options might be deprecated). Base recalibration and realignment are not explicitly recommended for VarScan and have not (to my knowledge) been shown to be truly beneficial, especially considering the computational expense. There is quite some literature on this. I do not use them personally.


Note that I edited the post and removed the /dev/stdin from the line with the BWA command; it was there by mistake, sorry.

Reply written 16 months ago by ATpoint

Thanks, ATpoint. I would really appreciate it if you could answer my question here as well.

Reply written 16 months ago by Raheleh
finswimmer (Germany) wrote (16 months ago):

Hello R.A.,

you will find all kinds of opinions about these steps. Here are mine ;)

  • Trimming your data isn't necessary as long as your overall base qualities are fine. You would throw away too much usable data, especially if the paired reads overlap.
  • Marking duplicates is fine, even if the impact on the whole dataset might be small. There are always regions that are preferentially amplified.
  • Indel realignment and base quality recalibration are methods recommended in the GATK Best Practices. The programs used for them are optimized to work together with the other GATK tools, and it is not established that they are useful with other variant callers. In particular, base quality recalibration has very little impact unless you have low-complexity data.
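If you do want to try the GATK recalibration step from the last point, a minimal sketch with GATK4 could look like the function below. The known-sites VCF (for example a dbSNP release matching hg38), the function name and all file names are assumptions. As far as I know, GATK4 no longer ships the separate indel realignment tools, so only recalibration is shown.

```shell
# Hypothetical helper: GATK4 base quality score recalibration (BQSR)
run_bqsr() {
  local ref="$1" in_bam="$2" known_sites="$3" out_bam="$4"
  local table="${out_bam%.bam}.recal.table"
  # Step 1: model systematic base-quality errors; sites in the known-sites VCF
  # (e.g. dbSNP) are excluded so that real variants are not counted as errors
  gatk BaseRecalibrator -R "$ref" -I "$in_bam" \
    --known-sites "$known_sites" -O "$table" || return 1
  # Step 2: write a new BAM with base qualities adjusted by that model
  gatk ApplyBQSR -R "$ref" -I "$in_bam" \
    --bqsr-recal-file "$table" -O "$out_bam"
}
```

GATK also expects read groups in the BAM and a .fai and .dict file next to the reference; bwa mem only adds read groups when run with -R, which the pipeline in the first answer does not do.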

fin swimmer


Many thanks, fin swimmer, for your suggestion and quick reply. I used this command to remove the duplicates:

samtools rmdup sample.sorted.bam sample_rmdup.bam

Is that fine, or should I only mark them instead of removing them?

Reply written 16 months ago by Raheleh

You're welcome.

Whether to remove or mark duplicates is a matter of personal preference. I'm not a fan of removing anything from my alignment files, so I just mark them.

fin swimmer

Reply written 16 months ago by finswimmer
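Since samtools rmdup came up: recent samtools documentation describes rmdup as obsolete and points to samtools markdup instead. A minimal sketch of the mark-only workflow, following the name sort, fixmate -m, coordinate sort, markdup order that markdup expects, could look like this (the function name and intermediate file names are hypothetical):

```shell
# Hypothetical helper: flag (not remove) duplicates with samtools markdup
mark_duplicates() {
  local in_bam="$1" out_bam="$2"
  local prefix="${out_bam%.bam}"
  # fixmate -m needs reads grouped by name and adds the mate score (ms) tag markdup relies on
  samtools sort -n -o "${prefix}.namesort.bam" "$in_bam" || return 1
  samtools fixmate -m "${prefix}.namesort.bam" "${prefix}.fixmate.bam" || return 1
  # markdup needs coordinate-sorted input
  samtools sort -o "${prefix}.possort.bam" "${prefix}.fixmate.bam" || return 1
  # without -r, duplicates only get SAM flag 0x400; no reads are removed
  samtools markdup "${prefix}.possort.bam" "$out_bam" || return 1
  # clean up the intermediate files
  rm -f "${prefix}.namesort.bam" "${prefix}.fixmate.bam" "${prefix}.possort.bam"
}
```

For example, `mark_duplicates sample.sorted.bam sample_markdup.bam`. Adding -r to the markdup call would remove the duplicates instead, which is roughly what rmdup did.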

Got it. Thanks!

Dear fin swimmer, could you please answer my question here? I really need help. Many thanks in advance!

Reply written 16 months ago by Raheleh