Question: Downstream analysis of VCF files obtained from VarScan2
Asked 4 months ago by Raheleh


I am sorry for the very basic, newbie question, but I am confused and getting lost in my thoughts. I have WES data from tumor samples and their matched normal samples from 13 patients (paired-end, Illumina). I used BWA-MEM to align the reads against hg38 and VarScan2 to call somatic variants. Now I have 6 files (fpfilter_Passed.vcf) for each patient (Somatic, LOH, and Germline, each for SNPs and indels). My question is: what should I do at this step? Should I merge all 6 files together?

Afterwards, should I merge the VCF files of all patients together and then use SnpEff for downstream analysis?

These are all the commands I have used up to this step:

#Trim using Trimmomatic
java -jar Trimmomatic-0.38/trimmomatic-0.38.jar PE normal_1_1.fastq normal_1_2.fastq -baseout normal_1 LEADING:30 TRAILING:30 MINLEN:50

#Align against ref using bwa
bwa index hg38.fa
bwa mem hg38.fa normal_1_1P normal_1_2P > normal_1.sam

#Convert sam to bam
samtools view -bS normal_1.sam > normal_1.bam
samtools sort normal_1.bam -o normal_1.sorted.bam
samtools index normal_1.sorted.bam

# Get the raw variants:
samtools mpileup -q 20 -Q 25 -B -d 1000 -f hg38.fa normal_1.sorted.bam tumor_1.sorted.bam | java -jar VarScan.v2.4.3.jar somatic /dev/stdin outputName --mpileup 1 --strand-filter 1 --output-vcf 1

# Classify into Germ, LOH and Somatic:
java -jar VarScan.v2.4.3.jar processSomatic *.snp.vcf --max-normal-freq 0.01
java -jar VarScan.v2.4.3.jar processSomatic *.indel.vcf --max-normal-freq 0.01

#Preparing BED file
egrep -hv "^#" Germline.hc.vcf | awk 'OFS="\t" {print $1, $2-1, $2+1}' | sort -k1,1 -k2,2n | bedtools merge -i - > Germline.hc.bed
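
To make the BED step concrete, here is a minimal sketch of what the awk part produces on a toy VCF. The three records and the helper name vcf_to_windows are made up for illustration, grep -v stands in for egrep -hv, and bedtools merge is left out so the snippet runs without bedtools installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn VCF records into chrom, pos-1, pos+1 windows.
vcf_to_windows() {
    # Drop header lines, print tab-separated windows, sort by chrom then start.
    grep -v '^#' "$1" \
        | awk 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $2 + 1 }' \
        | sort -k1,1 -k2,2n
}

# Toy input: a two-line header plus three invented records (not real data).
toy=$(mktemp)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n' > "$toy"
printf 'chr1\t100\t.\tA\tG\nchr1\t101\t.\tC\tT\nchr2\t50\t.\tG\tA\n' >> "$toy"

vcf_to_windows "$toy"
rm -f "$toy"
```

On this toy input the sketch prints the windows chr1 99 101, chr1 100 102 and chr2 49 51. Piping that into bedtools merge -i - would then collapse the two overlapping chr1 windows into a single interval.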

#Run bam-readcount:
bam-readcount -f hg38.fa -q 20 -b 25 -d 1000 -l Germline.hc.bed -w 1 normal_1.sorted.bam > Germline.hc.bamRC

#Run fpfilter for all somatics, germlines and LOHs for both snp and indel:

java -jar VarScan.v2.4.3.jar fpfilter outputName.snp.Germline.hc.vcf Germline.hc.bamRC --output-file Germline.hc.fpfilterPassed.vcf --filtered-file Germline.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.snp.LOH.hc.vcf LOH.hc.bamRC --output-file LOH.hc.fpfilterPassed.vcf --filtered-file LOH.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.snp.Somatic.hc.vcf Somatic.hc.bamRC --output-file Somatic.hc.fpfilterPassed.vcf --filtered-file Somatic.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.indel.Germline.hc.vcf Germline.hc.bamRC --output-file indel.Germline.hc.fpfilterPassed.vcf --filtered-file indel.Germline.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.indel.LOH.hc.vcf LOH.hc.bamRC --output-file indel.LOH.hc.fpfilterPassed.vcf --filtered-file indel.LOH.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.indel.Somatic.hc.vcf Somatic.hc.bamRC --output-file indel.Somatic.hc.fpfilterPassed.vcf --filtered-file indel.Somatic.hc.fpfilterFailed.vcf
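
The six fpfilter calls above all follow one pattern, so they can also be generated in a loop. This is only a sketch: fpfilter_cmd is a hypothetical helper, it prints the commands as a dry run instead of running VarScan, and it assumes a uniform snp./indel. prefix on the output names (the snp outputs above carry no prefix).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one fpfilter command for a variant type
# (snp or indel) and a class (Germline, LOH or Somatic).
fpfilter_cmd() {
    local vtype=$1 class=$2
    echo "java -jar VarScan.v2.4.3.jar fpfilter" \
         "outputName.${vtype}.${class}.hc.vcf ${class}.hc.bamRC" \
         "--output-file ${vtype}.${class}.hc.fpfilterPassed.vcf" \
         "--filtered-file ${vtype}.${class}.hc.fpfilterFailed.vcf"
}

# Dry run: print all six commands without executing them.
for vtype in snp indel; do
    for class in Germline LOH Somatic; do
        fpfilter_cmd "$vtype" "$class"
    done
done
```

Generating the commands from a single template avoids per-line typos in the option names. To actually run them, the printed lines could be piped into bash once they look right.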

Can anyone help me out? Thanks!

modified 3 months ago • written 4 months ago by Raheleh

Hello, samtools mpileup can take multiple files as input, and you can then use VarScan to analyse the result. I don't know if this fits your case.

written 4 months ago by MatthewP

Thank you, MatthewP. But my question is: should I merge the files of all 13 patients together? I mean, for downstream analysis, must there be only one VCF file at the end? Or should each patient's VCF file be analysed separately?

written 3 months ago by Raheleh