Question: Downstream analysis of VCF files obtained from VarScan2
0
gravatar for Raheleh
4 months ago by
Raheleh50
Raheleh50 wrote:

Hello,

I am sorry for the very basic and newbie question but I am confusing and getting lost in my thoughts. I have WES data of tumor samples with matched ones extracted from 13 patients (paired-end, illumina). I used BWA mem to align them against hg38, and used VarScan2 to call somatic variations. Now I have 6 files (fpfilter_Passed.vcf) for each patient (Somatic, LOH, Germline for each of snp and indel variations). My question is, what I have to do in this step? Should I merge all 6 files together?

Afterwards should I merge the VCF file of all patients together and then use SnpEff for downstream analysis?

This is the all commands that I used till this step:

#Trim using Trimmomatic
java -jar Trimmomatic-0.38/trimmomatic-0.38.jar PE normal_1_1.fastq normal_1_2.fastq -baseout sample_1 LEADING:30 TRAILING:30 MINLEN:50

#Align against ref using bwa
bwa index hg38.fa
bwa mem hg38.fa normal_1_1P normal_1_2P > normal_1.sam

#Convert sam to bam
samtools view -bS normal_1.sam > normal_1.bam
samtools sort normal_1.bam -o normal_1.sorted.bam
samtools index normal_1.sorted.bam

# Get the raw variants:
samtools mpileup -q 20 -Q 25 -B -d 1000 -f hg38.fa normal_1.sorted.bam tumor_1.sorted.bam | java -jar VarScan.v2.4.3.jar somatic /dev/stdin outputName -mpileup --strand-filter 1 --output-vcf

# Classify into Germ, LOH and Somatic:
java -jar VarScan.v2.4.3.jar processSomatic *.snp.vcf --max-normal-freq 0.01
java -jar VarScan.v2.4.3.jar processSomatic *.indel.vcf --max-normal-freq 0.01

#Preparing BED file
egrep -hv "^#" Germline.hc.vcf | awk 'OFS="\t" {print $1, $2-1, $2+1}' | sort -k1,1 -k2,2n | bedtools merge -i - > Germline.hc.bed


#Run bam-readcount:
bam-readcount -f hg38.fa -q 20 -b 25 -d 1000 -l Germline.hc.bed -w 1 normal_1.bam > Germline.hc.bamRC


#Run fpfilter for all somatics, germlines and LOHs for both snp and indel:

java -jar VarScan.v2.4.3.jar fpfilter outputName.snp.Germline.hc.vcf Germline.hc.bamRC --output-file Germline.hc.fpfilterPassed.vcf --filtered-file Germline.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.snp.LOH.hc.vcf LOH.hc.bamRC --output-file LOH.hc.fpfilterPassed.vcf --filtered-file LOH.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.snp.Somatic.hc.vcf Somatic.hc.bamRC --output-file Somatic.hc.fpfilterPassed.vcf --filtered-file Somatic.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.indel.Germline.hc.vcf Germline.hc.bamRC --output-file indel.Germline.hc.fpfilterPassed.vcf --filtered-file indel.Germline.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.indel.LOH.hc.vcf LOH.hc.bamRC --output-file indel.LOH.hc.fpfilterPassed.vcf –filtered-file indel.LOH.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.indel.Somatic.hc.vcf Somatic.hc.bamRC --output-file indel.Somatic.hc.fpfilterPassed.vcf --filtered-file indel.Somatic.hc.fpfilterFailed.vcf

Can anyone help me out? Thanks!

ADD COMMENTlink modified 3 months ago • written 4 months ago by Raheleh50

Hello, samtools mpileup can merge multiple files, then use varscan to analyse, I don't know if this fits you.

ADD REPLYlink written 4 months ago by MatthewP80

Thank you MatthewP. But my question is should I merge the files of all 13 patients together? I mean at the end for downstream analysis, there must be only one VCF file? Or for every patient, vcf file should be analysed separately?

ADD REPLYlink written 3 months ago by Raheleh50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1073 users visited in the last hour