Question: Finding somatic and germline variants in tumor samples with matched normal samples (paired-end, Illumina)
4 months ago, Raheleh wrote:

Hello, I am new to NGS data analysis and am currently analyzing WES data from tumor samples with matched normal samples (paired-end, Illumina). I am working on the Linux command line. This is what I have done so far for each sample:

fastqc sample.fastq
java -jar trimmomatic-0.38.jar PE sample_1.fastq sample_2.fastq -baseout sample LEADING:30 TRAILING:30 MINLEN:50
bowtie2-build hg38.fa hg38
bowtie2 -x hg38 -1 sample_1P -2 sample_2P -S sample.sam
samtools view -bS sample.sam > sample.bam
samtools sort sample.bam -o sample.sorted.bam
samtools mpileup -uf hg38.fa sample.sorted.bam > sample.mpileup

I don't know what a reasonable next step is after this. I want to find both somatic and germline variants. I am using VarScan, but I am confused. Should I use "java -jar VarScan.jar somatic normal.pileup tumor.pileup"? And what is the difference between a pileup file and an mpileup file?

Any help would be greatly appreciated. Thanks.

4 months ago, ATpoint wrote:

A couple of things. First, I would switch from bowtie2 to BWA-MEM, because most variant calling pipelines assume BWA as the aligner. Second, you can shorten your commands by using pipes, like align (options...) | samtools sort -o sorted.bam -. This saves time and disk space because no intermediate SAM file is written. Third, given that you are starting a new project, consider using a more recent variant caller than VarScan2. There is nothing wrong with VarScan2, but it is no longer maintained, which is why I personally switched to Strelka2 from Illumina recently. If you still want to use VarScan2, you might have a look at my pipeline for it on GitHub. It is an admittedly ugly script, but you can use it to get an idea of how the VarScan2 subcommands are meant to be used. It starts by calling raw variants using mpileup/VarScan2 somatic, extracts high-confidence germline and somatic variants with processSomatic, and then applies the recommended heuristic fpfilter to remove potential junk calls. Still, I encourage you to use a more recent caller like Strelka2, which also has more complete documentation, making it easier for you to get started with variant calling.
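
The steps described above could be sketched roughly as follows. This is not the GitHub pipeline mentioned in the answer, only a minimal outline under some assumptions: the sample names (normal, tumor), the output prefix (pair), the thread count, and the run helper with its DRY_RUN switch are all made up for illustration, and hg38.fa is assumed to have been indexed with bwa index beforehand. On the pileup question: for a single BAM file, samtools mpileup without -u writes the same text format as the older pileup command, which is what VarScan2 reads. The fpfilter step is left out because it needs additional read-count input.

```shell
#!/usr/bin/env bash
# Sketch only: sample names, file names, and thread count are placeholders.
set -euo pipefail

REF="${REF:-hg38.fa}"               # reference FASTA, indexed beforehand with `bwa index`
VARSCAN="${VARSCAN:-VarScan.jar}"   # path to the VarScan2 jar (assumed location)
THREADS="${THREADS:-4}"
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print the command line in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$1"
    else
        eval "$1"
    fi
}

# Align trimmed read pairs and sort on the fly, so no intermediate SAM is written.
align_and_sort() {
    local s="$1"
    run "bwa mem -t $THREADS $REF ${s}_1P ${s}_2P | samtools sort -@ $THREADS -o ${s}.sorted.bam -"
    run "samtools index ${s}.sorted.bam"
}

# Text pileup for VarScan2; -u is left out because that flag is intended for BCF output.
make_pileup() {
    local s="$1"
    run "samtools mpileup -f $REF ${s}.sorted.bam > ${s}.pileup"
}

# Call variants on the pair, then split the calls by status and confidence.
call_variants() {
    local normal="$1" tumor="$2" out="$3"
    run "java -jar $VARSCAN somatic ${normal}.pileup ${tumor}.pileup $out"
    run "java -jar $VARSCAN processSomatic ${out}.snp"
    run "java -jar $VARSCAN processSomatic ${out}.indel"
}

for s in normal tumor; do
    align_and_sort "$s"
    make_pileup "$s"
done
call_variants normal tumor pair
```

With the default DRY_RUN=1 the script only prints the command lines, so they can be checked before anything is run; setting DRY_RUN=0 executes them.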


Dear ATpoint, many thanks for your explanations!


Dear ATpoint,

When I run Strelka2 for somatic calling, I get this error: Can't find expected fasta index file: index_bwa/hg38.fa.fai

This is my script: strelka-2.9.2.centos6_x86_64/bin/ --normalBam BC.bam --tumorBam XL.bam --ref index_bwa/hg38.fa --runDir demo_somatic

Do you know what the problem is?
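
The file named in the error message is the FASTA index that samtools faidx creates, which is a separate file from the BWA index. A minimal sketch of how one might check for it and create it, assuming samtools is installed and the reference really is at index_bwa/hg38.fa (the helper function names here are made up for illustration):

```shell
#!/usr/bin/env bash
set -euo pipefail

REF="${REF:-index_bwa/hg38.fa}"   # reference path taken from the error message

# Path of the index file that the error message says is missing.
fai_for() {
    printf '%s.fai\n' "$1"
}

# Create the index only when it is missing and the FASTA can be found.
ensure_fai() {
    local ref="$1"
    if [ -f "$(fai_for "$ref")" ]; then
        echo "index present: $(fai_for "$ref")"
    elif [ -f "$ref" ] && command -v samtools >/dev/null 2>&1; then
        samtools faidx "$ref"     # writes <ref>.fai next to the FASTA
    else
        echo "cannot index $ref: FASTA or samtools not found" >&2
        return 1
    fi
}

ensure_fai "$REF" || echo "fix the reference path or install samtools, then rerun" >&2
```

Once hg38.fa.fai sits next to hg38.fa, the same --ref path should pass this particular check.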
