Question

Working with multiple BAM files from tumour vs normal samples in VarScan Somatic

4.2 years ago

rodolfo.peacewalker ▴ 390

Hi everyone!

I'm very new to variant analysis of tumour vs normal samples. I'm working with WES data from esophageal adenocarcinoma: three tumour samples and three normal samples from the same patient. I'm using Galaxy (the aim of the exercise is to run the analysis without the command line), and the pipeline so far has been:

1. QC of raw reads in FASTQ format (FastQC and MultiQC)
2. Trimming of raw reads and QC of trimmed reads (Trimmomatic and the QC tools mentioned above)
3. Alignment of trimmed reads (BWA-MEM against hg19; under "Set read groups information" I chose set_picard, and for the read group sample name I used Normal or Tumour)
4. Sorting of BAM files (by coordinate)
5. Filtering of sorted BAM files (to remove reads with low mapping quality, MAPQ)
6. Marking of duplicates (MarkDuplicates)
7. Realignment (BamLeftAlign)
8. Recalibration of BAM files (CalMD)
9. Final filter (to remove realigned and recalibrated reads with MAPQ greater than 254)
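The logic of the two mapping-quality filters in the list above can be sketched in plain Python. This is only an illustration, not what the Galaxy filter tool runs internally: the lower threshold `min_mapq=1` and the toy reads are assumptions, while the upper bound reflects that a MAPQ of 255 means "mapping quality not available" in the SAM format.

```python
from typing import List, Tuple

# A read is represented here as (read_name, mapq); real BAM records carry far more.
Read = Tuple[str, int]


def filter_by_mapq(reads: List[Read], min_mapq: int = 1, max_mapq: int = 254) -> List[Read]:
    """Keep reads whose mapping quality lies in [min_mapq, max_mapq].

    min_mapq is a hypothetical threshold chosen for illustration only.
    """
    return [read for read in reads if min_mapq <= read[1] <= max_mapq]


# Toy data: one unmapped-quality read (0) and one with MAPQ 255 (not available).
toy_reads = [("r1", 0), ("r2", 37), ("r3", 60), ("r4", 255)]
kept = filter_by_mapq(toy_reads)
print(kept)
```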

When running VarScan Somatic to call variants in the tumour samples, I had a question about how to pair the BAM files from step 9. Do I need to pair tumour_sample_1 with normal_sample_1, and so on, to run the analysis? Or should I use Galaxy's option to select multiple samples from each group? So far I have done the latter and got three VCF files from each tumour sample.
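To make the first option concrete, here is a minimal sketch of index-based pairing, outside Galaxy. The file names (`tumour_sample_1.bam` and so on) and both helper functions are hypothetical, and the sketch assumes that samples sharing an index really do belong together.

```python
import re
from typing import List, Tuple


def sample_index(name: str) -> int:
    """Extract the trailing integer from a name such as 'tumour_sample_2.bam'."""
    match = re.search(r"(\d+)(?:\.bam)?$", name)
    if match is None:
        raise ValueError(f"no sample index found in {name!r}")
    return int(match.group(1))


def pair_by_index(tumours: List[str], normals: List[str]) -> List[Tuple[str, str]]:
    """Pair each tumour file with the normal file that carries the same index."""
    normal_by_index = {sample_index(n): n for n in normals}
    pairs = []
    for tumour in sorted(tumours, key=sample_index):
        idx = sample_index(tumour)
        if idx not in normal_by_index:
            raise ValueError(f"no normal sample with index {idx} for {tumour!r}")
        pairs.append((tumour, normal_by_index[idx]))
    return pairs


# Hypothetical file names, deliberately given out of order.
tumours = ["tumour_sample_2.bam", "tumour_sample_1.bam", "tumour_sample_3.bam"]
normals = ["normal_sample_3.bam", "normal_sample_1.bam", "normal_sample_2.bam"]
pairs = pair_by_index(tumours, normals)
for tumour, normal in pairs:
    print(f"{tumour} vs {normal}")
```

Each resulting pair would then go into one somatic-calling run, giving one output per tumour sample.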

Thanks in advance!

Variants DNA-seq Galaxy VCF • 1.3k views

score 0 · Answer 1 · 2021-05-04

4.2 years ago

heskett ▴ 110

In general, using the matched normal is the best option; a panel of unrelated normals is a backup for when you don't have matched normals. Why? The point of somatic mutation calling is to identify the differences between tumour and normal within an individual.
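As a toy illustration of that reasoning: variants present in both the tumour and its matched normal are most likely germline, while tumour-only variants are the somatic candidates. The variant tuples below are made up, and this set subtraction is a deliberate simplification; it is not how VarScan decides, since a somatic caller compares read counts between the two samples statistically rather than subtracting call sets.

```python
from typing import Dict, Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def split_calls(tumour: Set[Variant], normal: Set[Variant]) -> Dict[str, Set[Variant]]:
    """Separate tumour variants into shared (germline-like) and tumour-only calls."""
    return {
        "germline": tumour & normal,           # seen in both samples
        "somatic_candidates": tumour - normal,  # seen only in the tumour
    }


# Made-up calls for one patient's tumour and matched normal.
tumour_calls = {("chr1", 1000, "A", "G"), ("chr2", 5000, "C", "T"), ("chr7", 42, "G", "A")}
normal_calls = {("chr1", 1000, "A", "G"), ("chr3", 777, "T", "C")}

result = split_calls(tumour_calls, normal_calls)
print(sorted(result["somatic_candidates"]))
```

With an unrelated normal in place of the matched one, the patient's private germline variants would not be subtracted and would show up as false somatic candidates.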

Thanks for your response. To clarify, the samples in the data set come from distinct regions; that is, the normal samples were taken from a site distinct from the tumour. Given this, I followed your suggestion and ran the analysis in Galaxy pairing T1 vs N1, T2 vs N2, and so on.

Best regards!

Reply • 4.2 years ago by rodolfo.peacewalker ▴ 390