Hi everyone!
I'm very new to the study of variants from tumour vs normal samples. I'm working with WES data from esophageal adenocarcinoma, with three tumour samples and three normal samples from the same patient. I'm using Galaxy (the point of the exercise is to run the analysis without the command line), and the pipeline so far has been:
- QC of raw reads in FASTQ format (FastQC and MultiQC)
- Trimming of raw reads and QC of the trimmed reads (Trimmomatic and the QC tools mentioned above)
- Alignment of trimmed reads (BWA-MEM against hg19; under "Set read groups information" I chose set_picard, and as the read group sample name I used Normal or Tumour)
- Sorting of BAM files (by coordinate)
- Filtering of sorted BAM files (to remove reads with low mapping quality)
- Marking of duplicates (MarkDuplicates)
- Realignment (BamLeftAlign)
- Recalibration of BAM files (CalMD)
- Final filter (to remove aligned and recalibrated reads with mapping quality greater than 254)
When I got to VarScan Somatic to retrieve the variants from the tumour samples, I had a question about how to match the BAM files from step 9 (the final filter). Do I need to match tumour_sample_1 with normal_sample_1 and so on to run the analysis? Or should I use the Galaxy option to select multiple samples from each group? So far I have done the latter and got three VCF files from each tumour sample.
Thanks in advance!
Thanks for your response. To clarify, the samples in the data set come from distinct regions; that is, the normal samples were taken from a different site than the tumour. Given this, I followed your suggestion and ran the analysis in Galaxy matching T1 vs N1, T2 vs N2, and so on.
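In case it helps anyone reading later, here is a small Python sketch of the difference between the two designs I was asking about. The labels T1..T3 and N1..N3 are placeholders I made up for illustration, not the real dataset names, and the script only lists the pairings; it does not run VarScan Somatic or anything in Galaxy.

```python
from itertools import product

# Placeholder sample labels (hypothetical, not the real dataset names)
tumours = ["T1", "T2", "T3"]
normals = ["N1", "N2", "N3"]

# Matched design: each tumour is compared only with its own normal
matched_pairs = list(zip(tumours, normals))

# All-vs-all design: every tumour compared against every normal
all_pairs = list(product(tumours, normals))

# One somatic-calling run per matched pair
for tumour, normal in matched_pairs:
    print(f"tumour={tumour} vs normal={normal}")
```

The matched design gives one run per tumour sample, while the all-vs-all design would give one run for every tumour/normal combination.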
Best regards!