Aggregation and analysis of MAF files
Posted by weisekyle, 16 months ago

Hello all,

I have tumor/normal whole-exome sequencing data from 50 cancer patients who received the same type of treatment; half of them responded. The data come as 50 .maf files plus a supplemental file that, among other fields, contains a Response column (Responder vs. Non-Responder). My question is how to aggregate all of this so that I can run a statistical test on it. I have a clever way of reading in and combining the 50 .maf files, but I wonder whether it is an appropriate approach.

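library(dplyr)  # provides %>% and left_join()
library(readr)  # provides read_tsv(), cols() and col_character()
sample_info <- read_tsv(file = "path/to/sample_info/sample-information.tsv")
maf_files <- fs::dir_ls("path/to/mafs/")
# Read each MAF (Chromosome forced to character so "X"/"Y" are not parsed as NA) and stack the rows
patients_data <- maf_files %>%
  purrr::map_dfr(read_tsv, col_types = cols(Chromosome = col_character()))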

My plan was then to simply join sample_info onto patients_data with dplyr::left_join():

patients_data_final <- patients_data %>% dplyr::left_join(sample_info, by = c("Tumor_Sample_Barcode", "Matched_Norm_Sample_Barcode"))
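Two checks are worth doing right after the join: rows whose barcode pair is missing from sample_info silently get NA in Response after a left join, and a patient with several hits in the same gene should usually count once in a per-patient test. A minimal base R sketch (so it runs without extra packages), assuming the column names listed in this post; the small data frames are hypothetical stand-ins, and the list of protein-altering Variant_Classification values is an assumption you may want to adjust:

```r
# Hypothetical toy inputs standing in for patients_data and sample_info
patients_data <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "KRAS", "TTN", "TP53"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Silent", "Missense_Mutation"),
  Tumor_Sample_Barcode = c("T1", "T1", "T2", "T2", "T3"),
  Matched_Norm_Sample_Barcode = c("N1", "N1", "N2", "N2", "N3"),
  stringsAsFactors = FALSE
)
sample_info <- data.frame(
  Patient_ID = c("P1", "P2", "P3"),
  Tumor_Sample_Barcode = c("T1", "T2", "T3"),
  Matched_Norm_Sample_Barcode = c("N1", "N2", "N3"),
  Response = c("Responder", "Non-Responder", "Responder"),
  stringsAsFactors = FALSE
)

keys <- c("Tumor_Sample_Barcode", "Matched_Norm_Sample_Barcode")

# Left join (all.x = TRUE): unmatched barcode pairs end up with NA in Response
joined <- merge(patients_data, sample_info, by = keys, all.x = TRUE)
unmatched <- unique(joined[is.na(joined$Response), keys])
if (nrow(unmatched) > 0) warning("some barcode pairs are missing from sample_info")

# Keep protein-altering variant classes (assumed list; Silent etc. are dropped)
non_syn <- c("Missense_Mutation", "Nonsense_Mutation", "Frame_Shift_Del",
             "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins", "Splice_Site",
             "Nonstop_Mutation", "Translation_Start_Site")
coding <- joined[joined$Variant_Classification %in% non_syn, ]

# One row per patient and gene, so multiple hits in one gene count once
patient_gene <- unique(coding[, c("Patient_ID", "Hugo_Symbol", "Response")])

# Gene-by-patient 0/1 matrix (counts are already 0/1 after unique())
mut_matrix <- unclass(table(patient_gene$Hugo_Symbol, patient_gene$Patient_ID))
print(mut_matrix)
```

Note that a patient with no protein-altering mutations has no column in mut_matrix; such patients should be added back as all-zero columns before testing so the group sizes stay correct.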

For reference, here are the column names of both data frames:

> colnames(patients_data)
 [1] "Hugo_Symbol"                 "Chromosome"                  "Start_position"             
 [4] "End_position"                "Variant_Classification"      "Variant_Type"               
 [7] "Reference_Allele"            "Tumor_Seq_Allele1"           "Tumor_Seq_Allele2"          
[10] "Tumor_Sample_Barcode"        "Matched_Norm_Sample_Barcode" "Protein_Change"             
[13] "t_alt_count"                 "t_ref_count"                

> colnames(sample_info)
[1] "Patient_ID"                     "Tumor_Sample_Barcode"           "Matched_Norm_Sample_Barcode"   
[4] "Response"                       "Silent_mutations_per_Mb"        "Nonsynonymous_mutations_per_Mb"
[7] "Mutations_per_Mb"

My task is to find out "whether there are any specific mutations that are observed more in responders vs non-responders." As a supplemental question: does anyone have suggestions on which statistical test to use, or on how to go about choosing one?
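One common starting point (not the only option) is a per-gene Fisher's exact test on a 2x2 table of patients mutated vs. not mutated, split by response, followed by Benjamini-Hochberg correction because many genes are tested at once. As far as I know this is broadly what maftools::mafCompare() does. A base R sketch, assuming a gene-by-patient 0/1 matrix and a response vector named by patient with the labels "Responder" and "Non-Responder"; the toy inputs, the function name test_genes and the min_patients threshold are hypothetical:

```r
# Hypothetical toy input: rows = genes, columns = patients,
# 1 = at least one protein-altering mutation in that gene
mut_matrix <- rbind(
  GENE_A = c(1, 1, 1, 1, 0, 0, 0, 0),
  GENE_B = c(1, 0, 1, 0, 1, 0, 1, 0),
  GENE_C = c(0, 0, 0, 0, 0, 0, 0, 1)
)
colnames(mut_matrix) <- paste0("P", 1:8)
# Hypothetical response labels, named by patient
response <- setNames(rep(c("Responder", "Non-Responder"), each = 4), paste0("P", 1:8))

# Per-gene Fisher's exact test: mutated vs. not mutated, by response group
test_genes <- function(mut_matrix, response, min_patients = 2) {
  # Align response labels to the matrix columns
  grp <- factor(response[colnames(mut_matrix)],
                levels = c("Responder", "Non-Responder"))
  # Skip genes mutated in fewer than min_patients patients (too rare to test meaningfully)
  keep <- rownames(mut_matrix)[rowSums(mut_matrix > 0) >= min_patients]
  if (length(keep) == 0) return(NULL)
  rows <- lapply(keep, function(g) {
    status <- factor(ifelse(mut_matrix[g, ] > 0, "Mutated", "Wildtype"),
                     levels = c("Mutated", "Wildtype"))
    tab <- table(status, grp)  # always 2 x 2 because both factors fix their levels
    ft <- fisher.test(tab)
    data.frame(gene = g,
               mutated_responders = tab["Mutated", "Responder"],
               mutated_nonresponders = tab["Mutated", "Non-Responder"],
               odds_ratio = unname(ft$estimate),
               p_value = ft$p.value,
               stringsAsFactors = FALSE)
  })
  res <- do.call(rbind, rows)
  # Benjamini-Hochberg FDR correction across all tested genes
  res$p_adj <- p.adjust(res$p_value, method = "BH")
  res[order(res$p_value), ]
}

results <- test_genes(mut_matrix, response)
print(results)
```

With real data, response could be built as setNames(sample_info$Response, sample_info$Patient_ID). To test specific variants rather than whole genes, key the matrix rows on Hugo_Symbol plus Protein_Change instead of Hugo_Symbol alone. With only 25 patients per group, expect that only genes with a large imbalance between groups will survive the multiple-testing correction.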

PS: I am aware of the maftools package, which probably offers an easy solution, but unfortunately my machine (a late-2011 MacBook Pro) is too old to run it: old Mac --> can't update the OS --> can't update R --> can't install the packages maftools needs.
