Aggregation and analysis of MAF files
16 months ago
weisekyle • 0

Hello all,

I have cancer genomic data (tumor/normal whole exome sequencing) from 50 patients who received the same type of treatment, half of whom responded. The data come as 50 .maf files plus a supplemental file that contains, among other fields, the Response (Responder vs Non-Responder) field. My question is how to aggregate all of this so that I can perform a statistical test on the data. I have a way of reading in the 50 .maf files and combining them, but I wonder whether it is an appropriate approach.

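library(magrittr)  # provides the %>% pipe used below
sample_info <- readr::read_tsv(file = "path/to/sample_info/sample-information.tsv")
maf_files <- fs::dir_ls("path/to/mafs/")
# Read every MAF and row-bind the results; Chromosome is read as character so "X"/"Y" parse consistently
patients_data <- maf_files %>%
  purrr::map_dfr(readr::read_tsv, col_types = readr::cols(Chromosome = readr::col_character()))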

My thought was then to dplyr::left_join() patients_data with sample_info, like so:

patients_data_final <- patients_data %>% dplyr::left_join(sample_info, by = c("Tumor_Sample_Barcode", "Matched_Norm_Sample_Barcode"))
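
One way to sanity-check a join like this is sketched below in base R (so it should run on an old R install). The two toy data frames and all values in them are made up for illustration; only the column names come from my real data. The idea is to confirm that the key pair is unique in sample_info (otherwise the join would duplicate mutation rows), that the row count is unchanged, and to list samples that have no rows in any MAF, since those would silently disappear from anything computed on the joined table alone.

```r
# Toy stand-ins for the real tables (values are hypothetical)
patients_data <- data.frame(
  Hugo_Symbol = c("TP53", "KRAS", "TP53"),
  Tumor_Sample_Barcode = c("T1", "T1", "T2"),
  Matched_Norm_Sample_Barcode = c("N1", "N1", "N2"),
  stringsAsFactors = FALSE
)
sample_info <- data.frame(
  Tumor_Sample_Barcode = c("T1", "T2", "T3"),
  Matched_Norm_Sample_Barcode = c("N1", "N2", "N3"),
  Response = c("Responder", "Non-Responder", "Responder"),
  stringsAsFactors = FALSE
)
keys <- c("Tumor_Sample_Barcode", "Matched_Norm_Sample_Barcode")

# Each key pair should appear only once in sample_info
stopifnot(!anyDuplicated(sample_info[keys]))

# Base R equivalent of a left join (note: merge() reorders rows by key)
joined <- merge(patients_data, sample_info, by = keys, all.x = TRUE)

# A left join on unique keys must not change the number of mutation rows
stopifnot(nrow(joined) == nrow(patients_data))

# Mutation rows that found no match in sample_info
unmatched_mutations <- joined[is.na(joined$Response), ]

# Samples listed in sample_info that have no mutation rows at all
key_of <- function(df) paste(df[[keys[1]]], df[[keys[2]]], sep = "|")
samples_without_mutations <-
  sample_info[!key_of(sample_info) %in% key_of(patients_data), ]
```

If samples_without_mutations is not empty, the number of responders and non-responders should probably be counted from sample_info rather than from the joined table.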

For clarity, here are the column names of both data frames:

> colnames(patients_data)
 [1] "Hugo_Symbol"                 "Chromosome"                  "Start_position"             
 [4] "End_position"                "Variant_Classification"      "Variant_Type"               
 [7] "Reference_Allele"            "Tumor_Seq_Allele1"           "Tumor_Seq_Allele2"          
[10] "Tumor_Sample_Barcode"        "Matched_Norm_Sample_Barcode" "Protein_Change"             
[13] "t_alt_count"                 "t_ref_count"                

> colnames(sample_info)
[1] "Patient_ID"                     "Tumor_Sample_Barcode"           "Matched_Norm_Sample_Barcode"   
[4] "Response"                       "Silent_mutations_per_Mb"        "Nonsynonymous_mutations_per_Mb"
[7] "Mutations_per_Mb"

My task is to find out "whether there are any specific mutations that are observed more in responders vs non-responders." So as a supplemental question, if anyone has suggestions on which statistical test to use (or how to go about deciding), I'd appreciate that as well.
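
To make the question concrete, here is a minimal sketch of one candidate approach, not necessarily the right one: collapse mutations to one mutated/not-mutated call per sample and gene, run Fisher's exact test on each 2x2 table (mutated vs not, Responder vs Non-Responder), then adjust the p-values for multiple testing with Benjamini-Hochberg. Testing at the gene level rather than per exact variant is an assumption on my part, as are the toy data, the gene names, and the helper test_gene(). It uses base R only.

```r
# Toy data (hypothetical): one row per mutation, plus one row per sample
mutations <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B"),
  Tumor_Sample_Barcode = c("T1", "T1", "T2", "T3", "T4", "T5"),
  stringsAsFactors = FALSE
)
samples <- data.frame(
  Tumor_Sample_Barcode = paste0("T", 1:6),
  Response = rep(c("Responder", "Non-Responder"), each = 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: Fisher's exact test for one gene.
# Every sample counts once, even if it carries several mutations in the gene,
# and samples without a mutation still contribute to the "not mutated" row.
test_gene <- function(gene) {
  mutated_samples <- unique(
    mutations$Tumor_Sample_Barcode[mutations$Hugo_Symbol == gene]
  )
  is_mutated <- factor(samples$Tumor_Sample_Barcode %in% mutated_samples,
                       levels = c(TRUE, FALSE))
  response <- factor(samples$Response,
                     levels = c("Responder", "Non-Responder"))
  tab <- table(is_mutated, response)
  data.frame(
    Hugo_Symbol = gene,
    n_mut_responder = tab["TRUE", "Responder"],
    n_mut_nonresponder = tab["TRUE", "Non-Responder"],
    p_value = fisher.test(tab)$p.value,
    stringsAsFactors = FALSE
  )
}

results <- do.call(rbind, lapply(unique(mutations$Hugo_Symbol), test_gene))

# Many genes are tested, so control the false discovery rate
results$p_adjusted <- p.adjust(results$p_value, method = "BH")
results <- results[order(results$p_value), ]
```

With only 25 patients per group, I suspect most genes will be mutated in too few samples for any test to have much power, so filtering to genes mutated in at least a handful of samples before testing may be sensible.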

PS: I am aware of the maftools package, which probably has an easy solution to this, but unfortunately my machine is old (Late 2011 MacBook Pro) and unable to run it. (Old Mac --> Can't update OS --> Can't update version of R --> Can't install necessary packages to run maftools)

maf cancer genomics R • 348 views