Question: Genome Music - Pathway File
6.6 years ago
The University of Sheffield
mark.dunning200 wrote:

Hi all,

I am interested in using the Genome Music tool from WashU to find mutated genes in my cancer dataset.

One of the parameters to the 'play' command is --pathway-file, which is documented as "Tab-delimited file of pathway information".

However, I have no idea what this file is supposed to contain, or where I can get one.

Does anyone have any idea?



6.6 years ago
The University of Sheffield
mark.dunning200 wrote:

OK, I found the answer by looking at the help page for the path-scan sub-command:

--pathway-file This is a tab-delimited file prepared from a pathway database (such as KEGG), with the columns: [pathid, pathname, class, geneline, diseases, drugs, description] The latter three columns are optional (but are available on KEGG). The geneline contains the "entrezid:genename" of all genes involved in this pathway, each separated by a "|" symbol. For example, a line in the pathway-file would look like:

  hsa00061    Fatty acid biosynthesis    Lipid Metabolism    31:ACACA|32:ACACB|27349:MCAT|2194:FASN|54995:OXSM|55301:OLAH

Ensure that the gene names and entrez IDs used match those used in the MAF file. Entrez IDs are not mandatory (use a 0 if Entrez ID unknown). But if a gene name in the MAF does not match any gene name in this file, the entrez IDs are used to find a match (unless it's a 0).
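The lookup rule described in the help text (match on gene name first, fall back to the Entrez ID unless it is 0) can be sketched in Python as below. This only illustrates the documented behaviour and is not MuSiC's own code; the function names are hypothetical, and it assumes the real file is tab-delimited (the example line above may have lost its tabs when rendered).

```python
def parse_pathway_line(line):
    """Split one tab-delimited pathway-file line into its fields.

    Only the four mandatory columns are read; the optional diseases,
    drugs and description columns are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least 4 columns: pathid, pathname, class, geneline")
    path_id, path_name, path_class, geneline = fields[:4]
    genes = []
    for entry in geneline.split("|"):
        if entry:
            entrez, name = entry.split(":", 1)  # "entrezid:genename"
            genes.append((int(entrez), name))
    return {"id": path_id, "name": path_name, "class": path_class, "genes": genes}


def gene_in_pathway(pathway, maf_gene_name, maf_entrez_id):
    """Hypothetical check of whether a MAF gene belongs to a pathway.

    Gene names are compared first; Entrez IDs are only a fallback,
    and an ID of 0 (unknown) never counts as a match.
    """
    names = {name for _, name in pathway["genes"]}
    if maf_gene_name in names:
        return True
    if maf_entrez_id == 0:
        return False
    ids = {entrez for entrez, _ in pathway["genes"] if entrez != 0}
    return maf_entrez_id in ids
```

For example, with the fatty acid biosynthesis line parsed, a MAF row with gene name "FAS" but Entrez ID 2194 would still match FASN through the fallback, while the same name with an Entrez ID of 0 would not.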

It doesn't really say how to "prepare" such a file, though.
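One way to prepare the file is to assemble the pathway records yourself (for example from a KEGG download) and write them out in the layout above. A minimal sketch, assuming you already hold each pathway's ID, name, class and its (Entrez ID, gene name) pairs in memory; fetching and parsing the KEGG data is not shown, and both function names are hypothetical.

```python
def format_pathway_line(path_id, path_name, path_class, genes):
    """Build one pathway-file line: pathid, pathname, class, geneline.

    `genes` is a list of (entrez_id, gene_name) pairs; use 0 when the
    Entrez ID is unknown. The optional diseases/drugs/description
    columns are left out.
    """
    for _, name in genes:
        # ":" and "|" are the separators inside the geneline column.
        if ":" in name or "|" in name:
            raise ValueError(f"gene name {name!r} contains a reserved separator")
    geneline = "|".join(f"{entrez}:{name}" for entrez, name in genes)
    fields = [path_id, path_name, path_class, geneline]
    # Tabs or newlines inside a field would corrupt the column layout.
    if any("\t" in f or "\n" in f for f in fields):
        raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields)


def write_pathway_file(handle, pathways):
    """Write (path_id, path_name, path_class, genes) tuples to an open text handle."""
    for pathway in pathways:
        handle.write(format_pathway_line(*pathway) + "\n")
```

Since names are matched first, the gene symbols you write should be the ones used in your MAF (in a standard MAF, the Hugo_Symbol column), with Entrez IDs taken from the same source as its Entrez_Gene_Id column.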


Hi mark.dunning,

Can you please share how you prepared the --pathway-file?

6.6 years ago
London, UK
Giovanni M Dall'Olio wrote:

There is more information in the man page for "genome music":

   The MuSiC suite is a set of tools aimed at discovering the significance of somatic mutations found within a given cohort of cancer samples, and with respect to a variety of external data sources. The standard inputs required are:

   1. mapped reads in BAM format
   2. predicted or validated SNVs or indels in mutation annotation format (MAF)
   3. a list of regions of interest (typically the boundaries of coding exons)
   4. any relevant numeric or categorical clinical data.

   The formats for inputs 3. and 4. are:

   3. Regions of Interest File:
       ·   Do not use headers

       ·   4 columns, which are [chromosome  start-position(1-based)  stop-position(1-based)  gene_name]

   4. Clinical Data Files:
       ·   Headers are required

       ·   At least 1 sample_id column and 1 attribute column, with the format being [sample_id  clinical_data_attribute clinical_data_attribute  ...]

       ·   The sample_id must match the sample_id listed in the MAF under "Tumor_Sample_Barcode" for relating the mutations of this sample.

       ·   The header for each clinical_data_attribute will appear in the output file to denote relationships with the mutation data from the MAF.

   Descriptions for the usage of each tool (each sub-command) can be found separately.

   The play command runs all of the sub-commands serially on a selected input set.
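The requirement that each clinical sample_id match a MAF "Tumor_Sample_Barcode" can be checked before running play. A minimal sketch, assuming both files are tab-delimited, the MAF has a single header row (optionally preceded by "#" comment lines such as a "#version" line), and sample_id is the first column of the clinical file; the helper names are hypothetical.

```python
def maf_sample_ids(maf_lines):
    """Collect Tumor_Sample_Barcode values from tab-delimited MAF lines.

    Lines starting with '#' are skipped; the first remaining line is
    taken as the column header.
    """
    header = None
    col = None
    samples = set()
    for line in maf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if header is None:
            header = fields
            col = header.index("Tumor_Sample_Barcode")  # ValueError if absent
            continue
        samples.add(fields[col])
    return samples


def unmatched_clinical_samples(clinical_lines, maf_samples):
    """Return clinical sample_ids (first column, below the header) absent from the MAF."""
    rows = [l.rstrip("\n").split("\t") for l in clinical_lines if l.strip()]
    if not rows or len(rows[0]) < 2:
        raise ValueError("clinical file needs a header with sample_id and at least one attribute")
    return [row[0] for row in rows[1:] if row[0] not in maf_samples]
```

Any sample_id this returns would have no mutations to relate to its clinical attributes.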

According to the description, the pathway file should be something like:

chr1 9999 12000 gene1_exon1
chr1 12999 15000 gene1_exon2
chr2 999 1200 gene1_exon1

I don't have files to test this, but it should work. You may also need to remove "chr" from the chromosome names.


"chr1 9999 12000 gene1_exon1" is the format of the MuSiC roi-file, not for the pathway-file.
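For reference, the roi-file layout from the man page (no header; chromosome, 1-based start, 1-based stop, gene_name) can be written as below. A minimal sketch, assuming tab-separated columns and, for the bed_to_roi helper, that your regions start out as BED intervals (0-based, half-open); both function names are hypothetical. The strip_chr option covers the "chr" prefix issue mentioned earlier in the thread.

```python
def format_roi_line(chrom, start, stop, gene_name, strip_chr=False):
    """One roi-file line: chromosome, start, stop (both 1-based), gene_name.

    No header is produced. strip_chr removes a leading "chr" so names
    match a reference that uses "1" rather than "chr1".
    """
    if start < 1 or stop < start:
        raise ValueError("expected 1-based coordinates with start <= stop")
    if strip_chr and chrom.startswith("chr"):
        chrom = chrom[3:]
    return f"{chrom}\t{start}\t{stop}\t{gene_name}"


def bed_to_roi(chrom, bed_start, bed_end, gene_name, strip_chr=False):
    """Convert a 0-based, half-open BED interval to a 1-based, inclusive roi line."""
    return format_roi_line(chrom, bed_start + 1, bed_end, gene_name, strip_chr)
```

Only the start coordinate shifts when converting from BED: a BED interval 999..1200 covers the same bases as 1-based 1000..1200.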

