Genome Music - Pathway File
10.2 years ago
mark.dunning ▴ 230

Hi all,

I am interested in using the Genome Music tool from WashU to find mutated genes in my cancer dataset.

One of the parameters to the 'play' command is pathway-file, which is documented as "Tab-delimited file of pathway information".

However, I have no idea what this file is supposed to contain, or where I can get one.

Does anyone have any idea?

Regards,

Mark

10.2 years ago
mark.dunning ▴ 230

OK, I found the answer by looking at the help page for the path-scan command:

--pathway-file This is a tab-delimited file prepared from a pathway database (such as KEGG), with the columns: [pathid, pathname, class, geneline, diseases, drugs, description] The latter three columns are optional (but are available on KEGG). The geneline contains the "entrezid:genename" of all genes involved in this pathway, each separated by a "|" symbol. For example, a line in the pathway-file would look like:

  hsa00061    Fatty acid biosynthesis    Lipid Metabolism    31:ACACA|32:ACACB|27349:MCAT|2194:FASN|54995:OXSM|55301:OLAH

Ensure that the gene names and entrez IDs used match those used in the MAF file. Entrez IDs are not mandatory (use a 0 if Entrez ID unknown). But if a gene name in the MAF does not match any gene name in this file, the entrez IDs are used to find a match (unless it's a 0).
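To make the column layout concrete, here is a minimal Python sketch that splits one such line into the documented fields. The function name `parse_pathway_line` and the returned dictionary keys are hypothetical, chosen for illustration; they are not part of MuSiC. It assumes every geneline entry has the form "entrezid:genename" with a numeric Entrez ID (0 when unknown).

```python
def parse_pathway_line(line):
    """Split one pathway-file line into the columns described in the help text."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least 4 tab-separated columns")
    pathid, pathname, pathclass, geneline = fields[:4]
    genes = []
    for entry in geneline.split("|"):
        # Each entry is "entrezid:genename"; an Entrez ID of 0 means unknown.
        entrez_id, _, gene_name = entry.partition(":")
        genes.append((int(entrez_id), gene_name))
    return {
        "pathid": pathid,
        "pathname": pathname,
        "class": pathclass,
        "genes": genes,
        # diseases, drugs, description (optional columns), if present
        "optional": fields[4:],
    }


example = (
    "hsa00061\tFatty acid biosynthesis\tLipid Metabolism\t"
    "31:ACACA|32:ACACB|27349:MCAT|2194:FASN|54995:OXSM|55301:OLAH"
)
record = parse_pathway_line(example)
print(record["pathid"], len(record["genes"]))
```

Parsing a file this way before running MuSiC is one way to catch malformed genelines early, and the gene names it yields can be compared against those in the MAF.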

It doesn't really say how to "prepare" such a file, though.
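One possible approach, sketched here under assumptions, is to collect the pathway annotations into a simple in-memory structure and write them out in the documented column order. The names `format_pathway_line` and `write_pathway_file` and the shape of the `pathways` list are hypothetical; how the annotations are obtained from a database such as KEGG is left out, since the help text does not specify it.

```python
def format_pathway_line(pathid, pathname, pathclass, genes):
    """Build one line: pathid, pathname, class, geneline (tab-separated)."""
    # genes is a list of (entrez_id, gene_name) pairs; use 0 for an unknown ID.
    geneline = "|".join(f"{entrez_id}:{name}" for entrez_id, name in genes)
    return "\t".join([pathid, pathname, pathclass, geneline])


def write_pathway_file(path, pathways):
    """Write one line per pathway; the optional trailing columns are omitted."""
    with open(path, "w") as handle:
        for pathid, pathname, pathclass, genes in pathways:
            handle.write(format_pathway_line(pathid, pathname, pathclass, genes) + "\n")


# Illustrative input only, reusing part of the example line from the help text.
pathways = [
    ("hsa00061", "Fatty acid biosynthesis", "Lipid Metabolism",
     [(31, "ACACA"), (32, "ACACB"), (2194, "FASN")]),
]
line = format_pathway_line(*pathways[0])
print(line)
```

The gene names written here should be the same symbols used in the MAF, as the help text requires.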


Hi mark.dunning,

Can you please share how you prepared the --pathway-file?

10.2 years ago

There is more information in the man page for genome music:

   The MuSiC suite is a set of tools aimed at discovering the significance of somatic mutations found within a given cohort of cancer samples, and with respect to a variety of external data sources. The standard inputs required are:

   1. mapped reads in BAM format
   2. predicted or validated SNVs or indels in mutation annotation format (MAF)
   3. a list of regions of interest (typically the boundaries of coding exons)
   4. any relevant numeric or categorical clinical data.

   The formats for inputs 3. and 4. are:

   3. Regions of Interest File:
       ·   Do not use headers

       ·   4 columns, which are [chromosome  start-position(1-based)  stop-position(1-based)  gene_name]

   4. Clinical Data Files:
       ·   Headers are required

       ·   At least 1 sample_id column and 1 attribute column, with the format being [sample_id  clinical_data_attribute clinical_data_attribute  ...]

       ·   The sample_id must match the sample_id listed in the MAF under "Tumor_Sample_Barcode" for relating the mutations of this sample.

       ·   The header for each clinical_data_attribute will appear in the output file to denote relationships with the mutation data from the MAF.

   Descriptions for the usage of each tool (each sub-command) can be found separately.

   The play command runs all of the sub-commands serially on a selected input set.
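The requirement that each clinical sample_id match a "Tumor_Sample_Barcode" in the MAF can be checked before running anything. This is a minimal sketch, assuming both files are tab-delimited, that the sample ID is the first column of the clinical file, and that the MAF may start with "#" comment lines. The function names and the inline sample data are hypothetical.

```python
import csv
import io


def maf_sample_ids(handle):
    """Collect Tumor_Sample_Barcode values, skipping '#' comment lines."""
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {row["Tumor_Sample_Barcode"] for row in reader}


def clinical_sample_ids(handle):
    """Collect sample IDs from the first column, after the required header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    return {row[0] for row in reader if row}


# Tiny made-up inputs standing in for real files.
maf_text = "#version 2.4\nHugo_Symbol\tTumor_Sample_Barcode\nFASN\tS1\nMCAT\tS2\n"
clinical_text = "sample_id\tage\nS1\t50\nS3\t61\n"

unmatched = clinical_sample_ids(io.StringIO(clinical_text)) - maf_sample_ids(io.StringIO(maf_text))
print(sorted(unmatched))
```

With real data, the two `io.StringIO` objects would be replaced by open file handles.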

According to the description, the pathway file should be something like:

chr1 9999 12000 gene1_exon1
chr1 12999 15000 gene1_exon2
chr2 999 1200 gene1_exon1

I don't have files to test this with, but it should work fine. You may also have to remove "chr" from the chromosome names.
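Removing the "chr" prefix from four-column lines like the ones above can be sketched as follows. The function name `strip_chr_prefix` is hypothetical, and writing the output tab-separated is an assumption, since the man page excerpt lists the columns but not the delimiter.

```python
def strip_chr_prefix(line):
    """Drop a leading 'chr' from the chromosome column of a 4-column line."""
    # Assumes whitespace-separated columns and gene names without spaces.
    chrom, start, stop, gene = line.split()
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return "\t".join([chrom, start, stop, gene])


lines = [
    "chr1 9999 12000 gene1_exon1",
    "chr1 12999 15000 gene1_exon2",
    "chr2 999 1200 gene1_exon1",
]
converted = [strip_chr_prefix(line) for line in lines]
print("\n".join(converted))
```

Whether the prefix should be removed depends on the chromosome naming used in the BAM and MAF files, which should be checked first.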


"chr1 9999 12000 gene1_exon1" is the format of the MuSiC roi-file, not of the pathway-file.
