TCGA --- MAF files: no values showing for case_id column
4.5 years ago

Hi,

I have downloaded some MAF files from the GDC using the GDC client, and I need the case_id values so that I can merge each MAF file with another file that also contains case_id values. The GDC documentation says that case_id is column 116, but when I try to extract or view that column, I can see the header and nothing else. Any advice?
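One way to check whether a column really is empty, rather than relying on its position, is to select it by header name. The following is a minimal sketch, assuming the MAF is tab-separated and may begin with comment lines that start with `#`; the function name `read_maf_column` and the example rows are made up for illustration and are not taken from a real file.

```python
import csv
import io

def read_maf_column(handle, column):
    """Yield the values of one named column from a MAF-style TSV stream."""
    # Skip comment lines so that the first remaining line is the header.
    data_lines = (line for line in handle if not line.startswith("#"))
    # QUOTE_NONE: treat quote characters in annotation fields as plain text.
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in header")
    for row in reader:
        yield row[column]

# Made-up stand-in for a real MAF file (assumption: header names as shown).
example_text = (
    "#version gdc-1.0.0\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tcase_id\n"
    "TP53\tTCGA-4Z-AA7N-01A-11D-A391-08\t\n"
)

print(list(read_maf_column(io.StringIO(example_text), "Tumor_Sample_Barcode")))
print(list(read_maf_column(io.StringIO(example_text), "case_id")))
```

For a file on disk, the same function can be given an open file handle instead of the `io.StringIO` object.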

TCGA cancer WXS GDC GDC-client • 1.6k views
4.5 years ago

Instead, please use the following columns for the purpose of identifying samples:

  • Tumor_Sample_Barcode (#16)
  • Matched_Norm_Sample_Barcode (#17)
  • Tumor_Sample_UUID (#33)
  • Matched_Norm_Sample_UUID (#34)

Kevin


Hi Kevin,

I am trying to match the MAF files to the associated clinical data files so that I can add the gender information to the MAF records. Unfortunately, the clinical data files do not have any of the columns you listed. They have the case_id and submitter_id only. Do you know where I can find either files with the MAF information plus the gender, or a file that contains the gender along with the case or submitter IDs?

Thanks, Sam


Can you please confirm the format of the 'submitter_id'? - these are short TCGA barcodes, right (like this: TCGA-E9-A1NA-11A)? In this case, you should be able to match these to the MAF files.


Here is an example of a submitter_id: "TCGA-4Z-AA7N". I can't find a matching column in the MAF file.

Thanks, Sam


Are the Tumor_Sample_Barcode and Matched_Norm_Sample_Barcode values not similar to this format?


It seems as if they would only partially match. Since I am trying to use the Linux 'join' command to merge the files on matching columns, I am not sure whether a partial match would be sufficient.

Thanks, Sam


You can use both of these 'barcodes' to match and to achieve what you want. The longer barcodes just contain some extra information that is not exactly required in this situation. Take a look: Meaning letters in TCGA sample barcode field

You do not have to use join to achieve what you want.
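As an illustration of why the two barcode formats can be matched: the first three dash-separated fields of a longer TCGA barcode (project, tissue source site, participant) have the same form as the short submitter_id quoted earlier in the thread. A minimal sketch, assuming every barcode starts with `TCGA`; the helper name `case_barcode` is hypothetical.

```python
def case_barcode(sample_barcode):
    """Return the first three dash-separated fields of a TCGA barcode."""
    parts = sample_barcode.strip().split("-")
    # Assumption: valid barcodes look like TCGA-<site>-<participant>[-...].
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {sample_barcode!r}")
    return "-".join(parts[:3])

print(case_barcode("TCGA-E9-A1NA-11A"))  # TCGA-E9-A1NA
print(case_barcode("TCGA-4Z-AA7N"))      # TCGA-4Z-AA7N (already short)
```

Truncating the MAF barcode this way gives an exact key, so an exact-match tool would also work on the truncated value.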


Kevin,

Thanks for your help so far. Without 'join', how could I merge the two files so that all the information is matched in one file for analysis? The files have tens of thousands of lines, so I can't match them manually. I am new to this and would really appreciate some guidance on how to accomplish that without using join.

When I sort by your suggested columns, the Tumor_Sample_Barcode column ends up with a lot of lines marked as "somatic", so those can't be matched either.


Yes, do not worry. Please tell me the ultimate aim of your work so that I can understand the desired end-format?

The clinical data that you have has, I believe, 1 row per sample, whereas the MAF file has hundreds or thousands of rows per sample (one row per somatic mutation). Do you want a final table that has the same number of rows as the MAF file but with extra columns for the clinical data?

What are you comfortable using? - Python?; R?; JAVA?; shell scripting?


Hi Kevin,

Thanks for all your help so far. Roughly, I am trying to figure out what proportion of the mutations are associated with male versus female patients, so ideally I would like a final table that has the same number of rows as the MAF file but with extra columns for the clinical data (mainly the gender). I am comfortable with Python and shell scripting.
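For reference, the table described here can be sketched in Python without 'join'. This is only an illustration under stated assumptions: both files are tab-separated, the clinical file has columns named `submitter_id` and `gender` (the `gender` column name is an assumption), and the MAF has a `Tumor_Sample_Barcode` column whose first three dash-separated fields correspond to the submitter_id. The function names and example rows are made up.

```python
import csv
import io
from collections import Counter

def load_gender_by_case(clinical_handle):
    """Map submitter_id -> gender from a clinical TSV (one row per case)."""
    reader = csv.DictReader(clinical_handle, delimiter="\t")
    return {row["submitter_id"]: row["gender"] for row in reader}

def annotate_maf(maf_handle, gender_by_case):
    """Yield each MAF row as a dict with an extra 'gender' key."""
    # Skip comment lines so that the first remaining line is the header.
    data_lines = (line for line in maf_handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Shorten the sample barcode to the case-level barcode.
        case = "-".join(row["Tumor_Sample_Barcode"].split("-")[:3])
        row["gender"] = gender_by_case.get(case, "unknown")
        yield row

# Made-up stand-ins for the real clinical and MAF files.
clinical = io.StringIO(
    "submitter_id\tgender\n"
    "TCGA-4Z-AA7N\tmale\n"
    "TCGA-E9-A1NA\tfemale\n"
)
maf = io.StringIO(
    "Hugo_Symbol\tTumor_Sample_Barcode\n"
    "TP53\tTCGA-4Z-AA7N-01A-11D-A391-08\n"
    "KRAS\tTCGA-4Z-AA7N-01A-11D-A391-08\n"
    "PIK3CA\tTCGA-E9-A1NA-01A-11D-A14G-09\n"
)

rows = list(annotate_maf(maf, load_gender_by_case(clinical)))

# Proportion of mutation rows per gender.
counts = Counter(row["gender"] for row in rows)
total = sum(counts.values())
for gender, n in sorted(counts.items()):
    print(f"{gender}\t{n}\t{n / total:.2f}")
```

The output keeps one row per mutation, as in the MAF, and rows whose case is missing from the clinical file are labeled "unknown" rather than dropped.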

I am also considering re-downloading the files from the GDC site, filtered by gender, using the GDC client on the command line. Do you think this could be an efficient way of doing this?

Thanks, Sam


Hey Sam, re-downloading it based on sex/gender may indeed be the easiest option in this case.


Thank you very much for taking the time to help me.

Sam
