Question: Find out the mutation type from TCGA data
9 days ago by
Pin.Bioinf170 wrote:


I downloaded mutation data from TCGA from here: select SKCM >SNP6_copynum

and the data looks as follows:

    Sample                        Chromosome  Start     End       Num_Probes  Segment_Mean
    TCGA-3N-A9WB-10A-01D-A38I-01  1           61735     534551    31          -0.3442
    TCGA-3N-A9WB-10A-01D-A38I-01  1           564621    16345255  8476        0.0159
    TCGA-3N-A9WB-10A-01D-A38I-01  1           16363940  16389566  8           0.8209

Is there a way I can find out which type of mutations these are? I tried looking at the SNP6 manifest and could not get that information either.

Thank you!!

Isn't this copy number data?

written 9 days ago by Martombo (2.3k)

Is it? I don't know. I thought it was both SNP and CNV data, because the folder containing the files is named SNP6_copynum. File name: SKCM.snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_hg19__seg.seg.txt

9 days ago by
Kevin Blighe (30k), Republic of Ireland, wrote:

It is processed (by GISTIC 2.0, I believe) copy number data derived from the Affymetrix SNP 6.0 microarray, which has probes that target both SNPs and copy number variants. It is a commonly used platform for determining genome-wide CNV profiles (I used it during my PhD in breast cancer).
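To make the copy number reading of the table concrete, here is a minimal Python sketch that parses rows like the ones in the question and labels each segment. It assumes that Segment_Mean is a log2 copy ratio relative to a diploid baseline, which is how these segmented files are usually interpreted. The ±0.3 cutoffs and the function names are my own assumptions for illustration, not values defined by TCGA or Firebrowse.

```python
import csv
import io

# Assumed cutoffs on the log2 ratio; illustrative only, tune them for your data.
GAIN_CUTOFF = 0.3
LOSS_CUTOFF = -0.3


def classify_segment(segment_mean, gain=GAIN_CUTOFF, loss=LOSS_CUTOFF):
    """Label one segment as gain, loss or neutral from its log2 ratio."""
    if segment_mean >= gain:
        return "gain"
    if segment_mean <= loss:
        return "loss"
    return "neutral"


def read_segments(handle):
    """Yield one dict per row of a tab-delimited .seg file with a header line."""
    for row in csv.DictReader(handle, delimiter="\t"):
        mean = float(row["Segment_Mean"])
        yield {
            "sample": row["Sample"],
            "chrom": row["Chromosome"],
            "start": int(row["Start"]),
            "end": int(row["End"]),
            "num_probes": int(row["Num_Probes"]),
            "segment_mean": mean,
            "call": classify_segment(mean),
            # Rough copy estimate, assuming a diploid baseline and a pure sample.
            "approx_copies": round(2 * 2 ** mean, 2),
        }


# The three rows shown in the question, re-typed as tab-delimited text.
example = (
    "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
    "TCGA-3N-A9WB-10A-01D-A38I-01\t1\t61735\t534551\t31\t-0.3442\n"
    "TCGA-3N-A9WB-10A-01D-A38I-01\t1\t564621\t16345255\t8476\t0.0159\n"
    "TCGA-3N-A9WB-10A-01D-A38I-01\t1\t16363940\t16389566\t8\t0.8209\n"
)

segments = list(read_segments(io.StringIO(example)))
for seg in segments:
    print(seg["chrom"], seg["start"], seg["end"], seg["segment_mean"], seg["call"])
```

Short segments supported by only a handful of probes (such as the 8-probe one here) are noisier, so you may also want to filter on Num_Probes before trusting a call.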

Note that you may want to instead download the file with 'minus_germline_cnv' in its name, i.e., download it from Firebrowse.

If you want to process it, take a look here:

  1. Part I
  2. Part II
  3. Part III


Kevin, I have a question about this: I want to get SNP data for the melanoma project from TCGA, is there any way I can find it? Maybe the protected access files in TCGA have it? I can't seem to find that information.

You mean somatic mutations? They will be stored in the open access MAF files. If you want VCF files for all tumour and normals, then they are indeed protected and you will require authorisation.

Okay, thank you. If I want to request access to the VCF files, I know I have to specify exactly which files I need. Which ones should I ask for? There are so many VCF files here.

(What I want to do is relate these mutations to the clinical data from TCGA in melanoma, which I already have: I want to look into the mutations in those samples.)
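For linking any per-sample file to clinical data, the usual key is the patient part of the TCGA barcode (the first three dash-separated fields). A small Python sketch of that idea follows. The function name and the `clinical` dictionary with its placeholder value are hypothetical; the sample type ranges (01-09 tumour, 10-19 normal, 20-29 control) follow the published TCGA barcode convention.

```python
def parse_tcga_barcode(barcode):
    """Split a sample-level TCGA barcode into patient ID and sample kind."""
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    code = parts[3][:2]          # two-digit sample type, e.g. "01" or "10"
    number = int(code)
    if 1 <= number <= 9:
        kind = "tumor"
    elif 10 <= number <= 19:
        kind = "normal"
    elif 20 <= number <= 29:
        kind = "control"
    else:
        kind = "unknown"
    return {
        "patient_id": "-".join(parts[:3]),
        "sample_type_code": code,
        "sample_kind": kind,
    }


# Hypothetical clinical lookup keyed by patient ID; the value is a placeholder,
# not real clinical data.
clinical = {"TCGA-3N-A9WB": {"note": "placeholder clinical record"}}

info = parse_tcga_barcode("TCGA-3N-A9WB-10A-01D-A38I-01")
record = clinical.get(info["patient_id"])
print(info, record)
```

Note that the barcode in the table above carries sample type 10, so under this convention those particular rows come from a normal sample rather than from the tumour.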

I see - sounds interesting. I know of at least 2 groups who are actively working in that area, so you'd better hurry up! As I understand it, a key part of the immune response revolves around knockout mutations in PD1 and PDL1. Can you not just use the MAF (Mutation Annotation Format) data?

But I have read that processing MAF files is very complicated... I have never done it! Or can I open them in IGV? How can I match the mutation to the gene and the specific patient (sample) from a MAF file?

EDIT: I think I found an R package that can read the files (maftools). Thanks!
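If you would rather not depend on a package, a MAF file is just tab-delimited text, so matching mutation to gene and sample is a matter of reading three columns. Below is a minimal Python sketch using only the standard library. The column names (Hugo_Symbol, Variant_Classification, Tumor_Sample_Barcode) come from the MAF specification; the function names are my own, and the example rows and barcodes are invented purely for illustration.

```python
import csv
import io
from collections import defaultdict


def read_maf(handle):
    """Read a MAF file into a list of dicts, skipping '#' comment lines."""
    lines = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(lines, delimiter="\t"))


def mutations_by_sample(rows):
    """Map each tumour sample barcode to its (gene, variant class) pairs."""
    by_sample = defaultdict(list)
    for row in rows:
        by_sample[row["Tumor_Sample_Barcode"]].append(
            (row["Hugo_Symbol"], row["Variant_Classification"])
        )
    return dict(by_sample)


def samples_with_gene(rows, gene):
    """Return the sorted set of sample barcodes with any mutation in `gene`."""
    return sorted(
        {row["Tumor_Sample_Barcode"] for row in rows if row["Hugo_Symbol"] == gene}
    )


# Invented rows for illustration only; these are not real TCGA records.
EXAMPLE_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "BRAF\tMissense_Mutation\tTCGA-XX-0001-06A-11D-0000-08\n"
    "NRAS\tMissense_Mutation\tTCGA-XX-0001-06A-11D-0000-08\n"
    "BRAF\tSilent\tTCGA-XX-0002-06A-11D-0000-08\n"
)

rows = read_maf(io.StringIO(EXAMPLE_MAF))
per_sample = mutations_by_sample(rows)
braf_samples = samples_with_gene(rows, "BRAF")
print(per_sample)
print(braf_samples)
```

For a real file you would pass an open file handle instead of the `io.StringIO` wrapper. The first 12 characters of each Tumor_Sample_Barcode give the patient ID to match against the clinical table.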

MAF files are not easy, I admit, but they are not impossible either. I believe that Cyriac Kandoth developed a MAF2VCF script, but I am not sure of its utility.

Are you prepared to wait possibly months to get dbGaP approval for accessing the GDC controlled access data? It involves a lot of admin work. I recently passed through it.

