TCGA MAF (Mutation Annotation Format) to EIGENSOFT's smartpca via PLINK (or: how to get SNP centiMorgan positions from TCGA MAFs?)
2.4 years ago

Hi there,

I want to do an eQTL analysis on TCGA cancer data. Currently, I am stuck at creating the covariate files for the genotype data: I have TCGA somatic MAF files (downloaded from the GDC Data Portal, see documentation here: ) and want to run EIGENSOFT's smartpca on them. However, I first have to convert them to the correct input format (see ). While it seems that I can create most of these files myself from the MAF, I get stuck at the SNP file, where I have to specify each SNP's genetic position in centiMorgans.

I therefore looked for tools that could create these files automatically and came across PLINK. According to related posts on Biostars (A: TCGA SNP data and TCGA SNP to plink), I need to create a PED and a MAP file and then load them into PLINK (--file reads a PED/MAP pair; the --lfile option mentioned there reads the long LGEN/MAP/FAM format). However, the MAP file also requires a centiMorgan position. Those posts do note that the centiMorgan position can be left out, as it matters only for particular analyses.
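A minimal sketch of the PED/MAP route, assuming the MAF has already been parsed into one dict per row keyed by the standard MAF column names (Tumor_Sample_Barcode, Chromosome, Start_Position, Reference_Allele, Tumor_Seq_Allele2, Variant_Type). The genotype model is an assumption, not something a somatic MAF states: a sample listed for a variant is coded heterozygous, every other sample homozygous reference. The function names `maf_to_plink`/`write_plink` and the variant IDs are hypothetical; the genetic distance column is simply a dummy 0.

```python
def maf_to_plink(rows):
    """Build PED and MAP lines from MAF rows (dicts keyed by MAF column names)."""
    samples = sorted({row["Tumor_Sample_Barcode"] for row in rows})
    carriers = {}  # (chrom, pos, ref, alt) -> samples carrying that variant
    for row in rows:
        if row["Variant_Type"] != "SNP":
            continue  # keep single-nucleotide variants only
        chrom = row["Chromosome"]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # strip an optional "chr" prefix
        if not chrom.isdigit():
            continue  # keep autosomes only (simplifying assumption)
        key = (chrom, int(row["Start_Position"]),
               row["Reference_Allele"], row["Tumor_Seq_Allele2"])
        carriers.setdefault(key, set()).add(row["Tumor_Sample_Barcode"])

    variants = list(carriers)  # first-seen order; PED columns follow MAP rows
    # MAP: chromosome, variant ID, genetic distance (dummy 0), bp position.
    map_lines = [f"{c}\t{c}_{p}_{r}_{a}\t0\t{p}" for c, p, r, a in variants]
    ped_lines = []
    for s in samples:
        # Assumed model: carriers heterozygous, everyone else homozygous reference.
        calls = [f"{r} {a}" if s in carriers[(c, p, r, a)] else f"{r} {r}"
                 for c, p, r, a in variants]
        # FID, IID, father, mother, sex (0 = unknown), phenotype (-9 = missing).
        ped_lines.append(" ".join([s, s, "0", "0", "0", "-9"] + calls))
    return ped_lines, map_lines


def write_plink(rows, prefix):
    """Write <prefix>.ped and <prefix>.map."""
    ped_lines, map_lines = maf_to_plink(rows)
    with open(prefix + ".ped", "w") as out:
        out.write("\n".join(ped_lines) + "\n")
    with open(prefix + ".map", "w") as out:
        out.write("\n".join(map_lines) + "\n")
```

The resulting pair can then be loaded with something like `plink --file prefix --make-bed --out prefix`. PLINK does not infer genetic positions on its own; the 0 stays 0 unless you supply real values.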

Right now I am somewhat confused about how to proceed:

- Do I need to specify centiMorgans at all when using smartpca?
- Is it reasonable to use PLINK to create input files for smartpca when I have to create the MAP and PED files myself (and they already correspond to the snp and indiv files required by EIGENSOFT/smartpca)?
- Will PLINK calculate the centiMorgan position of my SNPs even if I do not specify it in the MAP file?

I am grateful for any advice on how to proceed.



2.3 years ago

I finally found the solution myself, so I am posting it here in case others run into the same problem:

If you only want to run smartpca, there is no need to specify the genetic position in centiMorgans: just enter 0 in that column for every SNP. PLINK's MAP format likewise accepts 0 as a dummy value for the genetic distance.
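For the direct route, here is a hedged sketch that writes EIGENSTRAT-format .geno/.snp/.ind files straight from a MAF, with the genetic position column fixed at 0.0. It assumes standard MAF column names and a genotype model that a somatic MAF does not actually provide: samples listed for a variant carry one reference copy (heterozygous), all others two. In the .geno file each character counts copies of the reference allele. The function names, variant IDs, the autosome-only filter and the "Case" label in the .ind file are hypothetical choices.

```python
import csv


def read_maf(path):
    """Read a tab-separated MAF file, skipping '#' header/version lines."""
    with open(path, newline="") as handle:
        lines = (line for line in handle if not line.startswith("#"))
        return list(csv.DictReader(lines, delimiter="\t"))


def maf_to_eigenstrat(rows):
    """Return (geno_lines, snp_lines, ind_lines) for EIGENSTRAT-format files."""
    samples = sorted({row["Tumor_Sample_Barcode"] for row in rows})
    carriers = {}  # (chrom, pos, ref, alt) -> samples carrying that variant
    for row in rows:
        if row["Variant_Type"] != "SNP":
            continue  # keep single-nucleotide variants only
        chrom = row["Chromosome"]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # strip an optional "chr" prefix
        if not chrom.isdigit():
            continue  # keep autosomes only (simplifying assumption)
        key = (int(chrom), int(row["Start_Position"]),
               row["Reference_Allele"], row["Tumor_Seq_Allele2"])
        carriers.setdefault(key, set()).add(row["Tumor_Sample_Barcode"])

    geno_lines, snp_lines = [], []
    for key in sorted(carriers):
        chrom, pos, ref, alt = key
        # .snp: ID, chromosome, genetic position (unknown, so 0.0), bp, ref, alt.
        snp_lines.append(f"{chrom}_{pos}_{ref}_{alt}\t{chrom}\t0.0\t{pos}\t{ref}\t{alt}")
        # .geno: one character per sample, copies of the reference allele.
        geno_lines.append("".join("1" if s in carriers[key] else "2" for s in samples))
    ind_lines = [f"{s}\tU\tCase" for s in samples]  # ID, sex unknown, label
    return geno_lines, snp_lines, ind_lines


def write_eigenstrat(rows, prefix):
    """Write <prefix>.geno, <prefix>.snp and <prefix>.ind."""
    geno, snp, ind = maf_to_eigenstrat(rows)
    for suffix, lines in ((".geno", geno), (".snp", snp), (".ind", ind)):
        with open(prefix + suffix, "w") as out:
            out.write("\n".join(lines) + "\n")
```

Usage would be along the lines of `write_eigenstrat(read_maf("input.maf"), "tcga")`, then pointing smartpca's `genotypename`, `snpname` and `indivname` parameters at the three files.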


