Question: TCGA SNP data
2.6 years ago, archie wrote:

Dear all

In one of my projects, I have to take the SNPs from TCGA (PAAD), convert them into PLINK format, and use them for further analysis. I ran into several issues:

  1. I was not able to capture all the sites while converting .maf to .vcf, although I used exactly the same genome version as the reference build stated in the .maf.
  2. I went ahead with the remaining sites, converted the .maf to .vcf files, and processed them in PLINK. The problem then was a lot of missing data at the individual level, so I ended up with no usable outcome.

Queries:

  1. Which data type (SNPs or mutations) should I start from to work on the tumour and normal sample profiles in PAAD? Is it .maf, copy number data, or GWAS? I found files for many data types. Can anyone advise?

I would appreciate any suggestions.

Thank you


tcga paad snps maf2vcf • 997 views
2.6 years ago, Kevin Blighe (Republic of Ireland) wrote:


You do not have to convert MAF to VCF for the purposes of input to PLINK. The MAF format was an 'unfortunate' development.

Read the PLINK documentation for creating a custom PED and MAP file, and then you will be able to create your PLINK dataset. You already have all of the information that you need in the MAF file.
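As a rough illustration of the PED/MAP route, here is a minimal Python sketch. It assumes the standard GDC MAF column names (Chromosome, Start_Position, Variant_Type, Reference_Allele, Tumor_Seq_Allele1, Tumor_Seq_Allele2, Tumor_Sample_Barcode). The function name `maf_to_ped_map` is made up for this example, and the sketch makes one big assumption that you should check for your own analysis: a sample with no MAF row at a site is written as homozygous reference, because a MAF only lists the variants that were called.

```python
import csv


def maf_to_ped_map(maf_lines):
    """Build PLINK PED and MAP rows from an iterable of MAF text lines.

    Hypothetical helper, not part of PLINK or the GDC tools.
    """
    # MAF files may start with '#' comment lines before the header row.
    reader = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    ref_allele = {}  # (chrom, pos) -> reference allele
    calls = {}       # sample -> {(chrom, pos): (allele1, allele2)}
    for row in reader:
        if row["Variant_Type"] != "SNP":
            continue  # keep single-base substitutions only
        site = (row["Chromosome"], int(row["Start_Position"]))
        ref_allele[site] = row["Reference_Allele"]
        sample = row["Tumor_Sample_Barcode"]
        calls.setdefault(sample, {})[site] = (
            row["Tumor_Seq_Allele1"],
            row["Tumor_Seq_Allele2"],
        )

    sites = sorted(ref_allele)
    # MAP columns: chromosome, variant ID, genetic distance (0 = unknown), bp position
    map_rows = [[chrom, f"{chrom}:{pos}", "0", str(pos)] for chrom, pos in sites]

    ped_rows = []
    for sample in sorted(calls):
        # PED columns: FID, IID, father, mother, sex (0 = unknown), phenotype (-9 = missing)
        row = [sample, sample, "0", "0", "0", "-9"]
        for site in sites:
            # ASSUMPTION: no MAF row for this sample means homozygous reference.
            a1, a2 = calls[sample].get(site, (ref_allele[site],) * 2)
            row += [a1, a2]
        ped_rows.append(row)
    return ped_rows, map_rows


def write_rows(path, rows):
    """Write rows as space-delimited text, one row per line."""
    with open(path, "w") as handle:
        for row in rows:
            handle.write(" ".join(row) + "\n")
```

Writing `ped_rows` and `map_rows` to files that share a prefix (for example `paad.ped` and `paad.map`, names chosen here only for illustration) gives a text fileset that PLINK can read with `--file paad`.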



Dear Kevin,

In one of my projects, I converted VCF to PLINK and performed downstream analysis, so I thought of following the same strategy for TCGA as well. As for the custom PED and MAP, yes, I will check. Thanks for your suggestion.


written 2.6 years ago by archie

Here is information on PED and MAP

By the way, if you still want to use MAF -> VCF -> PLINK, then you should create your own custom FAM file, and then specify this in every PLINK command with the --fam flag.

When converting from VCF -> PLINK, there is no way for PLINK to know what your phenotypes are.
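For the custom FAM file, a minimal Python sketch could look like the following. It assumes the phenotype should be tumour versus normal and reads it from the sample-type code in the fourth field of the TCGA barcode (01 to 09 tumour, 10 to 19 normal). Coding tumour as 2 and normal as 1 is an assumption about the analysis, and the function names and the barcodes in the usage note are made up for illustration.

```python
def tcga_phenotype(barcode):
    """Return a PLINK phenotype code from a TCGA barcode.

    ASSUMPTION: tumour samples are cases ("2"), normal samples are
    controls ("1"), anything else is missing ("-9").
    """
    parts = barcode.split("-")
    if len(parts) < 4:
        return "-9"
    try:
        sample_type = int(parts[3][:2])  # e.g. "01A" -> 1
    except ValueError:
        return "-9"
    if 1 <= sample_type <= 9:
        return "2"
    if 10 <= sample_type <= 19:
        return "1"
    return "-9"


def fam_rows(barcodes):
    """Build FAM rows: FID, IID, father, mother, sex (0 = unknown), phenotype."""
    return [[b, b, "0", "0", "0", tcga_phenotype(b)] for b in barcodes]
```

The rows have to be in the same sample order as the genotype data they describe, so `fam_rows` should be given the barcodes in the order they appear in the VCF header. For example, `fam_rows(["TCGA-AB-1234-01A", "TCGA-AB-1234-10A"])` marks the first sample as a case and the second as a control.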

written 2.6 years ago by Kevin Blighe


My problem is that, during the conversion from MAF to VCF, I am losing many SNP sites and their data, which is not normal. I tried to fix it but did not succeed. I will now follow your suggestion and create a custom PED and MAP. I have already created a .fam for the dataset.

Thanks again


written 2.6 years ago by archie

Could you send information on some of the variants that are being filtered out? Also, can you link me to the specific MAF file on the GDC that you are using?
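One possible way to collect that information is to compare the SNP sites in the MAF against the sites that made it into the VCF. The Python sketch below assumes the standard GDC MAF column names and a tab-delimited VCF, and the function names are made up for this example. It compares on chromosome and position only, so a mismatch in chromosome naming (for example "chr12" versus "12") would show up as every site missing. Indels are skipped because MAF and VCF use different position conventions for them.

```python
import csv


def maf_snp_sites(maf_lines):
    """Return the set of (chromosome, position) for SNP rows in a MAF."""
    reader = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    return {
        (row["Chromosome"], int(row["Start_Position"]))
        for row in reader
        if row["Variant_Type"] == "SNP"
    }


def vcf_sites(vcf_lines):
    """Return the set of (chromosome, position) for records in a VCF."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        chrom, pos = line.split("\t")[:2]
        sites.add((chrom, int(pos)))
    return sites


def missing_from_vcf(maf_lines, vcf_lines):
    """List MAF SNP sites that have no record in the VCF."""
    return sorted(maf_snp_sites(maf_lines) - vcf_sites(vcf_lines))
```

Passing open file handles for the MAF and the converted VCF to `missing_from_vcf` returns the dropped sites, which can then be looked up in the MAF to see what they have in common.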

written 2.6 years ago by Kevin Blighe