Question: TCGA SNP analysis using manhattan plot and QQ plot
14 months ago by
DanielC90
Canada
DanielC90 wrote:

Dear Friends,

I am new to TCGA data analysis. I would really appreciate your suggestions on these questions:

a) From TCGA VCF files, I would like to generate Manhattan plots and QQ plots to detect associations between SNPs and traits. I know that generating a Manhattan plot requires this information:

CHR: chromosome (aliases chr, chromosome)
BP: nucleotide location (aliases bp, pos, position)
SNP: SNP identifier (aliases snp, rs, rsid, rsnum, id, marker, markername)
P: p-value for the association (aliases p, pval, p-value, pvalue, p.value)

"CHR", "BP", and "SNP" are in the VCF files, but where do I get the "P" value from?

And for QQ plots, where do I get the observed and expected p-values?

b) What type of plot would best present the number of variants for each tumor in each cancer type? Could you please let me know where to find the tumor and cancer type information for the SNPs in the VCF files?

Thank you very much! DK

plots snp tcga • 873 views

As far as I know, TCGA does not calculate association p-values, although there may be independent resources where that information is available.

modified 14 months ago • written 14 months ago by igor8.1k

Thanks! Can you let me know where the p-values for each SNP can be obtained?

written 14 months ago by DanielC90

Igor was just saying that such data may exist... somewhere. I have never seen such data, but it could exist. One resource that may have something similar is cBioPortal.

May I ask what you are trying to do? Manhattan plots are mainly used for GWAS, not cancer data. Of course, they can be used to plot anything. I believe that we have already identified the mutational landscape of tumours (?)

written 14 months ago by Kevin Blighe47k

Thanks! Yes, I am doing a GWAS study, and I have VCF files for the analyses mentioned above. Now I am trying to draw a Manhattan plot and a QQ plot to detect the association of SNPs with the traits. Since I am lacking p-values for the SNPs, I am not able to plot them. Please let me know if I am being clear and if you know how one can proceed with these plots.

written 14 months ago by DanielC90

So, you need to know how to perform an association test starting from the VCF stage? What I would do is convert the data into plink format and then do the association testing there. I have done this many times in the past, in fact.
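To make concrete what a per-SNP association test produces, here is a minimal Python sketch of a 1-degree-of-freedom allelic chi-square test on case/control allele counts. It is only an illustration of where a p-value comes from, not a reproduction of what plink or SnpSift do internally; the function name and the four count arguments are hypothetical.

```python
import math

def allelic_chi2_pvalue(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """P-value of a 1-df chi-square test on a 2x2 allele-count table.

    Hypothetical helper: rows are cases/controls, columns are alt/ref alleles.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        return 1.0  # empty group or monomorphic site: nothing to test
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))
```

Running this once per SNP would give the "P" column that a Manhattan plot needs; in practice a dedicated tool also handles quality control, covariates, and multiple testing, which this sketch ignores.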

Another program, SnpSift CaseControl, can perform the testing and encode the p-values within your VCF, which may be easier for you.
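Once p-values are available from either tool, the observed and expected values for a QQ plot can be derived from them alone. The sketch below is one common convention, assuming the null expectation is a uniform distribution and using (rank - 0.5) / n as the expected quantile; the function name is hypothetical.

```python
import math

def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs, smallest p-value first."""
    # Keep only valid p-values; p = 0 cannot be log-transformed
    observed = sorted(p for p in pvalues if 0.0 < p <= 1.0)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = (rank - 0.5) / n  # expected quantile under a uniform null
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points
```

Plotting the first element of each pair on the x-axis and the second on the y-axis gives the QQ plot; points lying above the diagonal indicate p-values smaller than expected by chance.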

written 14 months ago by Kevin Blighe47k

Thanks! So I need to use plink to get the p-values of the SNPs from the VCF files, right? Could you please point me to a resource where I can learn the plink steps for this? I am new to this. Thanks for understanding.

modified 14 months ago • written 14 months ago by DanielC90

Sure, you just need the --vcf flag. See: How unphased VCF is converted into ped file?

However, when doing this, plink apparently distorts the order of the samples in your VCF. So, you should 'fix' the ordering of your samples from the very first step and then supply a custom FAM file for all analyses. I cannot stress enough how important this is, because otherwise you will be comparing sample groups that do not reflect the actual groupings that you want.

What I said may not make much sense right now, but just go step by step and be 100% certain at each step that what you believe is happening is happening. It's easy to convert any VCF to plink, but not easy to maintain sample groupings.

See here: linkage disequilibrium analysis
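One way to keep control of the sample groupings described above is to build the FAM records yourself, in the exact order of the VCF header. This is a minimal Python sketch; the function names are hypothetical, and using the sample ID for both family and individual ID, with unknown parents and sex, is an assumption made for illustration.

```python
def vcf_sample_ids(vcf_lines):
    """Return sample IDs in the order they appear on the #CHROM header line."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            return fields[9:]  # the first 9 columns are fixed VCF fields
    raise ValueError("no #CHROM header line found")

def fam_lines(sample_ids, case_ids):
    """Build 6-column FAM records (FID IID PID MID SEX PHENO) in VCF order."""
    lines = []
    for sid in sample_ids:
        pheno = 2 if sid in case_ids else 1  # plink coding: 2 = case, 1 = control
        # Assumption: FID = IID, parents and sex unknown (0)
        lines.append(f"{sid} {sid} 0 0 0 {pheno}")
    return lines
```

Comparing these records against the FAM file that plink writes after conversion is a quick check that every sample still carries the phenotype you intended.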

written 14 months ago by Kevin Blighe47k