Question: GWAS analysis on merged TCGA exome data (VCF files)
Asked 11 months ago by DanielC (Canada):

Dear Friends,

I need to perform GWAS-style analyses, such as Manhattan and QQ plots, on merged TCGA exome VCF files; for example, on a file named "merge_exon_chr7.vcf.gz". Could you please tell me whether performing a GWAS on merged VCF files differs from performing it on the individual TCGA exome VCF files, and whether the results would differ? I am relatively new to this, so your suggestions will be highly appreciated.

Thanks, DK

plots snp gwas • 593 views
written 11 months ago by DanielC

Depends on the location from which you are obtaining your data. Where have you obtained them? Many third-party websites re-process the TCGA data and use different filtering criteria.

written 11 months ago by Kevin Blighe

Thanks! This is real patient data obtained from a third party. The files are in the standard VCF format. So, can I produce the Manhattan and QQ plots from these merged VCF files, one chromosome at a time? The results would then be per chromosome, correct? If that does not sound right, please let me know your suggestions.

Also, I would like to plot the "number of SNPs for each tumour in each cancer type" from the VCF files. What type of plot do you think would be best for this? Thanks.

written 11 months ago by DanielC

Which third party? You should review that third party's documentation to see how they processed the data, and then decide whether the data are suitable for your experiment.

It would be more appropriate to plot the 'mutation rate' per tumour. However, remember that the TCGA tumour data come from bulk tumour samples, so each sample comprises many tumour clones.

written 11 months ago by Kevin Blighe

Thanks! I have looked at the VCF file, and it follows the general VCF format described here: http://www.internationalgenome.org/wiki/Analysis/vcf4.0/

For the plot, can you please be more specific about which columns I should take from the VCF file to plot the "number of SNPs for each tumour in each cancer type"?

written 11 months ago by DanielC

Thanks Kevin! I have done research and these are the figures I am looking to generate to find the "number of SNPs for each tumour in each cancer type":

http://agscientific.com/blog/2014/11/dna-sequencing-of-cancer-what-have-we-learned-so-far/

Regarding "figure 2" and "figure 4" at that link, could you please tell me how VCF files could be used to generate these plots? That is, what data from the VCF file are needed? Thanks!

written 11 months ago by DanielC

I see. Figure 2 looks like it was produced by somebody with a lot of skills. I could do it, but it would require a lot of manual coding.

Using the VCF, to determine the mutation rate per megabase (mutations / Mb), you just have to count the mutations across all chromosomes and divide by the number of megabases sequenced. It's something that most likely requires manual coding, as I mentioned. The lower part of that figure is then just distinguishing between the different types of base substitutions. These 'signatures' have come to be regarded as important in cancer, in particular, as they are reflective of different types of DNA damage and other mutational processes, which can differ across tumours.
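As a rough illustration of that counting step, here is a minimal Python sketch, assuming a merged multi-sample VCF in which each tumour has a GT entry. The function names and the `callable_bases` value (the number of bases actually covered by the exome capture) are hypothetical and are not taken from the figure or from any TCGA pipeline. It counts single-base substitutions per sample, folds each one onto the six pyrimidine-referenced classes (C>A, C>G, C>T, T>A, T>C, T>G), and converts the totals to mutations per Mb. It counts every non-reference call, so the VCF would need to contain somatic calls only for the result to be a somatic mutation rate.

```python
import gzip
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def substitution_class(ref, alt):
    """Fold a single-base change onto the pyrimidine strand, e.g. G>A becomes C>T."""
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def count_snvs_per_sample(lines):
    """Return (sample names, {sample: Counter of substitution classes})."""
    samples, counts = [], {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            counts = {s: Counter() for s in samples}
            continue
        if len(fields) < 10 or "GT" not in fields[8].split(":"):
            continue  # no per-sample genotypes on this line
        ref = fields[3].upper()
        alts = fields[4].upper().split(",")
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            # Distinct non-reference allele indices carried by this sample
            alleles = {a for a in gt.replace("|", "/").split("/")
                       if a.isdigit() and a != "0"}
            for a in alleles:
                if int(a) > len(alts):
                    continue  # malformed genotype index
                alt = alts[int(a) - 1]
                # Keep single-base substitutions only (no indels, no symbolic alleles)
                if len(ref) == 1 and ref in COMPLEMENT and alt in COMPLEMENT:
                    counts[sample][substitution_class(ref, alt)] += 1
    return samples, counts


def mutations_per_mb(counts, callable_bases):
    """Convert per-sample totals to mutations per megabase.

    callable_bases is an assumption the user must supply: the number of
    bases that were actually sequenced well enough to call variants.
    """
    megabases = callable_bases / 1_000_000
    return {s: sum(c.values()) / megabases for s, c in counts.items()}
```

With the file from the question, this might be called as `samples, counts = count_snvs_per_sample(open_vcf("merge_exon_chr7.vcf.gz"))`. A single-chromosome file only gives that chromosome's share of the mutations, so the per-chromosome counts would need to be summed per sample before dividing by the total callable size.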

Do you have bioinformatics expertise at your institute?

written 11 months ago by Kevin Blighe

Thanks Kevin! Great help! I now understand the concept. I have bioinformatics skills but am new to this type of GWAS analysis. I need to know what data to extract from the VCF file to make these plots. To start with "Figure 4", which shows the "number of SNPs/mutations across different cancer types": the x-axis has "cancer types" and "mutation data", and the y-axis has "Alteration frequency", which I guess means the frequency of SNPs/mutations. Could you please tell me how to calculate the "Alteration frequency" on the y-axis, and how to extract the "cancer type" and "mutation/SNP data" from the VCF file? Thanks much!

written 11 months ago by DanielC

To generate such a plot starting from the VCF would require a lot of going back and forth, or some comprehensive tutorial. These plots typically require much customisation and 'tweaking'. It may be more practical for you to reach out to a local collaborator. As you're in Canada, there should be many nearby (assuming you're around one of the main hubs).

written 11 months ago by Kevin Blighe