How to convert input files from PLINK to snp.plotter?
4.7 years ago

I want to convert my PED/MAP (or BED) files into the format required by snp.plotter (an R package that creates plots of p-values from single-SNP and/or haplotype data). The documentation is terrible, so here is what I have worked out so far.

## The SNP file

From the doc:

"SNP.FILE includes four necessary columns ASSOC, SNP.NAME, LOC, and SS.PVAL corresponding to positive or negative association (indicating protective or susceptibility alleles), a SNP label, the location, and a p-value for each SNP"

Columns 2 and 3 (SNP.NAME and LOC) can be taken directly from the .map file. SS.PVAL could be obtained by running PLINK with --hardy.

But what about ASSOC (+ or -)? Where do I get the positive or negative association for each SNP?

## The HAP file

From the doc:

HAP.FILE: HAP.FILE includes three necessary columns ASSOC, G.PVAL, and I.PVAL corresponding to positive or negative association (indicating protective or susceptibility alleles), a global p-value and an individual p-value for each haplotype, followed by a set of columns of SNPs with corresponding haplotypes. Haplotypes are presented in a step-wise fashion with the major allele given as 1 and the minor allele as 2; haplotype variants for a set of SNPs should be grouped. SNP labels in HAP.FILE must be the same as in SNP.FILE, and only SNPs with corresponding haplotypes need to be included. In the figure, unfilled symbols connected by solid lines are used to indicate global haplotype p-values (a circle is used if no symbol is specified for the dataset). Unfilled and filled symbols are used to indicate alleles 1 and 2, respectively, connected by solid lines and dashed lines for positive and negative association (indicating susceptibility or protective haplotypes) when using individual haplotype p-values.

How do they get the global and individual p-values for the haplotypes?

Also, for each SNP column they put:

• Major allele = 1
• Minor allele = 2
• Nothing otherwise

If this information is supposed to match the .hwe file, I don't get it: how do they recode the A1 and A2 columns?
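The 1/2 coding quoted above (major allele = 1, minor allele = 2) can be sketched for a single SNP as follows. This is only an illustration of the recoding idea, not something taken from snp.plotter itself: the function name `recode_major_minor` is hypothetical, the major allele is simply taken to be the most frequent one in the sample, and keeping missing calls as `0` is an assumption borrowed from the PED convention. Note that PLINK by default reports the minor allele as A1, so A1 would map to 2 and A2 to 1 under this scheme.

```python
from collections import Counter


def recode_major_minor(genotypes, missing="0"):
    """Recode one SNP's allele pairs as 1 (major) / 2 (minor).

    genotypes: list of (allele1, allele2) tuples, e.g. from two PED columns.
    Missing alleles (PED code "0") are passed through unchanged (assumption).
    """
    # Count every observed allele, ignoring missing calls.
    counts = Counter(a for pair in genotypes for a in pair if a != missing)
    if len(counts) > 2:
        raise ValueError("expected a biallelic SNP, got: %s" % sorted(counts))
    # Most frequent allele first; ties are broken alphabetically.
    ranked = sorted(counts, key=lambda a: (-counts[a], a))
    code = {allele: str(i + 1) for i, allele in enumerate(ranked)}
    code[missing] = missing
    return [(code[a], code[b]) for a, b in genotypes]


if __name__ == "__main__":
    example = [("A", "A"), ("A", "G"), ("G", "A"), ("0", "0")]
    print(recode_major_minor(example))
```

Whether snp.plotter accepts `0` for missing calls should be checked against its documentation before relying on this.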

## The GENOTYPE file

From the doc:

GENOTYPE.FILE: GENOTYPE.FILE is a modified Linkage PED file. Each row should have the following information: family ID, individual ID, father ID, mother ID, sex, and affection status followed by marker loci coded as binary factors

I guess this could be obtained by running --recode12 on the PED file.

4.7 years ago

The documentation is not bad at all. I think you need to read it more carefully. From a quick glance at the link you posted, it looks like this program is used to make regional Manhattan plots. A really easy web-based alternative is LocusZoom: http://locuszoom.sph.umich.edu/ What you won't get is the Haploview-type plot, but LocusZoom plots show an LD map together with your SNPs.

For your package: since it is a package for Manhattan plots, it depends on what you want to show. Do you want to show which SNPs are associated with your trait? Then you need the SNP file. Or do you want to show which haplotypes are associated with it? Then you need the HAP file. The genotype file is not mandatory (I just read that in the documentation).

If you check the PLINK documentation, both files can be obtained easily: in the first case you run a standard SNP-based GWAS, in the second a haplotype-based GWAS. You can then plot the output files with your package.


You are free to call anyone stupid, but my question is about how to interpret the data they ask for in the input files (my questions are pretty clear IMHO).

The documentation says the PALETTE file is optional; it doesn't say the GENOTYPE file is optional:

"A dataset is composed of a configuration file, a SNP and haplotype file for each result set, one genotype file, and an optional palette file"

AFAIK the LocusZoom web tool only lets you select human genome builds, and I am not working with human data. Anyway, I am interested in the HaploView-like plot.


I wasn't calling you anything, just telling you that you could have looked more carefully at the documentation. In R, run ?snp.plotter and you'll see that for a "standard" Manhattan plot you only need the SNP file. I can help you convert your PLINK output into the input file for snp.plotter if you tell me what type of association analysis you're conducting. Is it for a quantitative trait, for instance with --linear? In that case you're doing a regression analysis and PLINK generates an output file named *.assoc.linear. Since you want to plot the results as a Manhattan plot, you need the p-values from this analysis, not from a Hardy-Weinberg test as you proposed. The ASSOC column refers to the sign of the beta coefficient in this regression; in the *.assoc.linear file this is the BETA column. All the other information you need is in that file too. Hope this helps!
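As a rough sketch of that conversion, the snippet below maps PLINK regression output onto the four SNP.FILE columns (ASSOC, SNP.NAME, LOC, SS.PVAL). It assumes the usual --linear column names (SNP, BP, TEST, BETA, P), keeps only the additive `ADD` rows, and treats a beta of zero as "+". The function name `plink_linear_to_snp_rows` is hypothetical, and writing the result tab-separated is an assumption to check against the snp.plotter documentation.

```python
def plink_linear_to_snp_rows(text):
    """Turn the text of a PLINK --linear result (e.g. plink.assoc.linear)
    into rows for a snp.plotter SNP.FILE: ASSOC, SNP.NAME, LOC, SS.PVAL."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, records = lines[0], lines[1:]
    col = {name: i for i, name in enumerate(header)}
    rows = [("ASSOC", "SNP.NAME", "LOC", "SS.PVAL")]
    for rec in records:
        # Keep only the additive SNP effect; skip covariate rows.
        if "TEST" in col and rec[col["TEST"]] != "ADD":
            continue
        beta, pval = rec[col["BETA"]], rec[col["P"]]
        # PLINK prints NA when the model could not be fitted.
        if beta == "NA" or pval == "NA":
            continue
        # ASSOC is the sign of the regression coefficient.
        assoc = "+" if float(beta) >= 0 else "-"
        rows.append((assoc, rec[col["SNP"]], rec[col["BP"]], pval))
    return rows


if __name__ == "__main__":
    sample = """\
 CHR  SNP   BP  A1  TEST  NMISS   BETA   STAT      P
   1  rs1 1000   A   ADD     90   0.52   2.10  0.038
   1  rs2 2000   G   ADD     90  -0.31  -1.40   0.16
"""
    for row in plink_linear_to_snp_rows(sample):
        print("\t".join(row))  # tab-separated output is an assumption
```

The p-values are passed through as strings so that no precision is lost in the rewrite.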


Now that's a better answer :).

I am not doing any association analysis since I don't have traits (at least not yet; I don't do the experimental design). I am exploring whether snp.plotter visualizations could be a nice alternative to HaploView and other haplotype visualization packages.

As far as I can see, the mandatory SNP.FILE values come from an association analysis, so if you don't run one you cannot use this package. If that's true, it is the reason I call the documentation terrible: they should make that clear so people don't waste their time.


If you're just interested in plotting an LD map, then I would recommend LDheatmap: https://cran.r-project.org/web/packages/LDheatmap/LDheatmap.pdf