I am working with genomic data from bottlenose dolphins generated through ddRAD sequencing and am currently filtering the biallelic SNP data. Right now I'm looking at linkage disequilibrium, using vcftools to calculate the R^2 value between pairs of SNPs. The output is straightforward: one column for the scaffold, two columns for the positions of the SNPs being tested, one column for the number of individuals used in the comparison (N_INDV), and the corresponding R^2 value:
CHR POS1 POS2 N_INDV R^2
NW_017842120.1 105522 1040442 133 0.00322423
NW_017842131.1 28704 680111 149 0.0187932
NW_017842131.1 28704 947810 115 0.00176315
NW_017842131.1 28704 1877729 143 0.00398911
NW_017842131.1 28704 2027166 132 0.00220472
NW_017842131.1 28704 2484385 160 0.00376784
NW_017842131.1 28704 3216074 161 0.000317002
NW_017842131.1 28704 3300206 160 0.00379328
NW_017842131.1 28704 3378162 157 0.00476654
In the course of this, several questions came up:
Should I calculate an adjusted R^2 value for large numbers of observations, or does vcftools already provide the adjusted value? I didn't find any pointers in the vcftools manual on how the R^2 value is calculated, except that it's done in a similar fashion to PLINK.
To test for significance, can I simply calculate the F-statistic with the corresponding degrees of freedom (df1 = 1 and df2 = n - 1)?
The resulting p-values from the F-test will likely have to be adjusted for multiple testing (Bonferroni or Holm). Should I do this per scaffold or across all SNP pairs? I'm assuming per scaffold, since SNPs on different 'chromosomes'/scaffolds weren't tested against each other. Is that right?
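To make the three questions concrete, here is a minimal Python sketch of the calculations being asked about. It is only an illustration under stated assumptions, not a claim about what vcftools does internally. The function names are hypothetical. The adjusted R^2 and the F-statistic use the textbook formulas for a regression with a single predictor, which implies df2 = n - 2 rather than the n - 1 mentioned above. Whether that is the right choice for these data is part of the question.

```python
def adjusted_r2(r2, n, p=1):
    """Textbook adjusted R^2 for n observations and p predictors (assumed p = 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n):
    """F-statistic for a one-predictor regression: df1 = 1, df2 = n - 2 (assumption)."""
    return r2 * (n - 2) / (1.0 - r2)


def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjustment, returned in the original order of the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


def holm_per_scaffold(records):
    """Apply Holm separately within each scaffold.

    records is a list of (scaffold, pvalue) tuples; the result is a list of
    adjusted p-values in the same order as the input.
    """
    groups = {}
    for idx, (scaffold, _) in enumerate(records):
        groups.setdefault(scaffold, []).append(idx)
    out = [None] * len(records)
    for indices in groups.values():
        adj = holm([records[i][1] for i in indices])
        for i, value in zip(indices, adj):
            out[i] = value
    return out


if __name__ == "__main__":
    # First data row of the vcftools output above: N_INDV = 133, R^2 = 0.00322423.
    n, r2 = 133, 0.00322423
    print("adjusted R^2:", adjusted_r2(r2, n))
    print("F-statistic: ", f_statistic(r2, n))
```

The p-value itself is left out to keep the sketch free of dependencies. With SciPy installed it could be obtained as `scipy.stats.f.sf(F, 1, n - 2)`. Comparing `holm(all_pvalues)` with `holm_per_scaffold(records)` shows the practical difference between the two correction scopes: the per-scaffold version uses a smaller number of tests per family and is therefore less conservative.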