**140** wrote:

Hello everyone, I am trying to compute the correlation between depth of coverage (`dp`) for individuals with homozygous genotypes and individuals with heterozygous genotypes.

Here are the first few lines of my two datasets (note: there is only one position, shared by all individuals):

```
> head(homozygotes)
   chrom       pos dp  ind_id genotype_id
1      1 115258827 12 HG00099         0|0
3      1 115258827  8 HG00101         0|0
4      1 115258827  6 HG00103         0|0
8      1 115258827  2 HG00114         0|0
9      1 115258827  8 HG00115         0|0
12     1 115258827  8 HG00128         0|0
> head(heterozygotes)
   chrom       pos dp  ind_id genotype_id
2      1 115258827  5 HG00100         0|1
14     1 115258827  5 HG00133         0|1
16     1 115258827  5 HG00138         0|1
19     1 115258827  2 HG00160         1|0
27     1 115258827  4 HG00232         1|0
33     1 115258827  9 HG00251         1|0
```
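To make the problem reproducible, the rows printed above can be rebuilt as R data frames. This is a minimal sketch that uses only the six rows shown per group (the real datasets are larger and differ in length); the column types are assumed from the printout.

```r
# Rebuild only the head() rows printed above; the full datasets contain more individuals.
homozygotes <- data.frame(
  chrom       = 1L,
  pos         = 115258827L,
  dp          = c(12L, 8L, 6L, 2L, 8L, 8L),
  ind_id      = c("HG00099", "HG00101", "HG00103", "HG00114", "HG00115", "HG00128"),
  genotype_id = "0|0",
  row.names   = c(1L, 3L, 4L, 8L, 9L, 12L)
)

heterozygotes <- data.frame(
  chrom       = 1L,
  pos         = 115258827L,
  dp          = c(5L, 5L, 5L, 2L, 4L, 9L),
  ind_id      = c("HG00100", "HG00133", "HG00138", "HG00160", "HG00232", "HG00251"),
  genotype_id = c("0|1", "0|1", "0|1", "1|0", "1|0", "1|0"),
  row.names   = c(2L, 14L, 16L, 19L, 27L, 33L)
)

print(homozygotes)
print(heterozygotes)
```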

The two datasets differ in length. Therefore, when I simply run `cor.test(homozygotes$dp, heterozygotes$dp)`, I get an error message:

`Error in cor : incompatible dimensions`.
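The error arises because a correlation is computed on paired observations: the i-th value of `x` is matched with the i-th value of `y`, so R rejects vectors of different lengths. The short vectors below are hypothetical stand-ins for the two `dp` columns. The quoted message matches what `cor()` reports; `cor.test()` performs its own length check first and, as far as I know, stops with a message saying `'x' and 'y' must have the same length` instead.

```r
# Hypothetical dp values for two groups of different size (6 vs 4 individuals).
dp_hom <- c(12, 8, 6, 2, 8, 8)
dp_het <- c(5, 5, 5, 2)

# Correlation needs exactly one y for every x, so the lengths must agree.
print(length(dp_hom) == length(dp_het))   # FALSE

# Capture the error messages instead of halting the script.
err_cor  <- tryCatch(cor(dp_hom, dp_het),      error = conditionMessage)
err_test <- tryCatch(cor.test(dp_hom, dp_het), error = conditionMessage)
print(err_cor)
print(err_test)
```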

I have searched for a solution but have not found one that works. I am now quite stuck: does anyone have an idea how I can proceed? I would sincerely appreciate any help.

Thanks a lot

You need the two vectors to have the same length, because `cor.test()` pairs the i-th value of `x` with the i-th value of `y`. Downsample the larger dataset.
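A minimal sketch of the downsampling suggestion, assuming random sampling without replacement is acceptable. So that it runs on its own, it uses the `dp` values printed in the question, with the heterozygote group cut to four rows purely to mimic groups of unequal size (hypothetical). One caveat: the two groups are different individuals, so any pairing is arbitrary, and the resulting correlation reflects the random pairing (it changes with the seed) rather than a relationship between genotype classes. If the real question is whether depth differs between homozygotes and heterozygotes, an unpaired two-sample test such as `wilcox.test()` uses every individual without pairing; that is a swapped-in alternative, shown at the end for comparison.

```r
# Standalone stand-ins: dp values printed in the question. The heterozygote
# group is cut to 4 rows only to mimic unequal group sizes (hypothetical).
homozygotes   <- data.frame(dp = c(12, 8, 6, 2, 8, 8))
heterozygotes <- data.frame(dp = c(5, 5, 5, 2))

set.seed(42)  # arbitrary seed so the random subsample is reproducible

# Downsample (without replacement) both groups to the size of the smaller one.
n <- min(nrow(homozygotes), nrow(heterozygotes))
dp_hom <- homozygotes$dp[sample(nrow(homozygotes), n)]
dp_het <- heterozygotes$dp[sample(nrow(heterozygotes), n)]

# Lengths now match, so cor.test() runs. The pairing of individuals is
# arbitrary, so this estimate carries no biological meaning by itself.
res_cor <- cor.test(dp_hom, dp_het)
print(res_cor)

# Unpaired alternative (swapped-in, not the answer's method): compare the dp
# distributions of the two groups directly; exact = FALSE because dp has ties.
res_wil <- wilcox.test(homozygotes$dp, heterozygotes$dp, exact = FALSE)
print(res_wil)
```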
