3.1 years ago
Nemo
I have several datasets related to different diseases, such as AML. The data include marker genotypes (AA, AB, BB) and other information such as chromosome number, physical position, and strand. I also need the genetic positions of these SNPs. So far I have not found a direct solution (some suggested 'MareyMap', but it requires both physical and genetic positions as input). Do you have any recommendations?
What does "genetic position" mean? Can you give us an example?
To use the 'ldetect' package (for identifying independent linkage disequilibrium blocks), the input must have the format: snpID, physical position, (cumulative) genetic position. I have all of the data except the third column.
You can find the ldetect paper here
I don't want to read a paper to understand a term. Can you explain it here?
EDIT: I looked at the tool's manual/README, and it seems like a multi-step process. Where in this process is the "genetic position" used as an input without the data being generated by a previous step?
Exactly at the first step: the first command needs an input file with this information: snpID, physical position, (cumulative) genetic position. You can see the format of such a file in the parameters of the first command, under the name 'example_data/chr2.interpolated_genetic_map.gz'.
I am guessing that the genetic position is the position in centiMorgans (cM). One set of maps that I found is here: http://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/ though you should make sure the genome build matches your data.
I am not sure whether the genetic positions of SNPs are the same in all populations. Can you verify this? (To be clearer, I am skeptical that the samples in my AML dataset have the same genetic positions as the populations provided in your link.)
Each population has its own genetic map. As you have not stated your population, I just sent you the first one I found. Here is another one that contains genetic maps for different populations: https://github.com/joepickrell/1000-genomes-genetic-maps
Usually, it is OK to use a 1000G population map. If you want to estimate your own genetic map, you might need to use a package such as LDhat.
Thanks, Sam, for the nice explanation. Since I do not have any family history for my population from which to infer genetic positions (distances), can I simply use the 1000G maps for any population (Chinese, Asian, ...)?
Yes, as long as you pick the one that most closely resembles your population, that should be fine.
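For anyone landing here later: once you have a reference genetic map (physical position and cumulative cM per map marker), one common way to get a genetic position for your own SNPs is linear interpolation between the two flanking map markers. Below is a minimal sketch of that idea, not the method ldetect or any map provider prescribes. The function name `interpolate_cm`, the toy map values, and the SNP IDs are all made up for illustration; reading the actual map file is left out because its column layout depends on the source you download, and the map must be on the same genome build as your SNP positions.

```python
from bisect import bisect_right


def interpolate_cm(map_bp, map_cm, query_bp):
    """Linearly interpolate the cumulative genetic position (cM) at query_bp.

    map_bp: physical positions of the map markers, sorted ascending.
    map_cm: cumulative genetic positions (cM) matching map_bp.
    Positions outside the map are clamped to the first/last map value
    (an assumption of this sketch; you may prefer to drop such SNPs).
    """
    if query_bp <= map_bp[0]:
        return map_cm[0]
    if query_bp >= map_bp[-1]:
        return map_cm[-1]
    # After bisect_right: map_bp[i - 1] <= query_bp < map_bp[i]
    i = bisect_right(map_bp, query_bp)
    x0, x1 = map_bp[i - 1], map_bp[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (query_bp - x0) / (x1 - x0)


# Hypothetical toy genetic map for one chromosome (not real data).
map_bp = [10000, 20000, 40000]
map_cm = [0.0, 0.1, 0.5]

# Hypothetical SNPs from your own dataset: (snpID, physical position).
snps = [("rs_a", 15000), ("rs_b", 30000), ("rs_c", 50000)]

# Build rows in the order mentioned above:
# snpID, physical position, (cumulative) genetic position.
rows = [(snp_id, bp, interpolate_cm(map_bp, map_cm, bp)) for snp_id, bp in snps]

for snp_id, bp, cm in rows:
    print(snp_id, bp, f"{cm:.4f}")
```

Run this per chromosome, since cumulative cM restarts on each chromosome. Check the ldetect README for the exact delimiter and compression the first command expects before writing the rows to disk.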