Question

How To Analyse SNP Data From Different Sources?

3

13.1 years ago

John Bodovsky ▴ 30

I have 3 tables of SNPs and their corresponding diseases. How can I get some statistics on them: which SNPs appear in all three lists and which are unique to a single list? If possible, I would also like to draw an Euler (Venn) diagram. Additionally, it would be good to find a tool for making graphs like this:

[image: example graph, not preserved]

snp comparison visualization • 3.1k views

updated 10.1 years ago by Biostar 20 • written 13.1 years ago by John Bodovsky ▴ 30

3

Aside from the graphical representation, this looks to me like a very simple matching script. What exactly is keeping you from coding it? A better description of the input would probably help R experts suggest good visualization scripts for your data.

13.1 years ago by Jorge Amigo 14k

score 5 · Answer 1 · 2011-11-23

To find which SNPs are common to all groups, you could use a tool like bedtools or joinx to intersect the lists, assuming you have them as BED or VCF files. This also assumes that your SNP lists all use coordinates from the same reference assembly.

To find the SNPs common to all of the lists, try:

joinx intersect -a list1.bed -b list2.bed -o common_to_lists_1_2.bed 
joinx intersect -a list3.bed -b common_to_lists_1_2.bed -o intersection.bed

The resulting intersection.bed contains the SNPs common to list1, list2, and list3. If you also want the total number of recurring SNPs, run the two remaining pairwise intersections:

joinx intersect -a list1.bed -b list3.bed -o common_to_lists_1_3.bed
joinx intersect -a list2.bed -b list3.bed -o common_to_lists_2_3.bed

Now subtract the three-way (1-2-3) intersection from each of the pairwise 1-2, 1-3, and 2-3 lists:

joinx intersect -a common_to_lists_1_2.bed -b intersection.bed --miss-a lists_1_2.bed
joinx intersect -a common_to_lists_1_3.bed -b intersection.bed --miss-a lists_1_3.bed
joinx intersect -a common_to_lists_2_3.bed -b intersection.bed --miss-a lists_2_3.bed

Your lists_?_?.bed files will then contain the SNPs that are shared by exactly two lists. To find the total number of SNPs found on more than one list:

wc -l lists_?_?.bed intersection.bed
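If the tables are not in BED or VCF form, the same bookkeeping can be sketched with standard Unix tools. The following is only a rough sketch, assuming each list has been reduced to a plain text file with one SNP identifier (for example an rsID) per line; the function name venn_counts and its output labels are made up for illustration and do not come from joinx or bedtools. It prints the seven region sizes that an Euler diagram of three lists needs.

```shell
#!/bin/sh
# Sketch only: assumes one SNP identifier (e.g. an rsID) per line in each file.
# venn_counts and its output labels are illustrative names, not part of any tool.
# Usage (file names hypothetical): venn_counts list1.txt list2.txt list3.txt

venn_counts() {
    vc_tmp=$(mktemp -d) || return 1

    # comm requires sorted input; -u also drops duplicate IDs within a list
    LC_ALL=C sort -u "$1" > "$vc_tmp/a"
    LC_ALL=C sort -u "$2" > "$vc_tmp/b"
    LC_ALL=C sort -u "$3" > "$vc_tmp/c"

    # comm -12 prints only the lines present in both inputs
    LC_ALL=C comm -12 "$vc_tmp/a" "$vc_tmp/b" > "$vc_tmp/ab"
    LC_ALL=C comm -12 "$vc_tmp/a" "$vc_tmp/c" > "$vc_tmp/ac"
    LC_ALL=C comm -12 "$vc_tmp/b" "$vc_tmp/c" > "$vc_tmp/bc"
    LC_ALL=C comm -12 "$vc_tmp/ab" "$vc_tmp/c" > "$vc_tmp/abc"

    # arithmetic expansion strips the padding some wc implementations add
    n_a=$(( $(wc -l < "$vc_tmp/a") ))
    n_b=$(( $(wc -l < "$vc_tmp/b") ))
    n_c=$(( $(wc -l < "$vc_tmp/c") ))
    n_ab=$(( $(wc -l < "$vc_tmp/ab") ))
    n_ac=$(( $(wc -l < "$vc_tmp/ac") ))
    n_bc=$(( $(wc -l < "$vc_tmp/bc") ))
    n_abc=$(( $(wc -l < "$vc_tmp/abc") ))
    rm -rf "$vc_tmp"

    # region sizes by inclusion-exclusion
    printf 'all_three\t%d\n' "$n_abc"
    printf 'only_1_2\t%d\n' "$(( n_ab - n_abc ))"
    printf 'only_1_3\t%d\n' "$(( n_ac - n_abc ))"
    printf 'only_2_3\t%d\n' "$(( n_bc - n_abc ))"
    printf 'unique_1\t%d\n' "$(( n_a - n_ab - n_ac + n_abc ))"
    printf 'unique_2\t%d\n' "$(( n_b - n_ab - n_bc + n_abc ))"
    printf 'unique_3\t%d\n' "$(( n_c - n_ac - n_bc + n_abc ))"
}
```

Matching on identifiers like this only works if all three sources name the SNPs the same way; matching on coordinates, as in the joinx commands, instead requires a shared reference.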

score 2 · Answer 2 · 2011-11-23

Jorge offers a good point. On the other hand, LD (linkage disequilibrium) could be used to "match" different SNPs that tag each other. So, before you perform the simple match with a script, you should collect LD data, say at r^2 = 0.90 or higher, on these SNPs and then run the match to see if any SNPs with different IDs actually belong to the same block of tightly (genetically) linked SNPs.
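One way this LD-aware match might look in practice is sketched below. It assumes you have already built a two-column, tab-separated file mapping each SNP ID to an LD block label (for instance, from pairwise r^2 values at your chosen threshold); that file, its layout, and the function name to_blocks are all hypothetical, and deriving the blocks themselves is not shown. Each list is translated to block labels first, so two different SNP IDs in the same block end up matching.

```shell
#!/bin/sh
# Sketch only. Assumed inputs (both hypothetical):
#   $1: tab-separated map file, "snp_id<TAB>block_id", one SNP per line
#   $2: SNP list, one identifier per line
# Prints the sorted, de-duplicated block labels for the list; SNPs that are
# absent from the map are kept under their own ID.

to_blocks() {
    awk -F '\t' -v map="$1" '
        BEGIN {
            # load the SNP -> block mapping before reading the list
            while ((getline line < map) > 0) {
                split(line, f, "\t")
                block[f[1]] = f[2]
            }
        }
        { print (($1 in block) ? block[$1] : $1) }
    ' "$2" | LC_ALL=C sort -u
}
```

The translated lists can then be compared exactly as plain ID lists would be, for example with comm -12 on two of the outputs.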

Statistics on association with disease could be garnered from the GWAS catalog at www.genome.gov or by mining OMIM.