Question: How To Analyse Snp Data From Different Sources?
9.9 years ago, John Bodovsky wrote:

I have three tables of SNPs and their associated diseases. How can I get some statistics on them: which SNPs appear in all three lists and which are unique to a single list? If possible, I would also like to draw an Euler (Venn) diagram of the overlap. Additionally, it would be good to find a tool for making graphs like this:

[image not preserved]


Aside from the graphical representation, this looks to me like a very simple matching script. What exactly is keeping you from coding it? A better description of the input would probably help R experts suggest good visualization scripts for your data.

Comment written 9.9 years ago by Jorge Amigo
9.3 years ago, Rlong wrote:

To find which SNPs are common to all groups, you could use a tool like bedtools or joinx to intersect the lists, assuming you have them in BED or VCF files. This also assumes that all of your SNP lists use coordinates from the same reference.

To find the SNPs common to all of the lists, try:

joinx intersect -a list1.bed -b list2.bed -o common_to_lists_1_2.bed 
joinx intersect -a list3.bed -b common_to_lists_1_2.bed -o intersection.bed

The resulting intersection.bed contains the SNPs common to list1, list2, and list3. If you also want the total number of recurring SNPs, run the remaining pairwise intersections:

joinx intersect -a list1.bed -b list3.bed -o common_to_lists_1_3.bed
joinx intersect -a list2.bed -b list3.bed -o common_to_lists_2_3.bed

Now remove the SNPs in the 1-2-3 list from each of the 1-2, 1-3, and 2-3 lists:

joinx intersect -a common_to_lists_1_2.bed -b intersection.bed --miss-a lists_1_2.bed
joinx intersect -a common_to_lists_1_3.bed -b intersection.bed --miss-a lists_1_3.bed
joinx intersect -a common_to_lists_2_3.bed -b intersection.bed --miss-a lists_2_3.bed

Your lists_?_?.bed files will then contain the SNPs that are shared by exactly two lists. To find the total number of SNPs found on more than one list:

wc -l lists_?_?.bed intersection.bed
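
If the lists are plain tables of SNP IDs rather than BED files, the same overlap counts, plus the SNPs unique to each list, can be sketched with Python sets. This is only a minimal sketch and is not part of joinx; the function name `venn_regions` and the rsIDs are made up for illustration, and it assumes the same SNP carries the same ID in every list.

```python
def venn_regions(a, b, c):
    """Split three collections of SNP IDs into the seven Venn regions."""
    a, b, c = set(a), set(b), set(c)
    return {
        "all_three": a & b & c,
        "only_1_2": (a & b) - c,   # shared by lists 1 and 2, absent from 3
        "only_1_3": (a & c) - b,
        "only_2_3": (b & c) - a,
        "unique_1": a - b - c,     # found in list 1 only
        "unique_2": b - a - c,
        "unique_3": c - a - b,
    }


# Hypothetical SNP IDs, for illustration only.
list1 = ["rs1", "rs2", "rs3", "rs4"]
list2 = ["rs2", "rs3", "rs5"]
list3 = ["rs3", "rs4", "rs6"]

regions = venn_regions(list1, list2, list3)
for name, snps in regions.items():
    print(name, len(snps), sorted(snps))
```

The seven region sizes are the numbers a three-set Venn diagram needs, so they could be passed to whatever plotting tool you prefer.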
9.3 years ago, Larry_Parnell (Boston, MA USA) wrote:

Jorge offers a good point. On the other hand, LD (linkage disequilibrium) could be used to "match" different SNPs that tag each other. So, before you perform the simple match with a script, you should collect LD data, say at r^2 = 0.90 or higher, on these SNPs and then run the match to see if any SNPs with different IDs actually belong to the same block of tightly (genetically) linked SNPs.
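
One way to sketch that LD-aware match in Python: merge SNPs into groups whenever a pair reaches the r^2 threshold, then compare the lists by group rather than by raw ID. The LD pairs, rsIDs, and function names below are hypothetical; real r^2 values would have to come from whatever LD resource you use, and the sketch assumes they arrive as (SNP, SNP, r^2) triples.

```python
def ld_groups(ld_pairs, threshold=0.90):
    """Map each SNP to a representative of its LD group (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for snp_a, snp_b, r2 in ld_pairs:
        if r2 >= threshold:
            root_a, root_b = find(snp_a), find(snp_b)
            if root_a != root_b:
                # Keep the alphabetically smaller ID as the representative.
                parent[max(root_a, root_b)] = min(root_a, root_b)
    return {snp: find(snp) for snp in list(parent)}


def to_groups(snps, rep):
    """Replace each SNP by its group representative (itself if ungrouped)."""
    return {rep.get(s, s) for s in snps}


# Hypothetical LD data: (SNP, SNP, r^2).
ld_pairs = [("rs10", "rs11", 0.95), ("rs11", "rs12", 0.92), ("rs20", "rs21", 0.40)]
rep = ld_groups(ld_pairs)

list1 = ["rs10", "rs20"]
list2 = ["rs12", "rs21"]
shared = to_groups(list1, rep) & to_groups(list2, rep)
print(shared)  # rs10 and rs12 have different IDs but fall in the same group
```

Note that this chains pairs together, so two SNPs can land in the same group through an intermediate SNP even if their own pairwise r^2 is below the threshold; whether that is acceptable depends on how strictly you want to define a block.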

Statistics on association with disease could be gathered from the GWAS Catalog or by mining OMIM.
