Question

Dendrogram Of Snps

2

13.7 years ago

Pavid ▴ 160

Hi guys!

I'm just starting out in bioinformatics, and I'm quite enjoying it. But my knowledge of biology is very limited.

I have 700 samples genotyped for 90 SNPs and I would like to build a dendrogram so I can divide my data into clusters.
But most of the programs have a limit of 500 samples. Are you aware of any programs with higher limits?

Ideas or improvements are welcome :)

Thanks for any help

snp clustering • 4.6k views

ADD COMMENT • link updated 12.7 years ago by Lars Juhl Jensen 11k • written 13.7 years ago by Pavid ▴ 160

2

What is your dendrogram about? Which software have you tested?

ADD REPLY • link 13.7 years ago by Pierre Lindenbaum 161k

0

ClustalW has a limit, as do Phylip and T-Coffee. I've read some papers and I'm thinking of trying MEGA, which is widely used. The dendrogram will show the distances between the samples, so that I could also see some clusters.

ADD REPLY • link 13.7 years ago by Pavid ▴ 160

0

Would you like to split your question into two separate questions for better visibility and better answers?

ADD REPLY • link 13.7 years ago by Khader Shameer 18k

0

Ok, I will make two questions

ADD REPLY • link 13.7 years ago by Pavid ▴ 160

0

Sorry - the new question got flagged as a duplicate and I merged the two, only now seeing your comment here.

ADD REPLY • link 13.7 years ago by Lars Juhl Jensen 11k

0

Lars, I suggested to Patricia that she split the question, because I thought the dendrogram of SNPs and the strains are different contexts and would get better attention if posted as separate questions.

ADD REPLY • link 13.7 years ago by Khader Shameer 18k

score 2 · Answer 1 · 2010-09-08

If you are able to format your data into phylip format, I have found that the viewer called Archaeopteryx (the ATV successor, which is based on the forester library) is more than capable of dealing with hundreds or even thousands of samples. I haven't tested it further, but the developers claim that it is the most powerful approach for phylogenetic representation, and although I'm not a phylogenetics expert, I have tested it along with a few others and none has performed as well as this one.

Here are the references for further reading:

Han M.V. and Zmasek C.M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10:356.
Zmasek C.M. and Eddy S.R. (2001). ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17:383-384.
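
In case it helps with the "format your data into phylip format" step, here is a minimal sketch in Python of writing SNP genotypes as a sequential PHYLIP file. It assumes each sample is already encoded as one character per SNP (for example an IUPAC code for heterozygous calls); the function name `to_phylip` and the sample names are made up for illustration, and this is not tied to any particular viewer.

```python
def to_phylip(genotypes):
    """Return sequential PHYLIP text for a {sample_name: snp_string} mapping.

    Assumption: every SNP is encoded as exactly one character.
    """
    lengths = {len(seq) for seq in genotypes.values()}
    if len(lengths) != 1:
        raise ValueError("all samples must have the same number of SNPs")
    n_sites = lengths.pop()

    # Header line: number of samples, then number of sites.
    lines = [f" {len(genotypes)} {n_sites}"]
    for name, seq in genotypes.items():
        # Strict PHYLIP reserves exactly 10 characters for the name,
        # so longer names are truncated and shorter ones are padded.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


# Hypothetical toy data: 3 samples, 5 SNPs each.
samples = {"sample_001": "ACGTA", "sample_002": "ACGTT", "s3": "TCGTA"}
phylip_text = to_phylip(samples)
print(phylip_text)
```

Note that truncating names to 10 characters can produce duplicates, so with 700 samples it is probably worth using short unique IDs and keeping a separate lookup table.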

score 2 · Answer 2 · 2010-09-08

2

13.7 years ago

Larry_Parnell 16k

It is not clear from the question if the SNPs reside on a contiguous sequence - in which case one could try Clustal W et al. - or are spread throughout a genome. This question needs revision...

ADD COMMENT • link 13.7 years ago by Larry_Parnell 16k

0

I've tried ClustalW and it doesn't work, because it has a limit of 500 sequences. I only have some SNPs and that's what we are analyzing, which is why only those SNPs are aligned. It's a shorter sequence.

ADD REPLY • link 13.7 years ago by Pavid ▴ 160
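
If the samples really are just short strings of aligned SNPs, one option that avoids an alignment program's sample limit is to compute pairwise distances directly and cluster them. Below is a rough sketch in plain Python using the number of mismatching SNPs as the distance and average-linkage (UPGMA-style) merging; it outputs a Newick string without branch lengths. The function names and toy data are hypothetical, sample names are assumed not to contain commas or parentheses, and the distance measure is an assumption that may not suit every data set.

```python
from itertools import combinations


def hamming(a, b):
    """Number of SNP positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def average_linkage_newick(genotypes):
    """Cluster {sample_name: snp_string} and return a Newick topology."""
    # Precompute all pairwise sample distances once.
    dist = {
        frozenset((a, b)): hamming(genotypes[a], genotypes[b])
        for a, b in combinations(genotypes, 2)
    }
    # Each cluster is keyed by its Newick label and lists its member samples.
    clusters = {name: [name] for name in genotypes}

    def cluster_distance(c1, c2):
        # Average distance over all cross-cluster sample pairs.
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters[f"({c1},{c2})"] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters)) + ";"


# Hypothetical toy data: 4 samples, 4 SNPs each.
toy = {"s1": "AAAA", "s2": "AAAT", "s3": "TTTT", "s4": "TTTA"}
tree = average_linkage_newick(toy)
print(tree)
```

This naive loop is only meant to show the idea; for 700 samples a library routine such as `scipy.cluster.hierarchy.linkage` would likely be much faster.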

score 1 · Answer 3 · 2010-09-08

First off, you'll have to specify what species you mean.

Assuming you mean strain of mouse, your first and best bet is to know this ahead of time. You should be aware that model organisms can be inbred (meaning two or more of the same strain are expected to have identical or near-identical genomes, depending on the degree of inbreeding) or of a mixed background. Many experiments are done with mixed-background animals, particularly those where a mutation on one strain is being crossed onto another strain with some useful feature (e.g. it will activate the mutation in a particular organ). If you are sure the strains are inbred, but you don't know what strains they are, you are still in a near-hopeless situation. If you know the samples are one of X strains, where X is some suitably small number such as two, you may have a shot. Various SNPs are informative for strain differences, and two places to get started are the Jackson Labs database for SNP variation (http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=snpQF) and the Sanger center mouse genome project (http://www.sanger.ac.uk/Projects/M_musculus/).

score 1 · Answer 4 · 2010-09-08

It is not quite clear from your question, but I assume that what you are talking about is genotyping of a particular species of bacteria. Since they have been genotyped for 90 SNPs, my guess would be that these SNPs were not picked at random. Most likely, it is a set of SNPs that is commonly used for distinguishing between different strains of the bacterium in question.

My guess is thus that what you should really look for is a genotyping database of strains of the particular species that you are working on. Most likely, many more strains have been genotyped than have been fully sequenced. If you find such a database, the analysis that you describe would be a simple matter of comparing the SNPs from your samples to the reference samples in the database.
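
To illustrate what such a comparison could look like, here is a small Python sketch that assigns a sample to the reference strain with the fewest mismatching SNPs. The strain names and profiles are invented, and it assumes the reference profiles list the same SNPs in the same order as your samples.

```python
def mismatches(a, b):
    """Count positions where two equal-length SNP profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_reference(sample, references):
    """Return (strain_name, mismatch_count) for the closest reference profile.

    Ties are resolved in favour of the first reference listed.
    """
    scored = ((name, mismatches(sample, profile))
              for name, profile in references.items())
    return min(scored, key=lambda pair: pair[1])


# Hypothetical reference genotypes, 6 SNPs each.
references = {
    "strain_A": "ACGTAC",
    "strain_B": "ACGTTT",
    "strain_C": "TTGTAC",
}
best = closest_reference("ACCTTT", references)
print(best)
```

With real data you would probably also want to report ties and the runner-up, since a sample that is nearly equidistant from two strains should not be assigned with confidence.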

Unless you tell us which species it is you are working on, I don't think we will be able to help you much further.