Dendrogram of SNPs
13.7 years ago
Pavid

Hi guys!

I'm just starting out in bioinformatics and I'm quite enjoying it, but my knowledge of biology is very limited.

I have 700 samples genotyped for 90 SNPs, and I would like to build a dendrogram so that I can divide my data into clusters.
However, most of the programs I've found have a limit of 500 samples. Do you know of any programs that can handle more?

Ideas or improvements are welcome :)

Thanks for any help
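As a starting point, the clustering itself does not need a dedicated program. Below is a minimal sketch of average-linkage (UPGMA) hierarchical clustering in plain Python, assuming each sample's 90 SNP calls are concatenated into one string of equal length, with "N" marking a missing call. The distance is the p-distance (fraction of differing calls among SNPs called in both samples). The names `p_distance` and `upgma` and this input layout are illustrative assumptions, not any particular tool's API, and sample names are assumed to contain no spaces, commas, colons or parentheses so the Newick output stays valid. With the default `n_clusters=1` it returns one Newick tree; a larger value stops merging early and returns the groups.

```python
from itertools import combinations


def p_distance(a, b, missing="N"):
    """Fraction of differing calls among SNPs called in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not shared:
        return 1.0  # assumption: no overlapping calls means maximally distant
    return sum(x != y for x, y in shared) / len(shared)


def upgma(names, seqs, n_clusters=1, missing="N"):
    """Average-linkage (UPGMA) clustering of equal-length SNP strings.

    Merges the two closest clusters until n_clusters remain and returns a
    list of (newick_string, member_names) pairs, one per remaining cluster.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # cluster id -> (Newick subtree, member names, node height)
    clusters = {i: (name, [name], 0.0) for i, name in enumerate(names)}
    # distances between active clusters, keyed by (smaller id, larger id)
    dist = {(i, j): p_distance(seqs[i], seqs[j], missing)
            for i, j in combinations(range(len(seqs)), 2)}
    next_id = len(names)
    while len(clusters) > n_clusters:
        (i, j), d = min(dist.items(), key=lambda kv: kv[1])
        tree_i, members_i, h_i = clusters.pop(i)
        tree_j, members_j, h_j = clusters.pop(j)
        height = d / 2  # UPGMA places the new node halfway between the pair
        tree = f"({tree_i}:{height - h_i:.4f},{tree_j}:{height - h_j:.4f})"
        n_i, n_j = len(members_i), len(members_j)
        # size-weighted average of the old distances to every other cluster
        merged = {
            k: (n_i * dist[(min(i, k), max(i, k))]
                + n_j * dist[(min(j, k), max(j, k))]) / (n_i + n_j)
            for k in clusters
        }
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        for k, v in merged.items():
            dist[(k, next_id)] = v  # next_id is always the largest id so far
        clusters[next_id] = (tree, members_i + members_j, height)
        next_id += 1
    return [(tree + ";", members) for tree, members, _ in clusters.values()]
```

This naive loop scales roughly with the cube of the number of samples, so 700 samples may take a few minutes. If SciPy is available, `scipy.cluster.hierarchy.linkage(..., method="average")` on a condensed distance matrix does the same kind of clustering much faster.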


What is your dendrogram about? Which programs have you tested?


ClustalW has a limit, as do PHYLIP and T-Coffee. I've read some papers and I'm thinking of trying MEGA, which is widely used. The dendrogram will show the distances between the samples, so I could also see some clusters.


Would you like to split your question into two separate questions for better visibility and better answers?


OK, I will post two questions.


Sorry, the new question got flagged as a duplicate and I merged the two; I'm only now seeing your comment here.


Lars, I suggested to Patricia that the questions be split, because I thought the contexts of the SNP dendrogram and of the strain identification were different and would get better attention if posted as separate questions.

13.7 years ago

If you are able to format your data into PHYLIP format, I have found that the viewer Archaeopteryx (the successor to ATV, based on the forester library) is more than capable of handling hundreds or thousands of samples. I haven't tested it further, but the developers claim it is the most powerful approach for phylogenetic visualization. I'm not a phylogenetics expert, but I have tested it alongside a few other viewers and none performed as well as this one.

Here are the references for further reading:

  • Han M.V. and Zmasek C.M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10:356.
  • Zmasek C.M. and Eddy S.R. (2001). ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17:383-384.
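Archaeopteryx reads phyloXML, the format described in the Han and Zmasek paper. Below is a minimal sketch of writing a tree as phyloXML with Python's standard `xml.etree.ElementTree`, assuming the tree is already held as nested `Node` objects (a hypothetical helper class defined here, not part of forester or Archaeopteryx). Only the `name` and `branch_length` clade elements are written, so annotations such as taxonomy or confidence values are left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

PHYLOXML_NS = "http://www.phyloxml.org"


@dataclass
class Node:
    """Hypothetical minimal tree node: leaves carry a name, internal nodes children."""
    name: Optional[str] = None
    branch_length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def _add_clade(parent, node):
    """Append node as a <clade>, writing name and branch_length before child clades."""
    clade = ET.SubElement(parent, "clade")
    if node.name is not None:
        ET.SubElement(clade, "name").text = node.name
    if node.branch_length is not None:
        ET.SubElement(clade, "branch_length").text = f"{node.branch_length:g}"
    for child in node.children:
        _add_clade(clade, child)


def to_phyloxml(root, rooted=True):
    """Serialize a Node tree as a phyloXML document string."""
    # Declaring the default namespace as a plain attribute keeps child tags unprefixed.
    doc = ET.Element("phyloxml", xmlns=PHYLOXML_NS)
    phylogeny = ET.SubElement(doc, "phylogeny", rooted="true" if rooted else "false")
    _add_clade(phylogeny, root)
    return ET.tostring(doc, encoding="unicode")
```

For example, `to_phyloxml(Node(children=[Node("sample1", 0.1), Node("sample2", 0.2)]))` returns a string that can be saved with an `.xml` extension and opened in the viewer. Archaeopteryx also opens plain Newick trees, which may be simpler if no annotations are needed.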

Thanks, I'll have to try it then :)

13.7 years ago

It is not clear from the question whether the SNPs lie on a contiguous sequence (in which case one could try ClustalW and similar tools) or are spread throughout a genome. This question needs revision...


I've tried ClustalW and it doesn't work, because it has a limit of 500 sequences. I only have these SNPs and they are what we are analyzing, which is why only those SNPs are aligned. It's a short sequence.
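If the SNP calls are treated as a short aligned pseudo-sequence, one way to hand them to tree-building programs is a strict sequential PHYLIP alignment: a header line with the number of samples and the number of sites, then one line per sample with its name padded or truncated to exactly 10 characters, followed immediately by its concatenated calls. Below is a minimal sketch, assuming each sample's calls have already been joined into strings of equal length (e.g. with `"".join(calls)`); `to_phylip` is a hypothetical helper, and it checks name uniqueness because truncation to 10 characters can create duplicates.

```python
def to_phylip(names, seqs):
    """Format equal-length SNP strings as a strict sequential PHYLIP alignment."""
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    if len(names) != len(seqs):
        raise ValueError("one name per sequence is required")
    # Strict PHYLIP reserves exactly 10 characters for each name.
    padded = [name[:10].ljust(10) for name in names]
    if len(set(padded)) != len(padded):
        raise ValueError("names must stay unique after truncation to 10 characters")
    lines = [f"{len(seqs)} {len(seqs[0])}"]  # header: number of taxa, number of sites
    lines += [name + seq for name, seq in zip(padded, seqs)]
    return "\n".join(lines) + "\n"
```

Some programs also accept a "relaxed" PHYLIP variant with longer, whitespace-separated names; the strict layout is used here because it is the most widely readable.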

13.7 years ago

First off, you'll have to specify what species you mean.

Assuming you mean mouse strains, your first and best bet is to know this ahead of time. You should be aware that model organisms can be inbred (meaning that two or more animals of the same strain are expected to have identical or near-identical genomes, depending on the degree of inbreeding) or of a mixed background. Many experiments are done with mixed-background animals, particularly those in which a mutation on one strain is crossed onto another strain with some useful feature (e.g. a strain that will activate the mutation in a particular organ). If you are sure the strains are inbred but you don't know which strains they are, you are still in a near-hopeless situation. If you know the samples come from one of X strains, where X is a suitably small number such as two, you may have a shot. Various SNPs are informative for strain differences, and two places to get started are the Jackson Labs database for SNP variation (http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=snpQF) and the Sanger center mouse genome project (http://www.sanger.ac.uk/Projects/M_musculus/).
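To illustrate what "informative for strain differences" means, here is a minimal sketch that keeps only the SNPs whose called alleles differ between at least two of the candidate strains, since a SNP that is identical in all candidates cannot tell them apart. The nested-dictionary layout (SNP id to strain to allele, with "N" for no call) and the placeholder strain and SNP names are assumptions for illustration, not the export format of the Jackson Labs or Sanger resources.

```python
def informative_snps(strain_alleles, missing="N"):
    """Return SNP ids whose called alleles differ between at least two strains.

    strain_alleles maps SNP id -> {strain name: allele}; `missing` marks no call.
    """
    informative = []
    for snp_id, alleles in strain_alleles.items():
        called = {allele for allele in alleles.values() if allele != missing}
        if len(called) > 1:  # identical calls in every strain cannot separate them
            informative.append(snp_id)
    return informative
```

Genotyping samples only at the SNPs returned here keeps the assay focused on positions that can actually discriminate among the X candidate strains.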

I wrote this response to answer a question titled "Reference strains, how to identify strains?". That question has disappeared; it was appended to this one.


Yes, it might have been a mistake on my part to merge the two; I'm not quite sure. They were clearly two very similar, closely related questions by the same person.

13.7 years ago

It is not quite clear from your question, but I assume you are talking about genotyping a particular species of bacteria. Since the samples have been genotyped for 90 SNPs, my guess is that these SNPs were not picked at random. Most likely, it is a set of SNPs commonly used to distinguish between different strains of the bacterium in question.

My guess is thus that what you should really look for is a genotyping database of strains of the particular species you are working on. Most likely, many more strains have been genotyped than have been fully sequenced. If you find such a database, the analysis you describe would be a simple matter of comparing the SNPs from your samples to the reference samples in the database.

Unless you tell us which species it is you are working on, I don't think we will be able to help you much further.
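A minimal sketch of that comparison, assuming the reference profiles have already been extracted from such a database and are keyed by the same SNP identifiers as your samples (for example, positions in one common reference coordinate system). Each sample is assigned to the reference strain with the lowest fraction of mismatching calls, skipping SNPs with no call ("N") on either side. `mismatches`, `closest_reference` and the strain labels are hypothetical.

```python
def mismatches(sample, reference, missing="N"):
    """Return (differing calls, compared calls) over SNPs called in both profiles."""
    compared = diff = 0
    for snp_id, call in sample.items():
        ref_call = reference.get(snp_id, missing)
        if call == missing or ref_call == missing:
            continue  # skip SNPs without a call on either side
        compared += 1
        diff += call != ref_call
    return diff, compared


def closest_reference(sample, references, missing="N"):
    """Return (strain, mismatch fraction) for the best-matching reference, or None."""
    best = None
    for strain, profile in references.items():
        diff, compared = mismatches(sample, profile, missing)
        if compared == 0:
            continue  # no shared calls, so this reference cannot be scored
        fraction = diff / compared
        if best is None or fraction < best[1]:  # ties keep the first strain seen
            best = (strain, fraction)
    return best
```

A low mismatch fraction only identifies the closest of the references supplied; if the true strain is absent from the database, the nearest one is still reported.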


Sorry, I didn't say so, but yes, you're right. My reference is Mycobacterium tuberculosis H37Rv, and the SNPs were chosen for different reasons (drug resistance, etc.). Some strains are fully sequenced, such as M. bovis and F11. Since genome sizes differ, a position in H37Rv does not necessarily correspond to the same position in M. bovis, for instance.

Thank you for your help
