Question: Dendrogram Of Snps
2
9.3 years ago by
Pavid160
Pavid160 wrote:

Hi guys!

I'm just starting out in bioinformatics, and I'm quite enjoying it. But my knowledge of biology is very limited.

I have 700 samples genotyped for 90 SNPs and I would like to build a dendrogram so that I can divide my data into clusters.
But most programs have a limit of 500 samples. Are you aware of any programs with higher limits?

Ideas or improvements are welcome :)

Thanks for any help

snp clustering • 2.6k views
ADD COMMENTlink modified 8.3 years ago by Lars Juhl Jensen11k • written 9.3 years ago by Pavid160
2

What is your dendrogram about? Which programs have you tested?

ADD REPLYlink written 9.3 years ago by Pierre Lindenbaum124k

ClustalW has a limit, like Phylip and T-Coffee. I've read some papers and I'm thinking of trying MEGA, which is widely used. The dendrogram will show the distances between the samples, so that I can also see some clusters.
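
To make the idea of "distances between the samples" concrete, here is a minimal sketch in plain Python. It assumes each sample is a string of SNP calls of equal length, with `N` marking a missing call; the sample names, the genotype strings and the choice of average linkage are illustrative assumptions, not something stated in this thread. The naive loop is far too slow for 700 samples; a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be the practical choice at that size.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of SNP positions where two genotype strings differ.

    Positions where either sample has a missing call ('N') are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "N" and y != "N"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def average_linkage(genotypes, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Repeatedly merges the two closest clusters until n_clusters remain.
    """
    names = list(genotypes)
    # Pairwise distance between every two samples, computed once.
    dist = {
        frozenset(pair): hamming(genotypes[pair[0]], genotypes[pair[1]])
        for pair in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def cluster_distance(c1, c2):
        # Mean of all sample-to-sample distances between the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Hypothetical toy data: four samples, five SNPs each.
    toy = {"s1": "AACGT", "s2": "AACGA", "s3": "TTGCA", "s4": "TTGCT"}
    print(average_linkage(toy, 2))
```

Cutting the merge process at a chosen number of clusters is equivalent to cutting the dendrogram at a given height.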

ADD REPLYlink written 9.3 years ago by Pavid160

Would you like to split your question into two separate questions for better visibility and better answers?

ADD REPLYlink written 9.3 years ago by Khader Shameer18k

Ok, I will make two questions

ADD REPLYlink written 9.3 years ago by Pavid160

Sorry - the new question got flagged as a duplicate and I merged the two, only now seeing your comment here.

ADD REPLYlink written 9.3 years ago by Lars Juhl Jensen11k

Lars, I suggested splitting the question to Patricia because I thought the contexts of the SNP dendrogram and the strains are different, and that they would get better attention if posted as separate questions.

ADD REPLYlink written 9.3 years ago by Khader Shameer18k
2
9.3 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

If you are able to format your data into phylip format, I have found that the viewer called Archaeopteryx (the ATV successor, which is based on the forester library) is more than capable of dealing with hundreds and thousands of samples. I haven't tested it further, but the developers claim that it is the most powerful approach for phylogenetic representation, and although I'm not a phylogenetics expert, I have tested it along with a few others and none has performed as well as this one.

here are the references for further reading:

  • Han M.V. and Zmasek C.M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10:356.
  • Zmasek C.M. and Eddy S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384.
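
As a rough sketch of the formatting step, the snippet below writes SNP genotypes in the strict sequential PHYLIP alignment layout (a header with the number of samples and sites, then one line per sample with the name padded to 10 characters). This assumes "phylip format" here means that alignment layout; it may instead refer to a tree file as written by PHYLIP programs. The function name and sample names are hypothetical.

```python
def to_phylip(genotypes):
    """Return the genotypes as a strict sequential PHYLIP alignment string."""
    lengths = {len(seq) for seq in genotypes.values()}
    if len(lengths) != 1:
        raise ValueError("all samples must have the same number of SNPs")
    n_sites = lengths.pop()

    # Header line: number of samples, then number of sites.
    lines = [f"{len(genotypes)} {n_sites}"]
    for name, seq in genotypes.items():
        if len(name) > 10:
            raise ValueError(f"name too long for strict PHYLIP: {name}")
        # Strict PHYLIP reserves exactly 10 characters for the name.
        lines.append(f"{name:<10}{seq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy data: two samples, five SNPs each.
    print(to_phylip({"s1": "AACGT", "s2": "AACGA"}), end="")
```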
ADD COMMENTlink written 9.3 years ago by Jorge Amigo11k

Thanks, I'll have to try it then :)

ADD REPLYlink written 9.3 years ago by Pavid160
2
9.3 years ago by
Larry_Parnell16k
Boston, MA USA
Larry_Parnell16k wrote:

It is not clear from the question if the SNPs reside on a contiguous sequence - in which case one could try Clustal W et al. - or are spread throughout a genome. This question needs revision...

ADD COMMENTlink written 9.3 years ago by Larry_Parnell16k

I've tried ClustalW and it doesn't work, because it has a limit of 500 sequences. I only have some SNPs and that's what we are analyzing, which is why only those SNPs are aligned. It's a shorter sequence.

ADD REPLYlink written 9.3 years ago by Pavid160
1
9.3 years ago by
David Quigley11k
San Francisco
David Quigley11k wrote:

First off, you'll have to specify what species you mean.

Assuming you mean strain of mouse, your first and best bet is to know this ahead of time. You should be aware that model organisms can be inbred (meaning two or more of the same strain are expected to have identical or near-identical genomes, depending on the degree of inbreeding) or of a mixed background. Many experiments are done with mixed-background animals, particularly those where a mutation on one strain is being crossed onto another strain with some useful feature (e.g. it will activate the mutation in a particular organ). If you are sure the strains are inbred, but you don't know what strains they are, you are still in a near-hopeless situation. If you know the samples are one of X strains, where X is some suitably small number such as two, you may have a shot. Various SNPs are informative for strain differences, and two places to get started are the Jackson Labs database for SNP variation (http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=snpQF) and the Sanger center mouse genome project (http://www.sanger.ac.uk/Projects/M_musculus/).

ADD COMMENTlink written 9.3 years ago by David Quigley11k

I entered this response to answer a question that was titled "Reference strains, how to identify strains?". That question has disappeared, and it got appended to this question.

ADD REPLYlink written 9.3 years ago by David Quigley11k

Yes, it might have been a mistake on my part to merge the two - not quite sure. They were clearly two very similar and closely related questions by the same person.

ADD REPLYlink written 9.3 years ago by Lars Juhl Jensen11k
1
9.3 years ago by
Copenhagen, Denmark
Lars Juhl Jensen11k wrote:

It is not quite clear from your question, but I assume that what you are talking about is genotyping of a particular species of bacteria. Since they have been genotyped for 90 SNPs, my guess would be that these SNPs were not picked at random. Most likely, it is a set of SNPs that is commonly used for distinguishing between different strains of the bacterium in question.

My guess is thus that what you should really look for is a genotyping database of strains of the particular species that you are working on. Most likely many, many more different strains have been genotyped than have been fully sequenced. If you find such a database, the analysis that you talk about would be a simple matter of comparing the SNPs from your samples to the reference samples in the database.
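
A minimal sketch of that comparison, assuming both your samples and the reference strains are available as equal-length strings over the same ordered SNP set. The reference names and genotypes below are made up for illustration and do not come from any real database.

```python
def closest_reference(sample, references):
    """Return (name, mismatches) for the reference profile that differs
    from the sample at the fewest SNP positions."""

    def mismatches(ref):
        return sum(a != b for a, b in zip(sample, ref))

    best = min(references, key=lambda name: mismatches(references[name]))
    return best, mismatches(references[best])


if __name__ == "__main__":
    # Hypothetical reference profiles over five SNPs.
    refs = {"ref_A": "AACGA", "ref_B": "TTGCA"}
    print(closest_reference("AACGT", refs))
```

Ties are resolved in favour of whichever reference comes first in the dictionary, so a real analysis would want to report all equally close strains.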

Unless you tell us which species it is you are working on, I don't think we will be able to help you much further.

ADD COMMENTlink written 9.3 years ago by Lars Juhl Jensen11k

Sorry, I didn't say it. But yes, you're right. My reference is Mycobacterium H37Rv, and the SNPs were chosen for different reasons: drug resistance, etc. There are some fully sequenced strains, like Bovis and F11. Since the size of the genomes varies, a position in H37Rv is not necessarily the same position in Bovis, for instance.

Thank you for your help

ADD REPLYlink written 9.3 years ago by Pavid160