Question: SNP Cluster Analysis
7.2 years ago by
jackuser1979870 wrote:

I have Illumina paired-end reads mapped to a reference genome using Bowtie. I created an mpileup with samtools and identified SNPs from it using a variant caller (VarScan), which gave me output in VCF format. I need to do SNP cluster analysis. Is there any software or R package available for SNP cluster analysis?

bioinformatics R • 7.3k views
modified 6.9 years ago by brentp23k • written 7.2 years ago by jackuser1979870

What do you mean by clustering? Many different types of analysis use clustering for NGS data...

written 7.2 years ago by Zev.Kronenberg11k

Could even say, WHY do you need to do cluster analysis? Just because somebody told you?

written 7.2 years ago by Michael Dondrup46k

Just out of curiosity. I want to try clustering analysis. Any info or URL to get started is welcome.

written 7.2 years ago by jackuser1979870
6.9 years ago by
Houkto210 wrote:

Like the other responders, I do not understand what you mean by cluster analysis using SNPs. However, you can look at the distribution and frequency of SNPs in windows of a fixed size across the genome. That shows whether SNPs cluster in a particular region, such as one chromosome. To plot this you can use a tool called Circos (it has a tutorial). Another way of clustering SNPs is to categorize them by their predicted effect on a gene, such as synonymous, non-synonymous, or stop codon variants, using the Ensembl tool called Variant Effect Predictor. These are the two kinds of clustering I can think of. A third applies when you have sequenced more than one genome of the same species and want to see whether they are closely related.

Let me know which one you mean if it is not one of these examples, so that I can be more helpful.
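The windowed SNP count described above can be sketched in a few lines of Python before reaching for a plotting tool. This is only a minimal illustration: the function name `snp_window_counts`, the window size, and the example records are made up for this sketch, and it assumes a tab-separated VCF with the chromosome and 1-based position in the first two columns.

```python
from collections import Counter


def snp_window_counts(vcf_lines, window_size=100_000):
    """Count variant records per (chromosome, window index).

    Window 0 covers positions 1..window_size, window 1 the next
    window_size positions, and so on (VCF positions are 1-based).
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and all header lines (## meta and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        window = (int(pos) - 1) // window_size
        counts[(chrom, window)] += 1
    return counts


# Hypothetical records, only to show the input shape.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t150\t.\tA\tG\t50\tPASS\t.",
    "chr1\t900\t.\tC\tT\t50\tPASS\t.",
    "chr1\t1200\t.\tG\tA\t50\tPASS\t.",
    "chr2\t100\t.\tT\tC\t50\tPASS\t.",
]

window_counts = snp_window_counts(example_vcf, window_size=1000)
for (chrom, window), n in sorted(window_counts.items()):
    start = window * 1000 + 1
    print(chrom, start, start + 999, n)
```

For a real file you could pass an open file handle instead of the list, since the function only iterates over lines. Windows with unusually high counts are the candidate SNP clusters.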

6.9 years ago by
Salt Lake City, UT
brentp23k wrote:

If you are asking about burden testing, have a look at the AssotesteR R package.

You simply encode your phenotype as 1 / 0 for case / control and build a genotype matrix with 1 for the alternate allele and 0 for the reference, with SNPs in columns and samples in rows.

Once your data is in that format, you can perform a variety of multi-locus tests including, for example, C-alpha.
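As a rough sketch of getting from a multi-sample VCF to that layout, the Python below builds a 0/1 matrix with samples in rows and SNPs in columns. The function name `genotype_matrix`, the sample names, and the phenotype vector are hypothetical. It also makes two assumptions that are not part of the answer above: a sample is coded 1 if its GT field carries any non-reference allele, and missing genotypes (`./.`) are coded 0.

```python
def genotype_matrix(vcf_lines):
    """Return (sample_names, matrix) with samples in rows, SNPs in columns.

    A cell is 1 if the sample's GT has any non-reference allele, else 0.
    Assumption of this sketch: missing calls ('.') are treated as 0.
    """
    samples = []
    columns = []  # one list of 0/1 values per SNP
    for line in vcf_lines:
        if not line.strip() or line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        gt_index = fields[8].split(":").index("GT")
        column = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            has_alt = any(a not in ("0", ".") for a in alleles)
            column.append(1 if has_alt else 0)
        columns.append(column)
    # Transpose so that each row is one sample.
    matrix = [list(row) for row in zip(*columns)]
    return samples, matrix


# Hypothetical three-sample VCF, only to show the input shape.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3",
    "chr1\t150\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t0/0:9\t1/1:15",
    "chr1\t900\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/0:8\t./.:0\t0/1:11",
]

samples, matrix = genotype_matrix(example_vcf)
phenotype = [1, 0, 1]  # hypothetical case (1) / control (0) labels per sample

for name, row, status in zip(samples, matrix, phenotype):
    print(name, row, status)
```

The matrix and phenotype vector can then be written out (for example as CSV) and loaded into R for the multi-locus tests.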


Hmm, maybe a tool that reads the locations in a VCF file and shows a chart of their density?

f(x) = number of locations between x*L/640 and (x+1)*L/640, where L = total length

Looks useful, I want it too. Trying to write it ...
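A minimal sketch of that f(x) in Python, assuming the positions have already been read from the VCF and that L is the total length they span. The function name `density_bins` and the example numbers are made up for illustration. Bins are treated as half-open intervals, and a position equal to L is placed in the last bin.

```python
def density_bins(positions, total_length, n_bins=640):
    """Count positions per bin, where bin x covers
    [x * L / n_bins, (x + 1) * L / n_bins) and L = total_length."""
    bins = [0] * n_bins
    for pos in positions:
        # Integer arithmetic avoids floating-point edge effects;
        # clamp so that pos == total_length lands in the last bin.
        x = min(pos * n_bins // total_length, n_bins - 1)
        bins[x] += 1
    return bins


# Hypothetical positions on a sequence of length 6400 (bin width 10).
positions = [5, 9, 10, 6399, 6400]
bins = density_bins(positions, total_length=6400)

# Crude text chart: one line per non-empty bin.
for x, count in enumerate(bins):
    if count:
        print(x, "#" * count)
```

The resulting list can be handed to any plotting library to draw the density chart.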

written 6.9 years ago by gs10