Question

SNP data analysis - advice on statistics

8.4 years ago

dovah ▴ 40

Hi,

I hope this is the right forum to ask you for some advice on how to analyse SNP data.

I have analysed the SNPs in 30 different strains of the same species. I recorded the SNP count for each strain in four different genomic regions (coding, non-coding, etc.). The data are stored in a file with three columns: $1-SNPcount; $2-strain; $3-region. "SNP" (the count) is thus a numerical variable, while "strain" and "region" are factors.

How would you advise analysing these data statistically? I plotted the percentage of SNPs in each region per strain, but this obviously doesn't take into account how SNP-rich each strain is. I could fit a glm(SNP~strain+region), but the results of the model would depend on which level of each variable is chosen as the "reference".

I am grateful for any constructive advice :)

snp genome R • 2.1k views

updated 21 months ago by Ram 43k • written 8.4 years ago by dovah ▴ 40

I don't see what question you want answered. Do you want to know if some strains/regions have more SNPs than others?

8.4 years ago by abascalfederico ★ 1.2k

Yes, exactly.

updated 4.4 years ago by Ram 43k • written 8.4 years ago by dovah ▴ 40

I would suggest that you first control for coverage, to see whether it may bias your results. Then you could compare the proportions of coding and non-coding SNPs (or any other "category") between strains using Fisher's exact test.
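
A rough sketch of the Fisher's exact test step for a single pair of strains, in case it helps. It is written in plain Python for illustration only (in R, the built-in fisher.test does the same job). The counts and the names fisher_exact_two_sided, strain_a and strain_b are made up for the example, not taken from the data in this thread.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are two strains, columns are two SNP categories
    (e.g. coding and non-coding).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (small tolerance for floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts: (coding SNPs, non-coding SNPs) for two strains.
strain_a = (120, 880)
strain_b = (95, 1405)
p_value = fisher_exact_two_sided(*strain_a, *strain_b)
print(f"two-sided p-value: {p_value:.3g}")
```

With 30 strains there are many possible pairwise comparisons, so some multiple-testing correction on the resulting p-values would probably be needed.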

updated 4.4 years ago by Ram 43k • written 8.4 years ago by abascalfederico ★ 1.2k

Ram · Answer 1 · 2016-02-23

8.2 years ago

reza.jabal ▴ 580

Hi,

I believe you should first investigate population structure to detect potential outliers (before committing yourself to any further analysis) by running a PCA (Principal Component Analysis). You may find this from the "Cross Validated" forum useful!
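
To make the PCA idea concrete, here is a minimal sketch in plain Python (in R one would more likely use prcomp). It assumes a strain-by-site matrix of 0/1 genotype calls, which is not something described in the question. The toy matrix and the name first_pc_scores are invented for illustration, and the first component is found by simple power iteration rather than a full decomposition.

```python
from math import sqrt


def first_pc_scores(rows, n_iter=200):
    """Score of each sample (row) on the first principal component."""
    n, m = len(rows), len(rows[0])
    # Centre every column (site) on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix between sites (m x m).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
        for i in range(m)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalise, which converges towards the leading eigenvector.
    v = [1.0 / sqrt(m)] * m
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Project each centred sample onto that direction.
    return [sum(c[j] * v[j] for j in range(m)) for c in centered]


# Hypothetical genotypes: 5 strains x 6 SNP sites (1 = variant present).
genotypes = {
    "strain_A": [0, 0, 1, 0, 0, 1],
    "strain_B": [0, 0, 1, 0, 1, 1],
    "strain_C": [0, 1, 1, 0, 0, 1],
    "strain_D": [0, 0, 1, 0, 0, 1],
    "strain_E": [1, 1, 0, 1, 1, 0],
}
scores = first_pc_scores(list(genotypes.values()))
for name, score in zip(genotypes, scores):
    print(f"{name}: {score:+.3f}")
```

A strain whose score sits far from all the others on the first component is a candidate outlier worth checking before any further analysis. The sign of the scores is arbitrary; only the relative positions matter.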

updated 21 months ago by Ram 43k • written 8.2 years ago by reza.jabal ▴ 580