Analyse Population Genomics Data With Different Coverage
12.0 years ago
Plantae ▴ 390

Hi, all

We have sequenced multiple individuals of one species on the Illumina platform.

The sequencing depth of our data: 7 individuals at 60X, 32 individuals at 2-10X.

I have called SNPs for all these individuals, and now I want to use these SNP data for further analyses, e.g. population structure, LD, FST, etc.

I got strange results when using all individuals in the population structure analysis: the high-coverage individuals clustered together, although they belong to different sub-populations (some of them are cultivated, others wild), and all the other (low-coverage) individuals clustered together.

I have checked the SNP results and found that the high-coverage individuals contain many more SNPs than the low-coverage individuals.

So, should I exclude all these high-coverage individuals from further analysis?

next-gen population coverage • 2.8k views

Unfortunately, aside from excluding the higher-coverage individuals, the only other thing I would try is downsampling all individuals to your lowest coverage and then rerunning the analysis.
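To make the downsampling idea concrete, here is a minimal sketch of how the fraction of reads to keep per individual could be worked out. The sample names and depths are made up for illustration, and `downsample_fractions` is a hypothetical helper, not part of any tool.

```python
def downsample_fractions(depths, target=None):
    """Return, per sample, the fraction of reads to keep to reach the target depth.

    depths: dict mapping sample name -> mean sequencing depth (must be > 0).
    target: depth to downsample to; defaults to the lowest depth in the set.
    """
    if any(d <= 0 for d in depths.values()):
        raise ValueError("all depths must be positive")
    if target is None:
        target = min(depths.values())
    # Samples already at or below the target keep all of their reads.
    return {sample: min(1.0, target / d) for sample, d in depths.items()}


# Hypothetical depths, loosely following the 60X vs 2-10X situation above.
depths = {"ind_high": 60.0, "ind_mid": 10.0, "ind_low": 2.0}
fractions = downsample_fractions(depths)

for sample, frac in fractions.items():
    print(f"{sample}: keep {frac:.3f} of reads")
```

Each fraction could then be handed to a read-subsampling tool (for example the subsampling option of `samtools view`), after which SNPs would be called again on the reduced data.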

12.0 years ago
tiagoantao ▴ 690

For many analyses you do not need all the markers (structure/admixture comes to mind). Indeed, you might have to remove markers in LD for some analyses. For these analyses, the alternative is to use only the markers that overlap all your sets: exclude the markers that only exist in the high-coverage individuals and keep all individuals.

I have used this approach with genotyped data and it worked like a charm (i.e. populations that are closer clustered together, irrespective of original number of markers).
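The marker-overlap idea can be sketched roughly as follows. This assumes that "overlap" is read as "the site has a genotype call in every individual", which is only one possible interpretation; the site names, the genotype encoding (alternate allele counts, `None` for a missing call), and the helper `sites_called_in_all` are all hypothetical.

```python
def sites_called_in_all(genotypes):
    """Keep only sites with a non-missing genotype call in every individual.

    genotypes: dict mapping site ID -> list of calls, one per individual,
               where None marks a missing call.
    """
    return {
        site: calls
        for site, calls in genotypes.items()
        if all(call is not None for call in calls)
    }


# Toy data: columns are [high_cov_1, high_cov_2, low_cov_1, low_cov_2].
genotypes = {
    "chr1:100": [0, 1, 1, 0],
    "chr1:200": [2, 1, None, None],  # only called in high-coverage individuals
    "chr1:300": [1, 1, 0, 2],
    "chr1:400": [0, 2, 1, None],     # missing in one low-coverage individual
}

shared = sites_called_in_all(genotypes)
print(sorted(shared))
```

In practice the same filter would normally be applied to the VCF with a missingness threshold rather than in hand-written code; the sketch only shows the logic.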

Of course, for analyses where marker density is to be maximised, other strategies need to be considered.

