Question

Using GEO datasets from NCBI for EHH (extended haplotype homozygosity) test

8.1 years ago

kirannbishwa01 ★ 1.6k

I am not quite sure if BioStars is the right platform for this question, but I am hoping some good suggestions will be available.

Details: I have high-coverage whole-genome resequencing data for my model organism from two different populations, but the sample size is low (only 6 individuals per population). I want to run EHH (extended haplotype homozygosity), rEHH and several other population genetics analyses, but I think my sample size is too low.
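For reference, here is a minimal sketch of the statistic in question, assuming the standard definition of EHH: among the haplotypes carrying a given core allele, the fraction of haplotype pairs that are identical across every marker from the core out to a given position. The function name `ehh` and the toy haplotype strings are hypothetical, and the sketch assumes phased haplotypes. If the organism is diploid, 6 individuals give at most 12 haplotypes per population, and fewer for any single core allele.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, position):
    """EHH for haplotypes carrying `allele` at marker index `core`,
    evaluated over the markers from `core` to `position` (inclusive).

    `haplotypes` is a list of equal-length phased haplotypes
    (strings or sequences of alleles). This is an illustrative sketch,
    not a replacement for a dedicated EHH package.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        # No pairs to compare, so homozygosity is undefined.
        return float("nan")
    lo, hi = sorted((core, position))
    # Group carriers by their extended haplotype over the span.
    counts = Counter(tuple(h[lo:hi + 1]) for h in carriers)
    identical_pairs = sum(comb(c, 2) for c in counts.values())
    return identical_pairs / comb(n, 2)


# Toy data (made up for illustration): 5 haplotypes, 4 markers.
toy = ["1100", "1100", "1101", "1011", "0000"]
for pos in range(4):
    print(pos, ehh(toy, core=0, allele="1", position=pos))
```

With so few carriers per core allele, each pair contributes a large step to the estimate, which is one way to see why a small sample makes EHH noisy.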

So, I was thinking of supplementing the polymorphism data from other sources. Several sequence datasets for this model organism are available on NCBI as GEO DataSets, but almost all of them are RNA-seq and ChIP-seq data. I was wondering whether I could use these data to call variants and supplement my population genetics analyses. The problem, I think, is that this would only recover variants from the targeted regions of the genome, which could introduce bias. On the other hand, the RNA-seq/ChIP-seq data may also contain reads that are not enriched for their target, which may not be perfectly randomly distributed but could still cover enough of the genome for variant calling (any suggestions?). I would like to check this, but I want to get some suggestions before I download all the data, align it and check the read distribution, which would take several weeks.
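One cheaper way to check this might be to align only one or two runs first and summarise how evenly the reads cover the genome. The sketch below assumes per-window mean depths have already been computed by some other means (for example from `samtools depth` output). The function name `summarize_window_depth` and the `min_depth=8` threshold are hypothetical choices for illustration, not recommended values.

```python
from statistics import mean, pstdev


def summarize_window_depth(depths, min_depth=8):
    """Summarise per-window read depth for one aligned sample.

    `depths` is a list of mean depths, one per genomic window.
    `min_depth` is an assumed threshold below which a window is
    treated as unusable for variant calling.
    """
    if not depths:
        raise ValueError("no windows supplied")
    avg = mean(depths)
    # Fraction of windows deep enough to attempt variant calling.
    callable_fraction = sum(d >= min_depth for d in depths) / len(depths)
    # Coefficient of variation: high values mean reads pile up in a
    # few windows (enriched targets) instead of spreading evenly.
    cv = pstdev(depths) / avg if avg > 0 else float("inf")
    return {
        "mean_depth": avg,
        "callable_fraction": callable_fraction,
        "cv": cv,
    }


# Made-up depths for four windows, two of them empty.
print(summarize_window_depth([0, 0, 10, 30]))
```

A low callable fraction together with a high coefficient of variation in a pilot sample would suggest that the remaining datasets are also concentrated in their targeted regions.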

Thanks a lot in advance!

RNA-Seq ChIP-Seq sequence selection • 1.8k views

updated 7.9 years ago by Biostar 20 • written 8.1 years ago by kirannbishwa01 ★ 1.6k