Question: How to design an exome sequencing study?
Asked 3.8 years ago by waqas.ahmad5231 (Germany):

I am planning to analyse exome sequencing data. I am expecting data from the Illumina Infinium Human Exome-12 BeadChip platform.

I am a statistician by training with an interest in bioinformatics. I am designing a case-control study with 100 diabetes cases and 100 controls (no diabetes).

I have the following questions:

  1. Would 100 cases and 100 controls be sufficient to identify disease-associated variants?
  2. Which software, or which pipeline for Linux, would be useful for analysing this type of data?

I would really appreciate any help with this. Does this study design make sense for the given data?

Is there any good reference for this type of analysis?

Sorry for the many questions.

UPDATE: I have since found out that the data are exome sequencing data (mean 40x coverage, Agilent v4, HiSeq). What would you suggest regarding my questions above about sample size and software?

That's not sequencing data; that's array-based genotyping data. Your sample size is really, really small. I suggest you get acquainted with the GWAS literature in general (google it), and then specifically with the literature for diabetes (type 1 or type 2?).

Comment by Lemire, 3.8 years ago
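To put a number on "really, really small", here is a rough sketch of the power of a single-variant allelic test (a two-proportion z-test on allele counts, normal approximation) for a given number of cases and controls. The control allele frequency of 0.3, per-allele odds ratio of 1.3 and genome-wide threshold of 5e-8 are illustrative assumptions, not values from this study, and the calculation ignores covariates and population structure. The function name is hypothetical.

```python
from statistics import NormalDist


def allelic_test_power(n_cases: int, n_controls: int, control_freq: float,
                       odds_ratio: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic (2x2 allele-count) test.

    Normal approximation to the two-proportion z-test, assuming
    Hardy-Weinberg equilibrium so each individual contributes two alleles.
    """
    norm = NormalDist()
    # Risk-allele frequency in cases implied by the per-allele odds ratio.
    control_odds = control_freq / (1 - control_freq)
    case_freq = odds_ratio * control_odds / (1 + odds_ratio * control_odds)

    # Number of alleles sampled in each group.
    a_cases, a_controls = 2 * n_cases, 2 * n_controls
    pooled = (case_freq * a_cases + control_freq * a_controls) / (a_cases + a_controls)

    # Standard error of the frequency difference under H0 (pooled) and under H1.
    se_null = (pooled * (1 - pooled) * (1 / a_cases + 1 / a_controls)) ** 0.5
    se_alt = (case_freq * (1 - case_freq) / a_cases
              + control_freq * (1 - control_freq) / a_controls) ** 0.5

    z_crit = norm.inv_cdf(1 - alpha / 2)
    diff = abs(case_freq - control_freq)
    # Probability of crossing the critical value in the direction of the effect;
    # the opposite tail is negligible for any non-trivial effect.
    return norm.cdf((diff - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        power = allelic_test_power(n, n, control_freq=0.3, odds_ratio=1.3)
        print(f"{n} cases + {n} controls: power = {power:.3f}")
```

Under these assumed parameters the power with 100 cases and 100 controls is essentially zero; for the rarer variants typical of exome data it is generally lower still at the same odds ratio.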

Adding to the above comment, you might want to get acquainted with PLINK to analyse this data. 

Comment by andrew.j.skelton73, 3.8 years ago

Thanks for your comments. More specifically, I am interested in type 2 diabetes. I know PLINK, but I have no idea what this array-based genotyping data looks like. Is it compatible with the PLINK format?

Comment by waqas.ahmad5231, 3.8 years ago

The company will provide you with IDAT files or, hopefully, a PLINK-friendly format to work with (I'd explicitly ask for it if I were you): .fam, .bed and .bim files. Generally you'd use GenomeStudio's genotyping module to call genotypes from the chips and get it to export a format you can work with.

Comment by andrew.j.skelton73, 3.8 years ago
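For orientation: a PLINK binary fileset is three files. The .fam file has one line per individual, the .bim file has one line per variant (chromosome, ID, genetic position, base-pair position, allele 1, allele 2), and the .bed file holds the packed genotypes. The sketch below decodes a SNP-major .bed byte string in pure Python, following the PLINK 1 format: three magic bytes 0x6C 0x1B 0x01, then two bits per genotype with the first individual in the lowest-order bits, each variant padded to a whole byte. It only illustrates what the data look like; for real analyses you would let PLINK read the files. The function name and the tiny example data are made up.

```python
from typing import List, Optional

MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header, SNP-major mode

# 2-bit code -> copies of allele 1 (column 5 of .bim); None means missing.
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_bed(data: bytes, n_samples: int, n_variants: int) -> List[List[Optional[int]]]:
    """Decode a SNP-major PLINK .bed byte string into one genotype list per variant."""
    if data[:3] != MAGIC:
        raise ValueError("not a SNP-major PLINK .bed file")
    bytes_per_variant = (n_samples + 3) // 4  # 4 genotypes per byte, padded
    expected = 3 + bytes_per_variant * n_variants
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")

    genotypes = []
    for v in range(n_variants):
        start = 3 + v * bytes_per_variant
        block = data[start:start + bytes_per_variant]
        row = []
        for s in range(n_samples):
            # Sample s sits in byte s // 4, at bit pair s % 4 (low-order first).
            code = (block[s // 4] >> (2 * (s % 4))) & 0b11
            row.append(CODE_TO_A1_COUNT[code])
        genotypes.append(row)
    return genotypes


if __name__ == "__main__":
    # Hand-built example: 5 individuals, 2 variants (invented data).
    example = MAGIC + bytes([0x78, 0x00, 0xAA, 0x02])
    print(decode_bed(example, n_samples=5, n_variants=2))
    # [[2, 1, 0, None, 2], [1, 1, 1, 1, 1]]
```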

Today I learned from them that it is exome sequencing (mean 40x, Agilent v4, HiSeq). They will give me *.bam and *.vcf files. In this situation, would a sample size of 100 cases and 100 controls be sufficient? Which software would be useful for this purpose?

Comment by waqas.ahmad5231, 3.8 years ago
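Since VCF files are plain text, one quick way to see their structure is a small parser that pulls the GT (genotype) field out of each sample column and counts non-reference alleles per person. This is a minimal sketch assuming an uncompressed VCF in which every record has GT in its FORMAT column; real exome VCFs are usually bgzip-compressed and contain multi-allelic sites, so for actual work you would use an established VCF library or tool. The function name and the two-sample example are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

Genotypes = Dict[str, Optional[int]]  # sample name -> alt-allele count (None = missing)


def alt_allele_counts(vcf_lines: List[str]) -> Tuple[List[str], List[Tuple[str, Genotypes]]]:
    """Parse an uncompressed VCF into sample names and per-variant alt-allele counts."""
    samples: List[str] = []
    variants: List[Tuple[str, Genotypes]] = []
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        chrom, pos, vid, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        calls: Genotypes = {}
        for name, sample_field in zip(samples, fields[9:]):
            values = sample_field.split(":")
            gt = values[gt_index] if gt_index < len(values) else "."
            alleles = re.split(r"[/|]", gt)  # '/' unphased, '|' phased
            if "." in alleles:
                calls[name] = None  # missing call
            else:
                # Any non-reference allele index (1, 2, ...) counts as one alt copy.
                calls[name] = sum(1 for a in alleles if a != "0")
        label = vid if vid != "." else f"{chrom}:{pos}:{ref}:{alt}"
        variants.append((label, calls))
    return samples, variants


if __name__ == "__main__":
    # Made-up two-sample VCF, for illustration only.
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tcase1\tcontrol1",
        "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:38\t0/0:41",
        "1\t12400\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1|1:40\t./.:0",
    ]
    names, records = alt_allele_counts(example)
    print(names)    # ['case1', 'control1']
    print(records)  # [('1:12345:A:G', {'case1': 1, 'control1': 0}), ('1:12400:C:T', {'case1': 2, 'control1': None})]
```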

It is still a relatively small sample size, but you can still give it a try. Just follow the GATK Best Practices for variant calling, and then perform the statistical analysis with something like RVTESTS and SKAT.

Comment by Sam, 3.8 years ago
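RVTESTS and SKAT implement several gene-level rare-variant tests. To illustrate the basic idea behind the simplest family of them, here is a sketch of a collapsing (CAST-style) burden test: each person counts as a carrier if they have at least one non-reference allele at any qualifying rare variant in a gene, and carrier counts in cases versus controls are compared with a two-sided Fisher's exact test. This is neither SKAT (a variance-component test) nor the RVTESTS implementation; it ignores covariates and population structure, treats missing calls as non-carrier (an assumption), and the function names and carrier counts are hypothetical.

```python
from math import comb
from typing import List, Optional


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # A small relative tolerance keeps tables tied with the observed one.
    p_value = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)


def count_carriers(alt_counts_per_sample: List[List[Optional[int]]]) -> int:
    """Individuals with >= 1 non-reference allele at any qualifying variant.

    Each inner list holds one person's alt-allele counts across the rare
    variants in a gene; missing calls (None) are treated as non-carrier.
    """
    return sum(
        1 for counts in alt_counts_per_sample
        if any(c is not None and c > 0 for c in counts)
    )


def burden_test(case_carriers: int, n_cases: int,
                control_carriers: int, n_controls: int) -> float:
    """CAST-style test: carriers vs non-carriers in cases and in controls."""
    return fisher_exact_two_sided(
        case_carriers, n_cases - case_carriers,
        control_carriers, n_controls - control_carriers,
    )


if __name__ == "__main__":
    # Invented carrier counts for one gene, purely for illustration.
    print(burden_test(case_carriers=12, n_cases=100, control_carriers=3, n_controls=100))
```

In practice the carrier lists would come from the genotypes in the VCF, restricted to variants passing a frequency and annotation filter you choose; that filtering step is where most of the design decisions in a burden test live.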