Question: Whole exomes: comparative analysis
2.1 years ago, bioinfo8 wrote:

Hi,

I have looked around a lot to find out how to analyse whole exomes. The literature points to Samtools, Bedtools and GATK, but I am unable to find a clear, detailed tutorial on how to proceed with exome BAM files.

I want to analyse paired-end BAM files of whole exomes that are already aligned to the reference with BWA and have duplicates marked (the @PG header line shows ID:bammarkduplicates2). There are two groups of 3 individuals each, so I have 6 BAM files in total.

I have done some initial analysis with Qualimap, and from how the samples clustered in the PCA I could see variation (polymorphism) among the individuals.

However, I would like to find out more:

1) the total number of genes in each file, and the average number of genes across all 6 files

2) conserved / non-conserved regions in the exomes with respect to the reference

3) location of the genes of interest in the exomes with respect to the reference (I have a gene list)

4) any other way to get PCA and polymorphism information

I would appreciate any guidance for the above.

P.S.: I am an R admirer, so R solutions would work best!

Thanks!

whole exome bam R exome • 922 views
modified 2.1 years ago • written 2.1 years ago by bioinfo8

It's not obvious what your main objective is in this analysis. For sure looking at the number of variants isn't the final outcome?

3) location of the genes of interest in the exomes with respect to the reference (I have a gene list)

For this you wouldn't need exome sequencing... just the reference genome and a genome browser will do.
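
As a rough illustration of looking up gene locations from the reference alone, here is a minimal Python sketch that pulls coordinates for a gene list out of a GTF-style annotation. It assumes the reference has an annotation whose `gene` rows carry a `gene_name` attribute (true for many Ensembl/GENCODE files, but not guaranteed for every reference); the function name `gene_locations` and the example rows are made up for the illustration.

```python
import re

def gene_locations(gtf_lines, wanted):
    """Return {gene_name: (chrom, start, end, strand)} for genes in `wanted`.

    Assumes tab-separated GTF rows whose 9th column holds a
    gene_name "..." attribute (an assumption about the annotation).
    """
    wanted = set(wanted)
    found = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level features
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match and match.group(1) in wanted:
            found[match.group(1)] = (
                fields[0], int(fields[3]), int(fields[4]), fields[6]
            )
    return found

# Hypothetical annotation rows, for illustration only.
example_gtf = [
    'chr1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_name "ABC1";',
    'chr1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "G1"; gene_name "ABC1";',
    'chr2\tsrc\tgene\t5000\t7500\t.\t-\t.\tgene_id "G2"; gene_name "XYZ2";',
]
locations = gene_locations(example_gtf, ["ABC1", "NOPE"])
print(locations)  # {'ABC1': ('chr1', 100, 900, '+')}
```

Genes from the list that are absent from the annotation (here `NOPE`) are simply missing from the result, which makes them easy to spot.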

written 2.1 years ago by WouterDeCoster

Thanks @WouterDeCoster!

The reference and the sequenced exomes are from the same genus but not the same species, so my interest is the comparative analysis between them [which will cover 1) and 2)].

I have some genes (~150) that are specific to a feature in the reference, and I want to compare them across the exomes [point 3)].

All the exomes (BAM files) are from the same species but from different individuals, so I don't know whether I should analyse them separately or together.

I hope it is clearer now. :)

written 2.1 years ago by bioinfo8

So your main interest is to compare two species, of which one has a reference genome? So what is the biological question or hypothesis?

written 2.1 years ago by WouterDeCoster

Yes. For the whole exomes I have (paired-end BAM files aligned to the reference) from many individuals of the same species (same genus as the reference), I would like to find out:

1) How similar to or different from the reference are these exomes?

2) What is the total number of genes in each exome, and the average number of genes?

3) I have a gene list (~150 genes) from the reference responsible for a specific feature, e.g. localization. I want to compare these genes to the genes from the exomes.

4) As the exomes are from various individuals, the variation among them would be worth studying.

modified 2.1 years ago • written 2.1 years ago by bioinfo8

1) How similar to or different from the reference are these exomes?

So you would perform variant calling on those?

2) What is the total number of genes in each exome, and the average number of genes?

You (or someone else) designed an assay for exome sequencing. Therefore you can only find the genes you targeted, so you will not learn anything new about the total number of genes.

3) I have a gene list (~150 genes) from the reference responsible for a specific feature, e.g. localization. I want to compare these genes to the genes from the exomes.

So, variant calling?

4) As the exomes are from various individuals, the variation among them would be worth studying.

And variant calling.
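
Once variant calling has produced a multi-sample VCF, a first comparison against the reference can be as simple as counting, per individual, the sites that carry at least one non-reference allele. The following is a minimal Python sketch of that idea; it assumes a plain-text, tab-separated VCF whose FORMAT column includes `GT`, and the function name `count_nonref_calls`, the sample names and the records are invented for the illustration.

```python
def count_nonref_calls(vcf_lines):
    """Count, per sample, the sites with at least one non-reference allele.

    Assumes tab-separated VCF text with a #CHROM header line and a GT
    entry in the FORMAT column of every record.
    """
    samples = []
    counts = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start at column 10
            counts = {name: 0 for name in samples}
            continue
        gt_index = fields[8].split(":").index("GT")
        for name, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            alleles = genotype.replace("|", "/").split("/")
            # "0" is the reference allele, "." is a missing call
            if any(allele not in ("0", ".") for allele in alleles):
                counts[name] += 1
    return counts

# Made-up records for three hypothetical samples.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "chr1\t150\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:20\t0/0:18\t1/1:25",
    "chr1\t420\t.\tC\tT\t60\tPASS\t.\tGT:DP\t0/0:22\t./.:0\t0/1:30",
    "chr2\t5100\t.\tG\tA\t45\tPASS\t.\tGT:DP\t1/1:15\t0/1:17\t0/0:21",
]
counts = count_nonref_calls(example_vcf)
print(counts)  # {'S1': 2, 'S2': 1, 'S3': 2}
```

Restricting the records to the coordinates of the ~150 genes of interest before counting would give the same summary for the gene list only.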

written 2.1 years ago by WouterDeCoster

2) Someone else did the experiment, but these are whole exomes.

I don't know whether I should analyse all exomes separately or together. Suggestions please.

Thanks!

written 2.1 years ago by bioinfo8

By the sounds of things, you want to do variant calling, so again, have you looked at the GATK Best Practices?

written 2.1 years ago by andrew.j.skelton73

whole exomes.

Exome sequencing or genome sequencing?
Exome sequencing requires a priori knowledge of what is coding (to target with probes). So you only sequence what you target.

written 2.1 years ago by WouterDeCoster

Exome sequencing using all protein-coding information from reference.

I have to analyse whole-exome BAM files that are already aligned to the reference and have duplicates marked. I did some initial analysis using Qualimap, but I am not satisfied with it.

modified 2.1 years ago • written 2.1 years ago by bioinfo8

@WouterDeCoster, it would be nice if you could share some thoughts!

Thanks

written 2.1 years ago by bioinfo8
2.1 years ago, andrew.j.skelton73 (London) wrote:

Have you looked at the GATK Best Practices? With exome data you'd typically call variants (SNPs and indels), work with the resulting VCF(s), and then interrogate the calls based on the context of your experiment (singletons, families, causal variant search, etc.). You can produce a PCA plot from the VCF using the SNPRelate package in R.
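
To show what a PCA on called genotypes is doing under the hood, here is a minimal sketch in plain Python. It is not SNPRelate and not R; it only finds the first principal component by power iteration on a samples-by-samples matrix. The genotype matrix (alternate-allele counts 0/1/2, one row per individual) is invented toy data for two groups of three, and the names `center_columns` and `first_pc` are made up for the example.

```python
import math

def center_columns(matrix):
    """Subtract each variant's mean allele count from its column."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]

def first_pc(genotypes, iterations=200):
    """Return (PC1 coordinate per sample, fraction of variance on PC1).

    Uses power iteration on the Gram matrix X X^T of the centered data;
    assumes the samples are not all identical (non-zero variance).
    """
    x = center_columns(genotypes)
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]
    # Non-uniform start: a constant vector is annihilated by centered data.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(
        vec[i] * sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    total = sum(gram[i][i] for i in range(n))
    return vec, eigenvalue / total

# Toy data: rows 0-2 are group A, rows 3-5 are group B (hypothetical).
toy_genotypes = [
    [0, 0, 2, 2, 0, 1],
    [0, 0, 2, 2, 1, 0],
    [0, 1, 2, 2, 0, 0],
    [2, 2, 0, 0, 0, 1],
    [2, 2, 0, 0, 1, 0],
    [2, 1, 0, 0, 0, 0],
]
pc1, explained = first_pc(toy_genotypes)
for label, score in zip("AAABBB", pc1):
    print(label, round(score, 3))
print("fraction of variance on PC1:", round(explained, 3))
```

In this toy data the two groups end up on opposite sides of zero on PC1, which is the kind of clustering a PCA plot makes visible. For real data, a dedicated package such as SNPRelate also handles steps like filtering and linkage pruning that this sketch ignores.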

written 2.1 years ago by andrew.j.skelton73

Thanks @andrew.j.skelton73, that answers query 4)!

written 2.1 years ago by bioinfo8