Question: SNPs In Promoter Regions In Exome Sequencing?
Kevin wrote, 8.5 years ago:

I have exome sequencing data on hand. I was asked a question from a biological standpoint that I can't really answer.

1) Can SNPs that lie outside the coding regions but inside promoter regions be analyzed the same way as in WGS data?
2) Is there a good way to get a BED file of the promoter regions? Should I instead be looking at the exome target BED file for targeted regions that lie outside the exons? Would I be doing anything grossly wrong if I analyzed reads that lie outside the targeted regions, provided the coverage there is good?

Part of the answer: http://biostar.stackexchange.com/questions/888/how-to-get-promoter-sequences-for-human-genes

http://genome.ucsc.edu/ENCODE/downloads.html

If you define X kb upstream of a gene to be a promoter, you can get this using the UCSC table browser as follows: http://biostar.stackexchange.com/questions/8230/hg19-promoters-bed-file
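If you already have a gene table with strand information, the "X kb upstream" definition can also be computed directly. The following is only a minimal sketch: the function names, the input tuple layout `(chrom, tx_start, tx_end, name, strand)`, and the default of 1000 bp are assumptions made for illustration, and coordinates are assumed to be BED-style (0-based start, exclusive end). It does not clip intervals at the chromosome end, which a real pipeline would need to handle with a chromosome sizes file.

```python
def promoter_interval(chrom, tx_start, tx_end, strand, upstream=1000):
    """Return a BED-style (0-based, half-open) interval upstream of the TSS.

    tx_start/tx_end are the transcript bounds in BED coordinates.
    `upstream` is the assumed promoter length in bp (the "X kb" above).
    """
    if strand == "+":
        # TSS is at tx_start; the promoter lies immediately before it.
        start, end = max(0, tx_start - upstream), tx_start
    elif strand == "-":
        # TSS is at the transcript's right end; the promoter lies after it
        # in forward-strand coordinates.
        start, end = tx_end, tx_end + upstream
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return chrom, start, end


def promoters_to_bed(genes, upstream=1000):
    """Format (chrom, tx_start, tx_end, name, strand) records as BED6 lines."""
    lines = []
    for chrom, tx_start, tx_end, name, strand in genes:
        c, s, e = promoter_interval(chrom, tx_start, tx_end, strand, upstream)
        if e > s:  # skip empty intervals (gene starting at position 0)
            lines.append(f"{c}\t{s}\t{e}\t{name}_promoter\t0\t{strand}")
    return lines


if __name__ == "__main__":
    # Hypothetical gene records, for illustration only.
    genes = [
        ("chr1", 5000, 9000, "GENE_A", "+"),
        ("chr1", 20000, 26000, "GENE_B", "-"),
    ]
    print("\n".join(promoters_to_bed(genes, upstream=1000)))
```

The strand check matters: for a minus-strand gene the promoter sits to the right of the gene in reference coordinates, so taking "start minus X" for every gene would pick the wrong side.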

Nina (Vancouver, BC, Canada) wrote, 8.4 years ago:

I would argue that no, you cannot reliably analyze anything that aligns outside the targeted regions in exon capture data. If you see a dense region of coverage outside a targeted region, one of two things has happened: a) the reads really came from there and were captured through off-target binding, or b) the reads really came from somewhere else and were misaligned.

The makers of the exon capture kit took great care to reduce the chance of off-target binding, so I would expect almost everything you see to be the result of misalignment. Because the aligner is just following a set of rules, you will often get large collections of reads that have all been systematically aligned to the wrong place. Furthermore, because the human reference genome differs from the actual genome of the individual that was sequenced, it can be impossible to tell whether a read was properly aligned or not.

That being said, I have noticed that the coverage often spills out ~100bp on either side of the bounds of the officially targeted region, so if your promoter of interest is super close to a targeted region, you may be in luck.
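One way to check whether a promoter falls inside that spill-over zone is to pad each target interval and intersect it with the promoter intervals. This is a rough sketch, not a recommended pipeline: the function names are made up for illustration, the 100 bp padding is simply the figure quoted above, and intervals are assumed to be `(chrom, start, end)` tuples in BED coordinates. The nested loop is quadratic, so for real target files a dedicated interval tool (for example `bedtools slop` followed by `bedtools intersect`) is the more practical route.

```python
def pad_intervals(targets, pad=100):
    """Extend each (chrom, start, end) target by `pad` bp on both sides."""
    return [(chrom, max(0, start - pad), end + pad)
            for chrom, start, end in targets]


def covered_promoter_segments(promoters, targets, pad=100):
    """Return the parts of each promoter that fall within padded targets.

    Only these segments are likely to have usable spill-over coverage.
    """
    segments = []
    for p_chrom, p_start, p_end in promoters:
        for t_chrom, t_start, t_end in pad_intervals(targets, pad):
            if p_chrom != t_chrom:
                continue
            # Intersection of two half-open intervals.
            start, end = max(p_start, t_start), min(p_end, t_end)
            if start < end:
                segments.append((p_chrom, start, end))
    return segments


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    targets = [("chr1", 5000, 5200)]
    promoters = [("chr1", 4000, 5000), ("chr1", 1000, 2000)]
    print(covered_promoter_segments(promoters, targets, pad=100))
```

Note that only the slice of the promoter nearest the target is returned; the rest of the promoter is still outside the region where capture coverage can be expected.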

By the way, in case you are not convinced that you can't tell good alignments from bad ones, consider the following case. The aligner says "this read maps perfectly to exactly one location". In fact, the gene it aligned to has a paralog that differs from it by one base in the reference. The aligner can't know this, but the individual you sequenced does not carry the SNP that distinguishes the two paralogs in the canonical reference. So the read should really have mapped ambiguously, because in that individual's genome it matches two locations equally well.
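The paralog case can be made concrete with a toy example. The sequences below are invented 8-mers, and the "aligner" is nothing more than a mismatch count against each candidate locus, so this is an illustration of the reasoning rather than a model of how any real aligner scores reads.

```python
def mismatches(read, locus):
    """Count mismatching positions between two equal-length sequences."""
    if len(read) != len(locus):
        raise ValueError("sequences must have equal length")
    return sum(r != l for r, l in zip(read, locus))


def best_hits(read, loci):
    """Return the names of all loci tied for the fewest mismatches."""
    scores = {name: mismatches(read, seq) for name, seq in loci.items()}
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


# Reference genome: the two paralogs differ at exactly one base.
reference = {"paralog_A": "ACGTTGCA", "paralog_B": "ACGTAGCA"}

# The sequenced individual lacks that distinguishing base, so in their
# genome both paralogs carry the same sequence.
individual = {"paralog_A": "ACGTTGCA", "paralog_B": "ACGTTGCA"}

# A read that actually originated from the individual's paralog_B.
read = "ACGTTGCA"

if __name__ == "__main__":
    print("against reference: ", best_hits(read, reference))
    print("against individual:", best_hits(read, individual))
```

Against the reference the read has a single perfect hit at `paralog_A`, which is the wrong locus; against the individual's own genome it ties between both paralogs. The aligner only ever sees the reference, so it reports the first situation with full confidence.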

Ryan Thompson (TSRI, La Jolla, CA) wrote, 8.5 years ago:

I just treat exome capture sequencing data as WGS data with a really uneven coverage distribution. As long as you are sufficiently confident that your reads are mapped correctly and you have sufficient coverage in your region of interest, you can draw all the same inferences as from WGS data.
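In practice, "sufficient coverage" and "confident mapping" have to be turned into explicit cutoffs. The sketch below shows one way to express that filter; the `Site` record, the function name, and the thresholds (depth of at least 20, mean mapping quality of at least 30) are all placeholder assumptions, not values taken from this thread, and would need to be chosen for your own data.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Per-position summary (hypothetical layout, for illustration)."""
    chrom: str
    pos: int
    depth: int          # number of reads covering the position
    mean_mapq: float    # mean mapping quality of those reads


def usable_sites(sites, min_depth=20, min_mapq=30.0):
    """Keep sites that pass both the depth and mapping-quality cutoffs.

    The default thresholds are arbitrary placeholders.
    """
    return [s for s in sites
            if s.depth >= min_depth and s.mean_mapq >= min_mapq]


if __name__ == "__main__":
    # Made-up values showing the uneven coverage typical of capture data.
    sites = [
        Site("chr1", 4950, depth=35, mean_mapq=58.0),  # near a target
        Site("chr1", 4500, depth=3, mean_mapq=60.0),   # too shallow
        Site("chr1", 4200, depth=80, mean_mapq=4.0),   # deep but poorly mapped
    ]
    for site in usable_sites(sites):
        print(site)
```

A high mapping quality is only the aligner's own estimate against the reference, so passing this filter reduces the risk of misalignment without ruling it out.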

Larry_Parnell (Boston, MA USA) wrote, 8.4 years ago:

Don't forget about alternative splicing in which one mRNA's exon is part of the promoter region of the alternatively spliced mRNA. There are not many cases of this, but there are enough to justify keeping it in mind. Consider a 10-exon mRNA with an alternative version in which transcription starts from a promoter in intron 5, giving a transcript consisting of exons 6 through 10. The exome data for exons 1 through 5 then cover pieces of the promoter region for the shorter transcript.
