How to Find Exon-Intron Junctions for 1 Exon-Intron Using the UCSC Genome Browser
7.1 years ago
System

Hello everyone,

A lab member used the TSS annotation for all genes to analyze factor enrichment for Pol II, as well as for other factors such as those associated with the 7SK snRNP (KAP1, Hexim1). He generated heatmaps and metagene graphs. I am now taking over the bioinformatics portion of a new, similar project and have a couple of questions that I hope someone can help with.

1.) My project requires a similar analysis, except that I will be looking at factor enrichment at exon-intron / intron-exon junctions. The problem is that my lab member does not remember how he obtained the TSSAnnotationForAllGenes.txt file he used to annotate peaks; all he knows is that he found a text file online. I have spent hours searching for exon start site annotation text files, but to no avail.

With that said, I assume this information should be available (if it exists) in the UCSC Genome Browser database. What I am unsure about is exactly how to identify and extract exon-intron junction information using UCSC. I have tried the Table Browser with the clade/genome/assembly for my model (human, hg19): I selected the "Genes and Gene Predictions" group and the RefSeq track, set the region to the BRCA1 gene position, and set the output format to BED so that I can annotate peaks and generate my heatmaps and metagene plots.

EDIT: After clarification with my PI, I want to do a meta-gene of regions centered (x axis = 0) on exon-intron boundaries. BRCA1 is just a test gene; I want to make sure this works before telling my PI that it can be done.

Is this the correct way to go about it? I have attached the first heatmap I generated, but, to me at least, it makes little sense. Maybe someone can shed some light on it.

RNA-Seq ChIP-Seq gene

If I understand correctly, you want to do a "meta-gene" of regions centered on exon-intron boundaries, or intron-exon boundaries, and are using BRCA1 as your example? It seems like you would want to start with the BED file of exons and separate out exon ends (exon-intron boundaries) and exon starts (intron-exon boundaries) into two different lists (for + strand genes; on the - strand the two are swapped), then expand out 500 bp in each direction to generate new BED files. Then plot the signal of your factor over each region of interest.
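A minimal Python sketch of this approach, assuming the exon BED has already been parsed into one record per exon with a transcript ID attached (the Exon tuple and junction_windows function are hypothetical names, not part of any UCSC tool). It skips each transcript's outermost edges (the TSS and the transcript end), which are not splice junctions, and swaps the two lists for - strand genes:

```python
from collections import defaultdict
from typing import NamedTuple


class Exon(NamedTuple):
    chrom: str
    start: int       # 0-based start (BED convention)
    end: int         # exclusive end (BED convention)
    transcript: str  # transcript/isoform ID (assumed available per exon)
    strand: str      # "+" or "-"


def junction_windows(exons, flank=500):
    """Split splice junctions into exon-intron and intron-exon lists,
    each as (chrom, start, end, transcript, strand) windows of
    +/- flank bp around the boundary."""
    by_tx = defaultdict(list)
    for e in exons:
        by_tx[e.transcript].append(e)

    exon_intron, intron_exon = [], []

    def add(target, e, pos):
        # Skip windows that would start before the chromosome begins,
        # so every kept window stays centered on its junction.
        if pos - flank >= 0:
            target.append((e.chrom, pos - flank, pos + flank, e.transcript, e.strand))

    for tx_exons in by_tx.values():
        if len(tx_exons) < 2:
            continue  # single-exon transcripts have no splice junctions
        tx_exons.sort(key=lambda e: e.start)
        last = len(tx_exons) - 1
        for i, e in enumerate(tx_exons):
            plus = e.strand == "+"
            if i > 0:  # left edge is internal (not the TSS or transcript end)
                add(intron_exon if plus else exon_intron, e, e.start)
            if i < last:  # right edge is internal
                add(exon_intron if plus else intron_exon, e, e.end)
    return exon_intron, intron_exon
```

Windows that would start before position 0 are dropped rather than clipped; checking the chromosome end as well would require a chromosome-sizes file, which this sketch does not read.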


Yes, after clarification with my PI, I want to do a meta-gene of regions centered (x axis = 0) on exon-intron boundaries. BRCA1 is just a test gene; I want to make sure this works before telling my PI that it can be done.

Should the BED file then be "Exons" and not "Coding Exons" (when choosing the output options in the UCSC Table Browser)?

I'm sorry if what I type is redundant at times; it's just a way for me to remember information. Thank you very much for your help so far!

EDIT: How exactly would I generate two separate lists of exon starts and ends?

This meta-gene plot is exactly what I would like to do. The example is panel F of Figure 4 (shown).


Without additional information on your factor of interest, I would say yes, you want all exons. I would extract all exons (starting with BRCA1, then all genes, or more precisely all expressed genes/splice variants in your tissue) and use the BED file to identify exon-intron and intron-exon boundaries. For + strand genes these are the second and first coordinates of each exon line, respectively; for - strand genes it is the other way around. Then expand your window from that coordinate using Excel, awk (my preference), or bedtools slop.

If you only want exon 1, then that makes life simpler.
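For the windowing step itself, bedtools slop (with -b for the flank and -g for a genome/chromosome-sizes file) does this from the command line, truncating windows at chromosome ends. A hedged Python equivalent is sketched below; it assumes a two-column, tab-separated chromosome-sizes file (name, length), and read_chrom_sizes, expand_point and to_bed_lines are hypothetical helpers. Unlike slop, it drops windows that would run off a chromosome instead of truncating them, so x = 0 stays on the boundary in a meta-gene plot:

```python
def read_chrom_sizes(path):
    """Parse a two-column, tab-separated chromosome-sizes file."""
    sizes = {}
    with open(path) as fh:
        for line in fh:
            if line.strip():
                chrom, length = line.split()[:2]
                sizes[chrom] = int(length)
    return sizes


def expand_point(chrom, pos, flank, chrom_sizes):
    """Return (chrom, pos - flank, pos + flank), or None if the window
    would extend past either end of the chromosome."""
    start, end = pos - flank, pos + flank
    if start < 0 or end > chrom_sizes[chrom]:
        return None
    return chrom, start, end


def to_bed_lines(points, flank, chrom_sizes):
    """points: iterable of (chrom, boundary_pos, name, strand).
    Returns BED6 lines for the windows that fit on their chromosome."""
    lines = []
    for chrom, pos, name, strand in points:
        window = expand_point(chrom, pos, flank, chrom_sizes)
        if window is not None:
            c, s, e = window
            lines.append(f"{c}\t{s}\t{e}\t{name}\t0\t{strand}")
    return lines
```

Typical use would be something like to_bed_lines(points, 500, read_chrom_sizes("hg19.chrom.sizes")), where the file name is only an example.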


This is great! Okay, so let's say I only want exon 1 to create a test metagene plot. I download the genes BED file and then use that file to identify the exon 1 exon-intron and intron-exon boundaries. I think I've got a good grasp up to here, but how exactly do I expand my window from that coordinate, using Excel for example?

Also, is there any way to download the BED file for only exon 1?

EDIT: I downloaded all genes, looked at the first and second coordinates, and then downloaded a new BED file from these coordinates. Is this BED file my exon 1?

Again, thank you so much!


Okay, let's say Excel (I love Awk, and like to proselytize for it, but it's overkill for this).

In this case, separate the genes by transcription orientation (forward vs. reverse).

For "forward" genes, sort by isoform, then by exon. Use "Remove Duplicates" on the isoform column to drop every exon after exon 1 for each isoform (you should probably also remove any single-exon genes). Take the second coordinate (the end coordinate, which is the exon-intron junction), and in the adjacent column subtract 1000, and in the column next to that add 1000. Double-check that neither of these numbers is less than 0 or past the end of the chromosome.

The first and second coordinates of the first line of each isoform will be exon 1 ONLY if it is transcribed on the + strand; be careful of this!
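The Excel steps above, plus the reverse-strand case, can be sketched in Python as follows. This assumes one row per exon with an isoform ID and strand, as in the spreadsheet; first_exon_junctions is a hypothetical helper. For + strand isoforms exon 1 is the leftmost exon and the junction is its end coordinate; for - strand isoforms exon 1 is the rightmost exon and the junction is its start coordinate:

```python
from collections import defaultdict


def first_exon_junctions(exons, flank=1000, chrom_sizes=None):
    """exons: iterable of (chrom, start, end, isoform, strand) BED-style rows.
    Returns one window per multi-exon isoform, centered on the
    exon 1 / intron 1 junction."""
    by_iso = defaultdict(list)
    for chrom, start, end, iso, strand in exons:
        by_iso[iso].append((chrom, start, end, strand))

    out = []
    for iso, rows in sorted(by_iso.items()):
        if len(rows) < 2:
            continue  # single-exon isoforms have no exon-intron junction
        chrom, _, _, strand = rows[0]
        if strand == "+":
            # exon 1 is the leftmost exon; the junction is its end coordinate
            junction = min(rows, key=lambda r: r[1])[2]
        else:
            # exon 1 is the rightmost exon; the junction is its start coordinate
            junction = max(rows, key=lambda r: r[1])[1]
        start, end = junction - flank, junction + flank
        # the "double-check" step: drop windows that fall off the chromosome
        if start < 0 or (chrom_sizes and end > chrom_sizes[chrom]):
            continue
        out.append((chrom, start, end, iso, strand))
    return out
```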


So this is the metagene plot that I produced after a bit of tinkering. Please let me know whether this seems to be headed down the right path, or if not, so that I can backtrack completely. Note that the 0 position is centered on the TSS, whereas I would like it centered on the exon-intron junction; I'm still trying to figure out exactly how to do that.


That seems like the right concept (I initially missed it, but then scrolled to the right). If you can consistently find the TSS (the start of the first exon for forward-oriented genes, or the end of the last exon in genomic order for reverse-oriented genes), then you can find the first exon-intron junction, because it is just the other coordinate of that exon's line in the BED file. It looks like you were able to expand from each coordinate +/- 5 kb, so you're on the right track!
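To move x = 0 from the TSS to the first exon-intron junction, both coordinates can be read off a single BED12 line (for example, the Table Browser's whole-gene BED output), whose blockSizes and blockStarts columns list the exons relative to chromStart. A sketch, assuming the blocks are in ascending genomic order as the BED format requires; tss_and_first_junction is a hypothetical name:

```python
def tss_and_first_junction(bed12_line):
    """Return (chrom, name, strand, tss, first_exon_intron_junction) for a
    multi-exon BED12 record, or None for single-exon records."""
    f = bed12_line.rstrip("\n").split("\t")
    chrom, tx_start, name, strand = f[0], int(f[1]), f[3], f[5]
    if int(f[9]) < 2:
        return None  # no intron, so no exon-intron junction
    # blockSizes/blockStarts are comma-separated, often with a trailing comma
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]
    # Convert relative block starts to absolute exon coordinates.
    exons = [(tx_start + s, tx_start + s + z) for s, z in zip(starts, sizes)]
    if strand == "+":
        # exon 1 is the leftmost block: TSS = its start, junction = its end
        return chrom, name, strand, exons[0][0], exons[0][1]
    # exon 1 is the rightmost block: TSS = its end, junction = its start
    return chrom, name, strand, exons[-1][1], exons[-1][0]
```

Building the +/- 5 kb windows around the last returned value instead of the TSS should put the first exon-intron junction at x = 0.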