Question: How do I identify and differentiate between unidirectional and bidirectional promoters?
4.6 years ago by
cbio450 wrote:

I have a set of genes that contain a protein of interest at the TSS. I would like to be able to separate these genes into two classes: genes with a unidirectional promoter, and genes with a bidirectional promoter.

I have access to paired-end GRO-seq data, but no RNA-seq data. Is there a way to do this?

modified 4.6 years ago by ivivek_ngs5.0k • written 4.6 years ago by cbio450

Technically, do you want to get the 5' reads that go in opposite directions but overlap each other (or lie within a certain distance, let's say 400 bp)? Like enhancer RNAs, which are transcribed bidirectionally?

written 4.6 years ago by geek_y11k

Yes, this is what I'd like to do. I had previously thought I could simply use bedtools to look for overlapping regions in the GRO-seq negative- and positive-strand coverage bedGraphs within 1 kb of annotated TSSs, but this did not work.

written 4.6 years ago by cbio450

Do you have separate files for the 5' reads? When you say paired-end data, do you know which reads originate from the 5' end of a transcript?

modified 4.6 years ago • written 4.6 years ago by geek_y11k

I do not have a separate file for these. What I currently have is a bedtools genome coverage bedGraph that contains the entire coverage (it is not limited to 5' ends with the -5 option), which I generated using:

genomeCoverageBed -bg -strand + -ibam $infile -g $genome > $outdir/genomecoveragebed/$outfile3

genomeCoverageBed -bg -strand - -ibam $infile -g $genome | awk -F '\t' -v OFS='\t' '{ $4 = - $4 ; print $0 }' > $outdir/genomecoveragebed/$outfile4

I'm very new to GRO-seq, and the data wasn't generated by my lab, so getting information about its generation has been difficult at best.

modified 4.6 years ago • written 4.6 years ago by cbio450

If you have paired-end data, you somehow need to separate out the reads that originate from the 5' end. Otherwise you will not be able to identify bidirectional transcripts precisely. Anyway, if you would like to check which regions on the forward strand are close to regions on the reverse strand, you could use closestBed:

closestBed -a Fw_strand.bed -b Rv_strand.bed -d | awk -v OFS="\t" '{ if ($NF<=400) print $1, $2, $3}' | sort -k1,1 -k2,2n | uniq | wc -l

But this won't be exclusive to bidirectional transcripts. In fact, it is not meaningful at all because paired-end reads generally map in fr or rf orientation, so you will definitely end up with many regions that are close to each other on the forward and reverse strands.
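One way to make the distance check orientation-aware is sketched below. It assumes Fw_strand.bed and Rv_strand.bed already hold strand-corrected regions (for example 5'-end clusters), which is not the case for raw paired-end coverage. The function name `divergent_pairs` and the output layout are hypothetical. A pair is reported only when the reverse-strand region ends at or before the start of the forward-strand region and the gap is at most 400 bp, so the two regions point away from each other.

```shell
# Hypothetical helper: report head-to-head (divergent) pairs.
#   $1 = reverse-strand BED, $2 = forward-strand BED, $3 = max gap in bp (default 400)
# Output columns: chrom, rev_start, rev_end, fw_start, fw_end, gap
divergent_pairs() {
  awk -v OFS='\t' -v max="${3:-400}" '
    # First file: remember every reverse-strand region.
    NR == FNR { mchr[FNR] = $1; mstart[FNR] = $2; mend[FNR] = $3; n = FNR; next }
    # Second file: compare each forward-strand region against all of them.
    {
      for (i = 1; i <= n; i++) {
        gap = $2 - mend[i]          # forward start minus reverse end
        if (mchr[i] == $1 && gap >= 0 && gap <= max)
          print $1, mstart[i], mend[i], $2, $3, gap
      }
    }' "$1" "$2"
}
```

Usage would look like `divergent_pairs Rv_strand.bed Fw_strand.bed 400`. The all-against-all loop is only meant for small region sets; for genome-wide data a sorted sweep or bedtools would be more appropriate.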

Ask the people who generated the data whether they can tell you how to separate the reads that originate from 5' ends. Then I can tell you how to get bidirectional transcripts.
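As a rough illustration of what separating 5' ends could look like once the protocol is known, the sketch below reduces BED6 reads to the single base at each read's 5' end. It assumes the reads were converted with something like `bedtools bamtobed`, so that mate names end in "/1" and "/2", and it assumes that mate 1 carries the 5' end on the same strand as the transcript. Both assumptions depend on the library preparation and may need to be swapped or strand-flipped. The function name `five_prime_ends` is hypothetical.

```shell
# Hypothetical helper: keep only mate 1 and emit the 1-bp 5' end of each read.
#   $1 = BED6 file of reads (chrom, start, end, name, score, strand)
five_prime_ends() {
  awk -v OFS='\t' '
    $4 ~ /\/1$/ {
      # Plus-strand read: 5-prime end is the first base of the interval.
      if ($6 == "+")      print $1, $2, $2 + 1, $4, $5, $6
      # Minus-strand read: 5-prime end is the last base of the interval.
      else if ($6 == "-") print $1, $3 - 1, $3, $4, $5, $6
    }' "$1"
}
```

The resulting 1-bp intervals can be split by strand and used as the forward and reverse inputs for the distance-based check.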

modified 4.6 years ago • written 4.6 years ago by geek_y11k
4.6 years ago by
Seattle,WA, USA
ivivek_ngs5.0k wrote:

I believe that when you extract the list of genes from your data you have the strand information, right? Then you can tell which genes lie on which strand, + or -, which gives you a strand-specific feature. You can then grep your output based on strand.

This will give you two lists of promoters, one for the + strand and one for the - strand. Once you have these, you can overlap the two lists to find bidirectional genes, since those that overlap at RefSeq IDs or gene symbols should be shared between both strands. I believe this will help.
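A purely annotation-based version of this idea can be sketched as follows. It assumes a BED6 gene file and uses a common working definition that is not stated in this thread: two genes on opposite strands, arranged head to head, with TSSs no more than 1 kb apart. The function name `bidirectional_pairs` and the 1000 bp default are illustrative choices.

```shell
# Hypothetical helper: list head-to-head gene pairs that may share a promoter.
#   $1 = BED6 gene annotation, $2 = max TSS-to-TSS distance in bp (default 1000)
# Output columns: chrom, minus_gene, plus_gene, distance
bidirectional_pairs() {
  awk -v OFS='\t' -v max="${2:-1000}" '
    # TSS of a minus-strand gene is its end; of a plus-strand gene, its start.
    $6 == "-" { nm++; mchr[nm] = $1; mtss[nm] = $3; mname[nm] = $4 }
    $6 == "+" { np++; pchr[np] = $1; ptss[np] = $2; pname[np] = $4 }
    END {
      for (i = 1; i <= nm; i++)
        for (j = 1; j <= np; j++) {
          d = ptss[j] - mtss[i]     # minus-strand TSS must lie upstream
          if (mchr[i] == pchr[j] && d >= 0 && d <= max)
            print mchr[i], mname[i], pname[j], d
        }
    }' "$1"
}
```

Genes of interest that appear in this output would be candidates for the bidirectional class, and the rest candidates for the unidirectional class. This only reflects annotated gene pairs, so unannotated antisense transcription would still have to come from the GRO-seq signal.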

modified 2.0 years ago by RamRS30k • written 4.6 years ago by ivivek_ngs5.0k