Question: Define If ChIP Peaks Are Located In Exon, Intron, Promoter, 3' UTR
e.karasmani wrote, 8.1 years ago:

Dear All,

I have a file that looks like the following:

chromosome  start      end        peak.location  chip.value  target.gene.name  distance.to.gene
chr1        162990333  162990703  162990519      33          RP11-331H2.3.1    136

Is there a way to determine whether this peak lies in an exon, an intron, a promoter, or a 3' UTR?

Could you please give me some guidelines?

Thank you in advance.

Best regards, Lena

chip-seq exon peak-calling intron • 7.4k views

Do you have a gene model file (GFF) for the species/organism you're working on? If so, it's straightforward to extract the intervals that correspond to your peak from the GFF and read off the feature type.

written 8.1 years ago by Arun

No, I don't have anything. How can I do that?

Moreover, isn't there any other way to solve my problem? Any package in R?

written 8.1 years ago by e.karasmani

GFF files are usually available from the same website where you download your reference. What are you working on? You can read more about GFF format here: http://www.sanger.ac.uk/resources/software/gff/spec.html

I haven't worked on ChIP data, so I can't tell whether the format is supposed to have the type annotated. Pablo's solution might be straightforward if you aren't working on plants, I suppose.

written 8.1 years ago by Arun
David Langenberger wrote, 8.1 years ago:

If your species is in the UCSC Browser, you can download BED files of your regions (promoters (x nts upstream), introns, exons, UTRs) and then use BEDtools intersectBed to annotate your ChIP-seq peaks.


My assemblies are mm9 and hg18. Is there an option in UCSC to download the regions (promoter, intron, exon) for the whole genome? I am looking genome-wide, not at a specific area. If this is possible, could you please give me some guidelines? I am a rookie in bioinformatics. Thanks.

written 8.1 years ago by e.karasmani

Go to 'Tables' within the UCSC Genome Browser. Select your species and the correct assembly. Under 'Group' select 'Genes and ...', and under 'track' select e.g. RefSeq. Then set 'output format' to 'bed'.

When you then push 'get output', a new page opens where you select which regions you want. For the promoter, you can set 'upstream' to 5000, or whatever value you want. If you enter a name for the output file on the first page, the data will be downloaded automatically under that file name. I also recommend setting the 'gzip' flag to minimize the download size.

written 8.1 years ago by David Langenberger

Thank you very much! However, how can I then compare my list of peaks to the lists you describe in order to identify where my peaks are? What should I do?

modified 8.1 years ago • written 8.1 years ago by e.karasmani

If you overlap your list of peaks with your list of introns, you get back all peaks lying within introns.

With something like 'intersectBed -a peaks.bed -b introns.bed -wa -wb', you can see which introns those are: -wa writes the original peak entry and -wb writes the intron entry it overlaps.
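In case it helps, here is a minimal sketch of getting the peak table from the question into BED form before the intersect. It assumes the table is whitespace-delimited with one header line and is saved as peaks.txt (a hypothetical file name), and it copies the coordinates unchanged. BED starts are 0-based, so whether the start column needs shifting by one depends on how your peak caller reports coordinates.

```shell
# Toy copy of the table from the question (peaks.txt is a hypothetical name).
cat > peaks.txt <<'EOF'
 chromosome  start     end        peak.location     chip.value  target.gene.name   distance.to.gene
  chr1       162990333 162990703     162990519      33    RP11-331H2.3.1               136
EOF

# Skip the header line and write chrom, start, end, gene name and ChIP value
# as tab-separated BED columns.
awk 'BEGIN { OFS = "\t" } NR > 1 { print $1, $2, $3, $6, $5 }' peaks.txt > peaks.bed

# Then, with BEDtools installed and introns.bed downloaded from UCSC:
# intersectBed -a peaks.bed -b introns.bed -wa -wb > peaks_in_introns.bed
```

Repeating the last step with the exon, UTR and promoter files gives one overlap table per feature type.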

written 8.1 years ago by David Langenberger

Hi, I would like to ask something: how can I get the promoter or TSS with the UCSC Tables method you describe? There is no option for promoters; it only has the following options:

Whole Gene
Exons
Introns
5' UTR Exons
Coding Exons
3' UTR Exons

What is the difference between coding exons and exons?

modified 8.1 years ago • written 8.1 years ago by e.karasmani

Since there is no exact definition of where a promoter starts, I recommend using something like 5000 bases upstream of each gene (the 'Upstream by XXX bases' option), but feel free to use any other number you like! The TSS is always the 5' end of your gene (gene on the positive strand -> 2nd column in the BED file, gene on the negative strand -> 3rd column).

The coding exons contain only the parts of the exons that lie within the coding region (CDS). Since UTRs can be spliced too (untranslated region =/= coding sequence), UCSC distinguishes between the complete exons (Exons) and the 5' UTR, coding, and 3' UTR exons. It looks like this:

1---1/....../2--=====2/.........../3=============3/......./4======---4/..../5-------5
(= CDS, - UTR, . intron)

Look at the second exon: part of it belongs to the 5' UTR. The 'Exons' selection returns the whole exon, 'Coding Exons' returns only the '=' part, and '5' UTR Exons' returns only the '-' part. I hope that was not too confusing! ;)
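As a rough sketch of the strand rule above: assuming a 6-column BED file of whole genes (genes.bed here is toy data with made-up gene names, and 5000 is just the example window size), awk can write the upstream window for each gene. The windows are clamped at 0 but not at the chromosome end.

```shell
# Toy 6-column BED of genes: chrom, start, end, name, score, strand (made up).
printf 'chr1\t10000\t20000\tgeneA\t0\t+\n'  > genes.bed
printf 'chr1\t30000\t40000\tgeneB\t0\t-\n' >> genes.bed
printf 'chr1\t2000\t9000\tgeneC\t0\t+\n'   >> genes.bed

# Plus strand: the TSS is column 2, so the window ends there.
# Minus strand: the TSS is column 3, so the window starts there.
awk -v up=5000 'BEGIN { OFS = "\t" }
{
  if ($6 == "+") { s = $2 - up; e = $2 }
  else           { s = $3;      e = $3 + up }
  if (s < 0) s = 0    # do not run off the start of the chromosome
  print $1, s, e, $4, $5, $6
}' genes.bed > promoters.bed
```

The resulting promoters.bed can be intersected with the peaks in the same way as the intron file.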

written 8.1 years ago by David Langenberger
Pablo wrote, 8.1 years ago:

You can use SnpEff (http://snpeff.sourceforge.net/) in BED mode. For instance, if your sample is human (hg19):

# Download the database:
java -jar snpEff.jar download -v hg19

# Annotate your file 'chip.bed'
java -jar snpEff.jar eff -v -i bed -o bed hg19 chip.bed > chip.eff.bed

Isn't that for SNPs (presumably of width 1)?

written 8.1 years ago by Jeremy Leipzig

Sorry if I totally misunderstood your point, but the OP asked where the peak is located, didn't they?

written 8.1 years ago by Arun

Yes, but peaks can be of varying widths (370 bp in the example above) and might span more than one feature. I'm not sure snpEff expects this.
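To illustrate the point about spanning several features, here is a toy sketch in plain awk (made-up coordinates and file names, and not how snpEff or BEDtools work internally). It lists every feature type a peak overlaps, treating the intervals as half-open BED intervals, and calls a peak intergenic if nothing overlaps.

```shell
# Toy feature annotation: chrom, start, end, type (all values made up).
printf 'chr1\t100\t200\texon\n'    > toy_features.bed
printf 'chr1\t200\t500\tintron\n' >> toy_features.bed

# Toy peaks: chrom, start, end, name.
printf 'chr1\t150\t260\tpeak1\n'  > toy_peaks.bed
printf 'chr1\t600\t700\tpeak2\n' >> toy_peaks.bed

awk 'BEGIN { OFS = "\t" }
# First file: store every feature.
NR == FNR { n++; chr[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; type[n] = $4; next }
# Second file: collect the types of all features overlapping each peak.
{
  hits = ""
  for (i = 1; i <= n; i++)
    if (chr[i] == $1 && s[i] < $3 + 0 && $2 + 0 < e[i])
      hits = hits (hits == "" ? "" : ",") type[i]
  print $4, (hits == "" ? "intergenic" : hits)
}' toy_features.bed toy_peaks.bed > toy_annotation.txt
```

This compares every peak against every feature, so it is only meant for small examples; for real data a dedicated tool is the better choice.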

modified 8.1 years ago • written 8.1 years ago by Jeremy Leipzig

Yes, it works for this. I created the BED feature specifically for ChIP-seq analysis.

written 8.1 years ago by Pablo

For reference, it seems like @Pablo is the author of snpEff.

written 8.1 years ago by brentp
Steve Lianoglou wrote, 8.1 years ago:

If you'd like to use R/Bioconductor, you might try these packages


Thanks, I will try both packages, and if I have any questions I will ask you! Best regards, Eleni. PS: Are you Greek?

written 8.1 years ago by e.karasmani

Could you please help me with how to use the VariantAnnotation package to determine, from my data.frame, where the peaks are located?

written 8.1 years ago by e.karasmani

You'll want to convert the peaks you have stored in the data.frame to a GRanges object. The vignette I linked to for the package has an example of what to do from there.

written 8.1 years ago by Steve Lianoglou

Can you please help me with ChIPpeakAnno? If you could check this post I would be grateful: http://www.biostars.org/post/show/45636/question-about-chippeakanno-and-iranges/#45638 Thank you very much!

written 8.1 years ago by e.karasmani