Question: CLIP-seq peak analysis
gravatar for mb314
8 months ago by
mb3140 wrote:

I have a list of ~1500 (over a few samples) chromosome:peakstart:peakend from my recent CLIP-seq experiment. I want to know: 1) What genes each peak maps to 2) What sequences are included in those peaks. Is there a quick way to do this? I only could figure out how to the full CDS, not just the peak.

I was able to find all the genes by inputting my peaks as a csv into UCSC Table Browser, but when I get an output it puts every transcript possible on different lines (ex if my peak mapped to gene X, I would get gene X transcript 1, gene X transcript 2, etc). I would like to copy back this list to my original data in excel, but now the lines don't match up. Does anyone have a recommendation?

Thank you!

ADD COMMENTlink modified 8 months ago by shawn.w.foley1.1k • written 8 months ago by mb3140
gravatar for shawn.w.foley
8 months ago by
shawn.w.foley1.1k wrote:

I'd recommend converting your file into bed format then using bedtools to intersect with annotated genes (this intersection will give you the full gene length but it will include gene names). You can also run the bedtools getfasta command to convert the bed file into a FASTA file.

If this is CLIP-seq then you should also have strand information (if you performed a strand specific library prep, if you DO have strand information add a -s flag to the two bedtools commands), otherwise there will be ambiguity as to which gene your peak intersects with.

Convert from Chr:start:end to a bed format:

sed 'y/\:/\t/' peakFile.txt > peakFile.bed

Intersect with some annotated gene file (you should be able to get these from UCSC:

bedtools intersect -a annotatedGeneFile.bed -b peakFile.bed -wa -u > annotatedGenesThatOverlapPeaks.bed

And using the fasta file used for mapping:

bedtools getfasta -fi genomeFasta.fa -bed peakFile.bed -fo peakFile.fa
ADD COMMENTlink written 8 months ago by shawn.w.foley1.1k

Thank you for your help! Very useful.

I got bedtools to work with your suggestions and ran the following command sucessfully:

bedtools intersect -wa -wb -a hg19.bed -b Peaks18.3.bed

However, I tried to force strandedness with "-s" and got nothing.

Do you know why adding -s would prevent it from working? I'm using bedtools/2.26.0.

ADD REPLYlink written 8 months ago by mb3140

Do you have strand information (either a + or a -) in the 6th column for both bed files?

ADD REPLYlink written 8 months ago by shawn.w.foley1.1k

Yes, the 6th column has + and - for both files. However I converted my hg19 file from a gtf to a bed to use for this. the hg19 bed file has 9 column. The first 6 looks as they should, but the last three saw things like "unknown exon" and "Parent=NR_204540" rather than "thickStart" , "thickEnd" and "itemRgb." DO you think that makes a difference.

ADD REPLYlink written 8 months ago by mb3140

It could, I would cut the file so that it's six columns. If that doesn't work then paste a line from each input and I'll have a look.

ADD REPLYlink written 8 months ago by shawn.w.foley1.1k

I changed hg19.bed to a six column file, but still couldn't get the -s option to work. Here's a line from each of my inputs:

hg19.bed: chr1 11873 14409 eID=DDX11L1 . +

Peaks18.3.bed: chr9 75785164 75785290 18.3 1 +

ADD REPLYlink written 8 months ago by mb3140

The format looks right. Did you perform a strand-specific library prep and strand specific mapping of the libraries? It's possible that you mapped the libraries to the incorrect strand, try running with the -S flag (capital s), this will find peaks that map to the opposite strand.

If you do a line count of the intersectBed -s and intersectBed -S they should sum to intersectBed with no strand flag. if the -S flag works I'd wonder if there was an error in your original mapping step, if you still get no lines then there must be a formatting problem.

ADD REPLYlink written 8 months ago by shawn.w.foley1.1k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 967 users visited in the last hour