Entering edit mode
8.3 years ago
varsha619
▴
90
Hi, Sorry in advance if my question is too basic. I am new to ChIP-seq analysis and I have a list of .anno files from Homer annotatepeaks.pl (which annotates .bed files). I would like to plot a Density vs Distance from TSS graph using Homer .anno files. I saw different packages available (such as R ChIPseeker) that use .bed files as input to generate the same, after annotating the peaks as well. I was wondering if there is a package that uses the already annotated Homer output .anno files as input to generate a similar plot. Please let me know, thank you for your help!
You're looking for a density plot, or metagene where the x axis is 'region distance' and y axis is 'chip-seq signal' correct?
If this is the case then HOMER can do this fairly easily. You'll need a bed file of your regions of interest, and a bedGraph or wig file of your ChIP-seq data. Then you can run something similar to:
annotatePeaks.pl $REGIONSBED $GENOME -size $REGIONDISTANCE -hist 50 -bedGraph $BEDGRAPHOFCHIPSEQ > $OUTPUTFILENAME
Where $REGIONSBED = the bed file of your regions of interest, $GENOME = the genome of your ChIP-seq data such as hg19, or mm9, etc, $REGIONDISTANCE = the number of basepairs the metagene plot should cover typically for TSS, 6000 is fine for hg19 (HOMER takes the number you input here and divides it by 2 and then spans that so for 6000 it would be 3000 +/- bp in either direction from the central point), and $BEDGRAPHOFCHIPSEQ = a bedgraph file of your chip-seq data.
Once you have the output file, you can plot the coverage column in Excel.
Thank you Sinji, just to confirm is the -bedGraph $BEDGRAPHOFCHIPSEQ generated in this command here or is it to be generated elsewhere and given as input here along with the .bed file?
Yes, you can generate this using any number of tools. Bedtools, and deeptools are two easy to use programs that can do this for you.
Hi, I used the MACS -B option to generate the bedGraphs = macs14 -t 1.sam -c 2.sam -f SAM -g dm -n out -B. This gave me a bedGraph output folder which I used for annotatePeaks.pl which for some reason gave me all the coverage values as 0. Would you please help me fix this? I appreciate your help.
Can you do a
head
of your bedgraph file?This is what the 'head' looks like -
track type=bedGraph name="NT60_treat_chr2L" description="Extended tag pileup from MACS version 1.4.3 20131216" chr2L 288 351 1 chr2L 351 367 2 chr2L 367 387 1
Try
head -n 20
... I want to see what your chr notation looks like.Otherwise, it might be all that metadata that's messing up HOMER. You could try cutting out all the information before you get to your typical bedgraph coverage information:
and see if that works, if that makes any sense.
Hi, I'm sorry may be this is an long time ago issue.
But can I get some advice for making this plot from the center of the peaks? For example, the example you said before is like -3000bp to 3000bp. When I plot this values, can I make the 0bp point of the plot being the center of the peaks?
It'll be very grateful to have some advice. Thanks a lot
Woongjae
Hi @Woongjae, in the Homer command by @Sinji if you use -size 6000, you would get -3000 to 3000bp plots. annotatePeaks.pl $REGIONSBED $GENOME -size $REGIONDISTANCE -hist 50 -bedGraph $BEDGRAPHOFCHIPSEQ > $OUTPUTFILENAME
As the thread mentions, the bed and bedGraph files can be generated by various softwares, I typically use MACS14 -B option. This gives output .bdg files in .gz format for each chr with a header, I used the linux command - sed '1d' to remove the header line and concatenate all the chr files into 1 .bdg file.
Hi varsha! Thank you for the reply. I get what you mean. Thank you very much. What I'm curios about is, that if I get the histogram values by homer, do the peaks' summit position(highest tag density of each peaks) go to the center of the -size? For example if the range I got was -3000~3000bp, are the positions of each peaks are in the 0bp? Maybe because of my poor English, I'm not sure you could get the point...Sorry for that.
Thank you very much!
To make the plot the center of a peak you have to manually adjust your peak file so that the start and end position are both the middle of the peak.
Something like
awk -v OFS='\t' '{print $1, ($2+$3)/2, ($2+$3)/2, $4, $5, $6}' $PEAKFILE > $OUTPUT
... this is assuming your peak file is a standard BED format.Original PEAKFILE:
Edited PEAKFILE:
Thank you Sinji!
Your comments really helped me!
This is the head output - track type=bedGraph name="NT60_treat_chr2L" description="Extended tag pileup from MACS version 1.4.3 20131216"
chr2L 288 351 1
chr2L 351 367 2
chr2L 367 387 1
chr2L 387 430 2
chr2L 430 466 1
chr2L 809 845 1
chr2L 845 888 2
Another thing I noticed was that the bedGraph input I had tried giving Homer was the whole bedGraph output folder from MACS (since there were multiple .gz output files in the folder for each chromosome). Could this be the issue? Please let me know what you think, thanks!
Yes that could be a issue, your best bet might be to remove the first track line of every file,
cat
them together, and then unzip them and feed that file into HOMER. It requires a very specific format, which is unfortunate, but necessary I suppose.Oops sorry I forgot to "Add reply", I will try that suggestion. Thank you so much!
This worked! Thanks..
Also, how do I find out if the coverage value in the output is per bp or million bp?
This depends on your bedgraph input. If you input a bedgraph file that has been normalized to million bp then your coverages are normalized as well. If you have MACS2 output, I think it has a parameter you can set to output normalized bedGraphs, but you'd have to double check your command and the MACS2 manual.
@Sinji Can you please help me with this question. Thanks
Please use
ADD REPLY
when responding to existing posts.