Question: How to plot ChIP-seq Density vs Distance from TSS using Homer annoted files
2
gravatar for varsha619
2.8 years ago by
varsha61970
varsha61970 wrote:

Hi, Sorry in advance if my question is too basic. I am new to ChIP-seq analysis and I have a list of .anno files from Homer annotatepeaks.pl (which annotates .bed files). I would like to plot a Density vs Distance from TSS graph using Homer .anno files. I saw different packages available (such as R ChIPseeker) that use .bed files as input to generate the same, after annotating the peaks as well. I was wondering if there is a package that uses the already annotated Homer output .anno files as input to generate a similar plot. Please let me know, thank you for your help!

chip-seq homer • 3.1k views
ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by varsha61970
1

You're looking for a density plot, or metagene where the x axis is 'region distance' and y axis is 'chip-seq signal' correct?

If this is the case then HOMER can do this fairly easily. You'll need a bed file of your regions of interest, and a bedGraph or wig file of your ChIP-seq data. Then you can run something similar to:

annotatePeaks.pl $REGIONSBED $GENOME -size $REGIONDISTANCE -hist 50 -bedGraph $BEDGRAPHOFCHIPSEQ > $OUTPUTFILENAME

Where $REGIONSBED = the bed file of your regions of interest, $GENOME = the genome of your ChIP-seq data such as hg19, or mm9, etc, $REGIONDISTANCE = the number of basepairs the metagene plot should cover typically for TSS, 6000 is fine for hg19 (HOMER takes the number you input here and divides it by 2 and then spans that so for 6000 it would be 3000 +/- bp in either direction from the central point), and $BEDGRAPHOFCHIPSEQ = a bedgraph file of your chip-seq data.

Once you have the output file, you can plot the coverage column in Excel.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Sinji2.8k

Thank you Sinji, just to confirm is the -bedGraph $BEDGRAPHOFCHIPSEQ generated in this command here or is it to be generated elsewhere and given as input here along with the .bed file?

ADD REPLYlink written 2.8 years ago by varsha61970

Yes, you can generate this using any number of tools. Bedtools, and deeptools are two easy to use programs that can do this for you.

ADD REPLYlink written 2.8 years ago by Sinji2.8k

Hi, I used the MACS -B option to generate the bedGraphs = macs14 -t 1.sam -c 2.sam -f SAM -g dm -n out -B. This gave me a bedGraph output folder which I used for annotatePeaks.pl which for some reason gave me all the coverage values as 0. Would you please help me fix this? I appreciate your help.

ADD REPLYlink written 2.8 years ago by varsha61970

Can you do a head of your bedgraph file?

ADD REPLYlink written 2.8 years ago by Sinji2.8k

This is what the 'head' looks like -

track type=bedGraph name="NT60_treat_chr2L" description="Extended tag pileup from MACS version 1.4.3 20131216" chr2L 288 351 1 chr2L 351 367 2 chr2L 367 387 1

ADD REPLYlink written 2.8 years ago by varsha61970

Try head -n 20 ... I want to see what your chr notation looks like.

Otherwise, it might be all that metadata that's messing up HOMER. You could try cutting out all the information before you get to your typical bedgraph coverage information:

chr1    10000    10500

and see if that works, if that makes any sense.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Sinji2.8k

Hi, I'm sorry may be this is an long time ago issue.

But can I get some advice for making this plot from the center of the peaks? For example, the example you said before is like -3000bp to 3000bp. When I plot this values, can I make the 0bp point of the plot being the center of the peaks?

It'll be very grateful to have some advice. Thanks a lot

Woongjae

ADD REPLYlink written 18 months ago by woongjaej10

Hi @Woongjae, in the Homer command by @Sinji if you use -size 6000, you would get -3000 to 3000bp plots. annotatePeaks.pl $REGIONSBED $GENOME -size $REGIONDISTANCE -hist 50 -bedGraph $BEDGRAPHOFCHIPSEQ > $OUTPUTFILENAME

As the thread mentions, the bed and bedGraph files can be generated by various softwares, I typically use MACS14 -B option. This gives output .bdg files in .gz format for each chr with a header, I used the linux command - sed '1d' to remove the header line and concatenate all the chr files into 1 .bdg file.

ADD REPLYlink written 18 months ago by varsha61970

Hi varsha! Thank you for the reply. I get what you mean. Thank you very much. What I'm curios about is, that if I get the histogram values by homer, do the peaks' summit position(highest tag density of each peaks) go to the center of the -size? For example if the range I got was -3000~3000bp, are the positions of each peaks are in the 0bp? Maybe because of my poor English, I'm not sure you could get the point...Sorry for that.

Thank you very much!

ADD REPLYlink written 18 months ago by woongjaej10
1

To make the plot the center of a peak you have to manually adjust your peak file so that the start and end position are both the middle of the peak.

Something like awk -v OFS='\t' '{print $1, ($2+$3)/2, ($2+$3)/2, $4, $5, $6}' $PEAKFILE > $OUTPUT ... this is assuming your peak file is a standard BED format.

Original PEAKFILE:

chr1 100 200 peak1 0 .

chr1 300 500 peak2 0 .

Edited PEAKFILE:

chr1 150 150 peak1 0 .

chr1 400 400 peak2 0 .
ADD REPLYlink modified 18 months ago • written 18 months ago by Sinji2.8k

Thank you Sinji!

Your comments really helped me!

ADD REPLYlink written 18 months ago by woongjaej10

This is the head output - track type=bedGraph name="NT60_treat_chr2L" description="Extended tag pileup from MACS version 1.4.3 20131216"

chr2L 288 351 1

chr2L 351 367 2

chr2L 367 387 1

chr2L 387 430 2

chr2L 430 466 1

chr2L 809 845 1

chr2L 845 888 2

Another thing I noticed was that the bedGraph input I had tried giving Homer was the whole bedGraph output folder from MACS (since there were multiple .gz output files in the folder for each chromosome). Could this be the issue? Please let me know what you think, thanks!

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by varsha61970
1

Yes that could be a issue, your best bet might be to remove the first track line of every file, cat them together, and then unzip them and feed that file into HOMER. It requires a very specific format, which is unfortunate, but necessary I suppose.

ADD REPLYlink written 2.8 years ago by Sinji2.8k

Oops sorry I forgot to "Add reply", I will try that suggestion. Thank you so much!

ADD REPLYlink written 2.8 years ago by varsha61970

This worked! Thanks..

ADD REPLYlink written 2.8 years ago by varsha61970

Also, how do I find out if the coverage value in the output is per bp or million bp?

ADD REPLYlink written 2.8 years ago by varsha61970

This depends on your bedgraph input. If you input a bedgraph file that has been normalized to million bp then your coverages are normalized as well. If you have MACS2 output, I think it has a parameter you can set to output normalized bedGraphs, but you'd have to double check your command and the MACS2 manual.

ADD REPLYlink written 2.8 years ago by Sinji2.8k

@Sinji Can you please help me with this question. Thanks

ADD REPLYlink written 13 months ago by Bioinformatist Newbie230

Please use ADD REPLY when responding to existing posts.

ADD REPLYlink written 2.8 years ago by genomax67k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1934 users visited in the last hour