6
6
Entering edit mode
9.4 years ago
ChIP ▴ 600

Hi!

I am almost sure, guys who are actively involved in ChIP-Seq data analysis have plotted the mapped reads around the TSS in a certain window(say 10 Kb).

What I want to do is to plot reads of my histone marks (in bam file) around TSS with CpG and TSS without CpG (Essentially a coverage profile).

It would be very kind of you, if you could share the script that you had used for the same.

Kindly help.

chip-seq • 13k views
0
Entering edit mode

I would not recommend to plot raw read counts directly from BAM as this may give you false impressions due to biases. For example, promoters by default have more reads do to GC and open chromatin bias. Thus, it is better to use log2 ratios or the difference between treatment and input.

5
Entering edit mode
9.4 years ago
Ryan Dale 5.0k

The Python package metaseq (docs) will do this directly from BAM files. Here's an example gist showing how you'd use it:

(edit: fix embedded gist)

Also, you may want to see this worked example from the docs: https://pythonhosted.org/metaseq/example_session.html

0
Entering edit mode

Thank you for the script. Could you please, tell me how to install this package metaseq.

0
Entering edit mode

I updated the script to have some installation instructions. It depends on the scientific Python stack, which can be troublesome to install -- see http://www.scipy.org/install.html if you run into issues.

0
Entering edit mode

Hi! I have a unique error with this line

tss = tss.window(w=5000)


and as soon as I mark this line as commented, the whole script works fine.

The error says

Command was:

bedtools window -a tss.bed -w 5000


Error message was:

*****ERROR: Need -a and -b files.


I think, the command should be:

tss = tss.window(cpg,w=5000)


Could you be kind enough to suggest an edit in your script.

Thank you

0
Entering edit mode

Sorry, I had used the wrong BEDTools program -- it should be "slop" instead of "window". The intent was to extend single-bp TSS feature out to a total size of 10kb. Thanks for catching this; I edited the script.

0
Entering edit mode

Hi. I am a bit of a newbie, so I apologise in advance, however I haven't found any answers elsewhere. I have tried to run the script above, but when I do

>>> y1 = bam.array(tss_with_cpg, bins=bins, processes=processes, fragment_size=fragment_size)

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/metaseq/_genomic_signal.py", line 122, in array
chunksize=chunksize, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/metaseq/array_helpers.py", line 382, in _array_parallel
itertools.repeat(kwargs)))
File "/usr/lib/python2.7/multiprocessing/pool.py", line 227, in map
return self.map_async(func, iterable, chunksize).get()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
raise self._value
TypeError: 'NoneType' object is not iterable


I have tried with several bam files. Why is the bam metaseq._genomic_signal.BamSignal object not iterable? Or have I missed something?

Thank you very much in advance.

1
Entering edit mode

Lots of improvements have been made to metaseq since this was first posted, and the git branch referenced in the script is no longer valid.

I've just updated the script with 1) a reference to the latest installation instructions and 2) instructions for getting example data, so you don't need your own data to run this. Please report back (either here or on the github issues page) if this still doesn't work for you.

0
Entering edit mode

Will definitely try this now, works directly from BAM!

BTW, if I want to subtract the forward-reverse reads on BAM, especially for histone Chip-seq, how do I get the subtracted read from BAM. Then this subtracted BAM file could be used for plotting.

0
Entering edit mode

Not sure what you mean by "subtract the forward-reverse reads". But you can manipulate your BAM using other means (e.g., samtools) and then read the new BAM with metaseq as in the example.

Also, see the BamSignal.local_coverage docs for other arguments that might be helpful for you -- specifically fragment_size, shift_width, read_strand, and stranded.

1
Entering edit mode
0
Entering edit mode

I essentially learned that from the HTSeq package document http://www-huber.embl.de/users/anders/HTSeq/doc/tss.html#tss note that, the y axis is not normalized to count per million (cpm), for that purpose, one needs to count the total reads number (you can do it with HTSeq,but it is rather slow, samtools will be better), and divide the numpy array by it.

1
Entering edit mode
9.4 years ago
KCC ★ 4.0k

Assuming you have your TSS and CpG files in BED format. You can use either a command from bedtools like intersectBED to get TSS which have CpG regions and ones that don't. Then you use an online tool like Cistrome or CEAS to get a profile across your BED regions.

Sorry, I can't be more specific as you haven't mentioned any of the file formats that you are using. I can adjust my answer if you do.

0
Entering edit mode

It is a BAM file.

0
Entering edit mode

Seqminer should do what you are looking for

0
Entering edit mode
9.4 years ago
Ian 5.9k

I recently discovered NGS PLOT on biostars. It is R-based and can work from the command line or via GALAXY. https://code.google.com/p/ngsplot/

0
Entering edit mode
9.4 years ago
bede.portz ▴ 540

I think HOMER may be able to do what you are asking.

HOMER manual for annotatePeaks.pl

If you carry out the detailed annotation, the output should contain information about the distance of peaks to many known genomic features, including CpG islands. I think you could parse the information you want from this file.

I provided a link to the manual for the command that may give you what you need so you can try to ascertain whether or not HOMER can provide the functionality you desire. I would suggest reading the preceding parts of the manual in its entirety, as HOMER will do some things like normalization that you may or may not want, and there are preceding steps you will need to carry out before using annotatePeaks.pl including downloaded the version of your genome of interest via HOMER, making tag directories, etc.

I wish I could provide more detailed and expert advice, but I am new to this analysis myself.

Someone suggested SeqMiner. My concern with SeqMiner is the poor documentation. It can generate heatmaps from data VERY quickly, which is attractive, but seems to be more of a black box than HOMER. I think SeqMiner give you less control over the analysis, and/or the control you do have is less intuitive, at least to me.

0
Entering edit mode
7.8 years ago
Ram ▴ 180

Dear Ryan,

Is it possible to know by using this script why there is shift of +2kb for particular histone mark but rather it has to be on TSS ?

Thanks a lot.