Question: Plot the nucleosome distance to TSS
0
gravatar for khl0798
5.9 years ago by
khl07980
China
khl07980 wrote:

I am new to bioinformatics and python,Now I have downloaded nucleosome-seq data and TSS-seq(which also names CAGE) data,so I try to draw a picture like this:

http://www-huber.embl.de/users/anders/HTSeq/doc/tss.html

The difference was I have the TSS data,so I don't need to subtract the TSS feature from Human GTF annotation. But now I face the problems---the question I want to ask is:

I don't know how to write python script to convent my bed file to instead of GTF.

my TSS data are bed format like this:chrom,chromStart,strand.PS:the read length is 27bp;

here is my script,could anybody tell me why I am fault?

import HTSeq
import pysam
from matplotlib import pyplot

bamfile = HTSeq.BAM_Reader("DRR000367.bam" )
bedfile = HTSeq.BED_Reader("Poly.bed")
coverage = HTSeq.GenomicArray( "auto", stranded=False, typecode="i" )

for almnt in bamfile:
    if almnt.aligned:
        almnt.iv.length = fragmentsize
        coverage[ almnt.iv ] += 1

list(bedfile)
profile = numpy.zeros( 2*halfwinwidth, dtype='i' )
for p in tsspos:
    window = HTSeq.GenomicInterval( p.chrom, p.pos - halfwinwidth, p.pos + halfwinwidth, "." )
    wincvg = numpy.fromiter( coverage[window], dtype='i', count=2*halfwinwidth )
    if p.strand == "+":
        profile += wincvg
    else:
        profile += wincvg[::-1]

pyplot.plot( numpy.arange( -halfwinwidth, halfwinwidth ), profile )
pyplot.show()

Thanks very much.Any who give me any information would be pretty appreciate.

 

chip-seq • 2.6k views
ADD COMMENTlink modified 5.8 years ago by chemcehn190 • written 5.9 years ago by khl07980
1

You might give deepTools a try.
 

ADD REPLYlink written 5.9 years ago by Devon Ryan97k

"my TSS data are bed format like this:chrom,chromStart,strand"

I don't know if I am getting this right, but it seems that your TSS isn't in bed format. This is how bed format should look like. 

BTW: cross-posted on biology.stackexchange

ADD REPLYlink modified 5.9 years ago • written 5.9 years ago by PoGibas4.8k

I know the standard format of bed file, but I think the reads are all 27bp ,so the chromEnd  seem no use   ,so eventually I  use the three "chrom,chromStart,strand"  to set my bed file.

ADD REPLYlink written 5.9 years ago by khl07980

Please post this as comment.

What if you try to reformat your bed file into something like this:

chromosome start start+27 end
ADD REPLYlink written 5.9 years ago by PoGibas4.8k

ok,thank you.Now I find  http://www-huber.embl.de/users/anders/HTSeq/doc/features.html can use my own annotation  in a flat file, with each line describing a feature and giving  coordinates, so I write python script like this:

import csv
genes = HTSeq.GenomicArray( ["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19","chr20","chr21","chr22","chrX","chrY" ],typecode = "O" )
for feature in bedfile:                                                              
      iv = HTSeq.GenomicInterval( feature.chrom,int (feature.start) - 1000 , int (feature.end) - 1000,feature.strand )

but it's error:

ValueError                                Traceback (most recent call last)
<ipython-input-6-aa6393877b3a> in <module>()
----> 1 for p in bedfile:
      2            iv = HTSeq.GenomicInterval(p.chrom,int (p.start) - 1000 , int (p.start) - 1000,p.strand )
      3 

/home/khl/.local/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/__init__.pyc in __iter__(self)
   1025          if len(fields) > 9:
   1026             raise ValueError, "BED file line contains more than 9 fields"
-> 1027          iv = GenomicInterval( fields[0], int(fields[1]), int(fields[2]), fields[5] if len(fields) > 5 else "." )
   1028          f = GenomicFeature( fields[3] if len(fields) > 3 else "unnamed", "BED line", iv )
   1029          f.score = float( fields[4] ) if len(fields) > 4 else None

ADD REPLYlink written 5.9 years ago by khl07980

I am sorry ,the error I show was the false bed format as you pointed .  

ADD REPLYlink written 5.9 years ago by khl07980

I've reformatted your post so the python should render correctly. You might double check this, since I really didn't go through the code to ensure proper nesting.

ADD REPLYlink written 5.9 years ago by Devon Ryan97k

Thank you .These are my complete python script.

import HTSeq
import pysam
from matplotlib import pyplot
fragmentsize = 200
bamfile = HTSeq.BAM_Reader("DRR000367.bam" )
coverage = HTSeq.GenomicArray( "auto", stranded=False, typecode="i" )
for almnt in bamfile:
   if almnt.aligned:
      almnt.iv.length = fragmentsize
      coverage[ almnt.iv ] += 1

import csv
genes = HTSeq.GenomicArray( ["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16",
"chr17","chr18","chr19","chr20","chr21","chr22","chrX","chrY" ],typecode = "O" ,stranded=True)
for (chrom ,start,end,strand ) in \ csv.reader( open ("Poly.csv"),delimiter = "\t" ):
    iv = HTSeq.GenomicInterval( chrom, int(start -1000),int(start + 1000),strand )

profile = numpy.zeros( 2*halfwinwidth, dtype='i' )      
for p in tsspos:
   window = HTSeq.GenomicInterval( p.chrom, p.pos - halfwinwidth, p.pos + halfwinwidth, "." )
   wincvg = numpy.fromiter( coverage[window], dtype='i', count=2*halfwinwidth )
   if p.strand == "+":
      profile += wincvg
   else:
      profile += wincvg[::-1]
pyplot.plot( numpy.arange( -halfwinwidth, halfwinwidth ), profile )  
pyplot.show()

 

ADD REPLYlink modified 5.9 years ago by Devon Ryan97k • written 5.9 years ago by khl07980

You must have either left something out or changed something. The following does nothing:

genes = HTSeq.GenomicArray( ["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16",
"chr17","chr18","chr19","chr20","chr21","chr22","chrX","chrY" ],typecode = "O" ,stranded=True)
for (chrom ,start,end,strand ) in \ csv.reader( open ("Poly.csv"),delimiter = "\t" ):
    iv = HTSeq.GenomicInterval( chrom, int(start -1000),int(start + 1000),strand )

 

Also, tsspos is never defined, so profile will just be a bunch of zeroes.

ADD REPLYlink written 5.9 years ago by Devon Ryan97k

Thank you,Actually I want to make my TSS annotation file to produce formats as following:

1.by using HTSeq.GenomicArray function to produce a format like this:

{'chr2': {'.': <ChromVector object, chr2:[0,Inf)/+, step>},
 'chr1': {'.': <ChromVector object, chr1:[0,Inf)/+, step>}}
below is an example which I imitated from

 http://www-huber.embl.de/users/anders/HTSeq/doc/features.html#HTSeq.GenomicFeature.attr

>>> ga = HTSeq.GenomicArray( [ "chr1", "chr2" ], stranded=False )
>>> ga.chrom_vectors 
{'chr2': {'.': <ChromVector object, chr2:[0,Inf)/+, step>},
 'chr1': {'.': <ChromVector object, chr1:[0,Inf)/+, step>}}
2.then I can use HTSeq.GenomicInterval function to produce like this 

<GenomicInterval object 'chr3', [123203,127245), strand '+'>
eventually I can continue to run the scripts to make the plot.

I don't know if you could understand what I said.

ADD REPLYlink written 5.9 years ago by khl07980

Yes, I understand that, I was pointing out that you never actually used genes or iv after setting them. And you just write over iv every time rather than appending to it.

ADD REPLYlink written 5.9 years ago by Devon Ryan97k

Thank you.genes and iv  which I set had made some errors,so I can't continue  ,I have the bed file,so if I follow the instructions  as what the guides  told,I could succeed. But I was wrong.

ADD REPLYlink written 5.9 years ago by khl07980

I've tried to make similar figures, try Genomation R pacakge http://www.bioconductor.org/packages/devel/bioc/html/genomation.html

ADD REPLYlink written 5.9 years ago by Ming Tang2.6k

Thank you .And these problems had confused me a lot, I will try.

ADD REPLYlink written 5.9 years ago by khl07980
0
gravatar for chemcehn
5.8 years ago by
chemcehn190
Germany
chemcehn190 wrote:

You can try HOMER. It is doing the task you want. The important thing is that you need to have your data in some standard format such as e.g. BED file, as was pointed out by the others.

ADD COMMENTlink modified 5.7 years ago • written 5.8 years ago by chemcehn190

Thank you! At this moment I haven't solved this problem ,I will have a try.

 

ADD REPLYlink written 5.8 years ago by khl07980
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2041 users visited in the last hour