editing count.py script of HTSeq to work with bed file
2
0
Entering edit mode
8.4 years ago
Kssr ▴ 110

I have a bed file with peaks called on chipseq data with each line in the format,

1    15520461    15520630    samplex_peak_1

I want to count reads across the peaks called using htseq count. I am new to using HTSeq count, I have installed the package and I'm trying to edit the count.py script. I came across class HTSeq.BED_Reader, and I have edited the following part of the code:

 bed = HTSeq.BED_Reader( bed_filename )
   I = 0
   try:
      for f in bed:
         if f.type == feature_type:
            try:
               feature_id = f.attr[ id_attribute ]
            except KeyError:
               raise ValueError, ( "Feature %s does not contain a '%s' attribute" %
                  ( f.name, id_attribute ) )
            if stranded != "no" and f.iv.strand == ".":
               raise ValueError, ( "Feature %s at %s does not have strand information but you are "
                  "running htseq-count in stranded mode. Use '--stranded=no'." %
                  ( f.name, f.iv ) )
            features[ f.iv ] += feature_id
            counts[ f.attr[ id_attribute ] ] = 0
         I += 1
         if I % 100000 == 0 and not quiet:
            sys.stderr.write( "%d BED lines processed.\n" % I )
   except:
      sys.stderr.write( "Error occured when processing BED file (%s):\n" % bed.get_line_number_string() )
      raise

   if not quiet:
      sys.stderr.write( "%d BED lines processed.\n" % I )

I'm not sure what parameters to pass for feature type and id attribute. Also, I'm not sure if other parts of the code need to be edited. Could any one help me with this? Thanks!

htseq bed • 2.1k views
ADD COMMENT
1
Entering edit mode
8.4 years ago
sunhanice ▴ 240

Agree with Istvan, it is easier to convert bed to gtf instead of changing the source code.

Please refer to this post, how to use a bed file with htseq-count

make sure the gtf file has 9 columns like the following,

scaffold00001    UNCC    exon    488375    489180    .    -    0    gene_id "101_g";
ADD COMMENT
0
Entering edit mode
8.4 years ago

I would suggest converting your BED file to GTF rather than editing the program. You'll get by much quicker.

The cufflinks suite has a program called gffread that can do that.

ADD COMMENT

Login before adding your answer.

Traffic: 1785 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6