Question: counting reads over repeats
0
gravatar for R
5.9 years ago by
R10
....
R10 wrote:

To identify the expression levels of repeats in my RNA-Seq samples, I'm trying to count reads over repeats. The RNA-seq data (strand specific, dUTP ...) has been mapped and non-uniquely mapped reads (NH:i:X>1) were separated for further analysis. But which program should I use to count reads over gtf file (RepeatMasker) ?

Htseq-count (-s reverse) returns me only zero values and report all as __alignment_not_unique: (reads with more than one reported alignment)

rna-seq • 2.5k views
ADD COMMENTlink modified 5.9 years ago by Ashutosh Pandey12k • written 5.9 years ago by R10
3
gravatar for Ashutosh Pandey
5.9 years ago by
Philadelphia
Ashutosh Pandey12k wrote:

If you are using default settings with HTseq-count, then it will ignore all the reads with mapping quality of less than 10. Thus, it will ignore non-unique reads (reads aligned to more than one position) as Tophat scores these reads with mapping quality of 3 (2 locations),2 (3 locations), 1 (4-9 locations) and 0 ( 10 or more locations). Use "-a <minaqual>" as - a 0 so that HTSeq-count will counts these reads towards expression counts. 

ADD COMMENTlink modified 5.9 years ago • written 5.9 years ago by Ashutosh Pandey12k

But I used default, which is 0

ADD REPLYlink written 5.9 years ago by R10

The default for the latest version is 10. So it seems you are using an older version. I looked more into it and here is what I found. From version 0.4.6 onwards, HTSeq count skips reads that are non-uniquely mapped according to the 'NH' field. Source:http://www-huber.embl.de/users/anders/HTSeq/doc/history.html#version-0-5-4

I just greped for "NH" in count.py script that comes with HTSeq and found the following lines:

grep "NH" HTSeq-0.6.1/HTSeq/scripts/count.py 
               if r.optional_field( "NH" ) > 1:
               if ( r[0] is not None and r[0].optional_field( "NH" ) > 1 ) or \
                     ( r[1] is not None and r[1].optional_field( "NH" ) > 1 ):

You need to modify "1" to some number suitable for your analysis and HTSeq-count should work for you. Though highly unlikely but please make sure that there are no restrictions with modifying the code.  

ADD REPLYlink modified 5.9 years ago • written 5.9 years ago by Ashutosh Pandey12k

thank you so much  :)

 

ADD REPLYlink written 5.9 years ago by R10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1668 users visited in the last hour