Htseq-count error "start is larger than end"
8.7 years ago
ejmcmaho • 0

Hello,

I am trying to run htseq-count as part of an RNA-Seq differential expression analysis, and I have run into an error with the GFF file I am using. The exact error is as follows:

Error occured when processing GFF file (line 3497 of file /Users/*.gff):
  start is larger than end
  [Exception type: ValueError, raised in _HTSeq.pyx:64]

I looked at line 3497, and the start position is indeed larger than the end position, which explains the error, but I am not sure how to fix it. Is there a way to simply omit that line of the .gff file from the terminal? Any advice would be greatly appreciated.

Thank you,
Evan

htseq-count GFF • 4.3k views
Can't you edit it and replace the coordinates?

That's what I was hoping to do, but how would you go about doing that? I tried opening the GFF file in Excel, but it messed up the overall format of the file.


Open it in a plain-text editor (think WordPad or Notepad on Windows, or TextEdit on a Mac). You should NEVER use Excel in bioinformatics.


Thank you for the tip, that made it possible to edit the file without breaking its format. Another problem I have run into, though, is that there are far too many lines in the GFF file where the start is larger than the end to edit them all by hand. I looked into the options for htseq-count, but none of them seemed to prevent the error.
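
To see how widespread the problem is before deciding how to handle it, something like the following could list every affected line. This is only a minimal sketch: the function name `find_inverted_lines` and the example records are made up for illustration, and it assumes a standard 9-column, tab-separated GFF where columns 4 and 5 hold the start and end.

```python
def find_inverted_lines(lines):
    """Return (line_number, start, end) for GFF data lines where start > end."""
    inverted = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # Data lines have 9 tab-separated columns; skip headers and comments.
        if len(fields) != 9 or line.startswith("#"):
            continue
        start, end = int(fields[3]), int(fields[4])
        if start > end:
            inverted.append((number, start, end))
    return inverted


# Tiny made-up example; with a real file, pass an open file handle instead.
example = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tgene\t500\t300\t.\t-\t.\tID=g2\n",
]
print(find_inverted_lines(example))  # [(3, 500, 300)]
```

The reported line numbers are 1-based, so they should match the line number htseq-count gives in its error message.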

7.9 years ago
elijahlowe • 0

For future reference, I wrote a quick Python script to fix the problem. The script splits each line on tabs (\t), swaps the start and end columns if the start is greater than the end, and then prints out the GFF3.

# Swap start and end on GFF data lines where start > end; print everything else unchanged.
with open('file_name.gff') as gff:  # replace with the path to your GFF file
    for line in gff:
        line = line.rstrip('\n')
        fields = line.split('\t')
        # GFF data lines have 9 tab-separated columns; headers and comments do not
        if len(fields) == 9 and int(fields[3]) > int(fields[4]):
            # columns 4 and 5 (indices 3 and 4) are start and end
            fields[3], fields[4] = fields[4], fields[3]
            line = '\t'.join(fields)
        print(line)

Hope this helps.
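
If you would rather omit the offending lines, as the original question asked, a variation along these lines might work. It is a sketch under the same 9-column GFF assumption; `drop_inverted`, `filter_gff`, and the file names in the usage note are hypothetical names chosen for illustration.

```python
def drop_inverted(lines):
    """Yield every line except GFF data lines whose start exceeds their end."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Only 9-column, non-comment lines are treated as feature records.
        if len(fields) == 9 and not line.startswith("#"):
            if int(fields[3]) > int(fields[4]):
                continue  # skip the inverted feature
        yield line


def filter_gff(in_path, out_path):
    """Copy in_path to out_path, leaving out the inverted feature lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_inverted(src))
```

Calling `filter_gff("input.gff", "filtered.gff")` would write the cleaned copy and leave the original file untouched. Dropping a line removes that feature from the counts entirely, whereas swapping keeps it, so which approach is appropriate depends on why the coordinates are inverted in the first place.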
