DEXSeq error: Count files do not correspond to the flattened annotation file
0
0
Entering edit mode
9.5 years ago
Assa Yeroslaviz ★ 1.8k

Hi,

I am running the DEXSeq (v. 1.12.1) with two different modi, one with the strand information and one without it. I just wanted to see the differences.

The run with strand information works fine, but the one without gives me the above error message.

I first ran the dexseq_prepare_annotation.py script, but modified it:

exons = HTSeq.GenomicArrayOfSets( "auto", stranded=False )

This gives me a gff file with no strand information, where the 7th column of the file is overall '.'

I than run the dexseq_count.py script with the gff file.

python dexseq_count.py -s no -f bam cDmel_NoStrandInfo_DEXSeq.gff $file

I now get the error message with the genes which are being aggregated, due to the fact the the strand information is missing. I know I can use the -r no option, but I would like to count also these genes.

The first gene name I can't find is XLOC_000002 (these are cufflinks IDs). It is aggregated together with XLOC_001386.

The problem I have is that these genes are aggregated both in my newly created gff file

grep -Hrn "XLOC_000002" NoStrandInfo_DEXSeq.gff
NoStrandInfo_DEXSeq.gff:27:2L     dexseq.py   aggregate_gene  21842   25151   .       .       .       gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:28:2L     dexseq.py   exonic_part     21842   21918   .       .       .       transcripts "TCONS_00000004"; exonic_part_number "001"; gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:29:2L     dexseq.py   exonic_part     21919   22687   .       .       .       transcripts "TCONS_00000004+TCONS_00003381"; exonic_part_number "002"; gene_id "XLOC_001386+XLOC_00000 "
NoStrandInfo_DEXSeq.gff:30:2L     dexseq.py   exonic_part     22688   22709   .       .       .       transcripts "TCONS_00000004"; exonic_part_number "003"; gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:31:2L     dexseq.py   exonic_part     22743   22768   .       .       .       transcripts "TCONS_00003381"; exonic_part_number "004"; gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:32:2L     dexseq.py   exonic_part     22769   22935   .       .       .       transcripts "TCONS_00000004+TCONS_00003381"; exonic_part_number "005"; gene_id "XLOC_001386+XLOC_00000 "
...

as well as in the counts file:

 grep -Hrn "XLOC_000002" cufflinks.txt
cufflinks.txt:9237:XLOC_001386+XLOC_000002:001        0
cufflinks.txt:9238:XLOC_001386+XLOC_000002:002        18
cufflinks.txt:9239:XLOC_001386+XLOC_000002:003        1
cufflinks.txt:9240:XLOC_001386+XLOC_000002:004        1
cufflinks.txt:9241:XLOC_001386+XLOC_000002:005        1
...

So I don't understand why they can't be found as a pair. Is there a different reason for this error message?

Thanks
Assa

dexseq gtf • 3.0k views
ADD COMMENT

Login before adding your answer.

Traffic: 1922 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6