Question: DEXSeq error: Count files do not correspond to the flattened annotation file
0
gravatar for Assa Yeroslaviz
4.1 years ago by
Assa Yeroslaviz1.1k
Munich
Assa Yeroslaviz1.1k wrote:

Hi,

I am running the DEXSeq (v. 1.12.1) with two different modi, one with the strand information and one without it. I just wanted to see the differences.

the run with strand information works fine, but the one without gives me the above error message.

I first ran the dexseq_prepare_annotation.py script, but modified it:

exons = HTSeq.GenomicArrayOfSets( "auto", stranded=False )

this gives me a gff file with no strand information, where the 7th column of the file is overall '.'

I than run the dexseq_count.py script with the gff file.

python dexseq_count.py -s no -f bam cDmel_NoStrandInfo_DEXSeq.gff $file

I now get the error message with the genes which are being aggreagated, due to the fact the the strand information is missing. I know I can use the -r no option, but I would like to count also these genes.

the first gene name I can't find is XLOC_000002 (these are cufflinks IDs). It is aggregated together with XLOC_001386.

The problem I have is that these genes are aggregated both in my newly created gff file

grep -Hrn "XLOC_000002" NoStrandInfo_DEXSeq.gff
NoStrandInfo_DEXSeq.gff:27:2L     dexseq.py   aggregate_gene  21842   25151   .       .       .       gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:28:2L     dexseq.py   exonic_part     21842   21918   .       .       .       transcripts "TCONS_00000004"; exonic_part_number "001"; gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:29:2L     dexseq.py   exonic_part     21919   22687   .       .       .       transcripts "TCONS_00000004+TCONS_00003381"; exonic_part_number "002"; gene_id "XLOC_001386+XLOC_00000 "
NoStrandInfo_DEXSeq.gff:30:2L     dexseq.py   exonic_part     22688   22709   .       .       .       transcripts "TCONS_00000004"; exonic_part_number "003"; gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:31:2L     dexseq.py   exonic_part     22743   22768   .       .       .       transcripts "TCONS_00003381"; exonic_part_number "004"; gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:32:2L     dexseq.py   exonic_part     22769   22935   .       .       .       transcripts "TCONS_00000004+TCONS_00003381"; exonic_part_number "005"; gene_id "XLOC_001386+XLOC_00000 "
...

as well as in the counts file:

 grep -Hrn "XLOC_000002" cufflinks.txt
cufflinks.txt:9237:XLOC_001386+XLOC_000002:001        0
cufflinks.txt:9238:XLOC_001386+XLOC_000002:002        18
cufflinks.txt:9239:XLOC_001386+XLOC_000002:003        1
cufflinks.txt:9240:XLOC_001386+XLOC_000002:004        1
cufflinks.txt:9241:XLOC_001386+XLOC_000002:005        1
...

So I don't understand why they can't be found as a pair. Is there a different reason for this error message?

 

thanks

Assa

gtf dexseq • 1.6k views
ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by Assa Yeroslaviz1.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1215 users visited in the last hour