Question

Creation of ExonCountSet in DEXSeq

12.1 years ago

alittleboy ▴ 220

Hi,

I am using DEXSeq to test for differential exon usage. In the vignette on creating the ExonCountSet object (http://bioconductor.org/packages/devel/data/experiment/vignettes/pasilla/inst/doc/create_objects.pdf), the author uses dexseq_prepare_annotation.py to convert the GTF file into a GFF file. In my GFF output, the gene IDs start with "ENSG", since the GTF was downloaded from Ensembl.

I know that the next step is to run dexseq_count.py on the GFF and SAM files to generate counts. However, we already have count data files that we would prefer to use (i.e. files like treated2fb.txt in the vignette example). The issue is that our count files contain EntrezGene IDs, not Ensembl IDs, and the conversion between the two is not bijective (i.e. not one-to-one). Therefore, when I run the read.HTSeqCounts() function in R, I get the error message "Count files do not correspond to the flattened annotation file".
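To illustrate what "not bijective" means in practice, here is a minimal Python sketch (not taken from the DEXSeq vignette) that splits a list of (Entrez, Ensembl) ID pairs into those that map one-to-one and those that are ambiguous in either direction. The function name `split_one_to_one` and all IDs below are hypothetical placeholders; a real mapping table would have to come from your own annotation source.

```python
from collections import defaultdict


def split_one_to_one(pairs):
    """Split (entrez_id, ensembl_id) pairs into a one-to-one mapping
    and a set of Entrez IDs that are ambiguous in either direction."""
    entrez_to_ensembl = defaultdict(set)
    ensembl_to_entrez = defaultdict(set)
    for entrez_id, ensembl_id in pairs:
        entrez_to_ensembl[entrez_id].add(ensembl_id)
        ensembl_to_entrez[ensembl_id].add(entrez_id)

    one_to_one = {}
    ambiguous = set()
    for entrez_id, ensembl_ids in entrez_to_ensembl.items():
        if len(ensembl_ids) == 1:
            (ensembl_id,) = ensembl_ids
            # Also require that no other Entrez ID shares this Ensembl ID.
            if len(ensembl_to_entrez[ensembl_id]) == 1:
                one_to_one[entrez_id] = ensembl_id
                continue
        ambiguous.add(entrez_id)
    return one_to_one, ambiguous


# Hypothetical IDs, for illustration only.
pairs = [
    ("1", "ENSG_A"),  # clean one-to-one pair
    ("2", "ENSG_B"),  # one Entrez ID, two Ensembl IDs
    ("2", "ENSG_C"),
    ("3", "ENSG_D"),  # two Entrez IDs, one Ensembl ID
    ("4", "ENSG_D"),
]
one_to_one, ambiguous = split_one_to_one(pairs)
print(one_to_one)
print(sorted(ambiguous))
```

Note that this only looks at gene-level IDs. Even for genes that map cleanly, the exon numbering in the count files would still have to agree with the numbering in the flattened annotation.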

Questions:

(1) Is an Ensembl GTF the only valid input for dexseq_prepare_annotation.py? The resulting GFF file seems to contain only Ensembl gene IDs.

(2) In my case, with non-Ensembl gene IDs, how can I adapt the code to generate an ExonCountSet object?

Thank you!

exon ensembl • 4.7k views


score 2 · Answer 1 · 2013-06-19

12.1 years ago

venks ▴ 740

Just to cross-check, see whether you are running the same command:

python dexseq_prepare_annotation.py hg19.gtf hg19.gff

You might want to try a UCSC reference genome. Also check that you are using the right reference, e.g. the hg19 build.

Then try

~some_location/samtools view ~some_/location/file.bam | python /some_location_where_dexseq_py_is/dexseq_count.py --paired=no -s no -a 10 /location/reference37.gff - "countfile.txt"

This will give you the count table.

As far as I know, you are getting the error when building the ECS (ExonCountSet) because you may have picked the wrong reference genome.
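One way to check whether the count file and the flattened annotation actually disagree is to compare their exon IDs directly. The following is a rough Python sketch, assuming the flattened GFF has `exonic_part` lines carrying `gene_id "..."` and `exonic_part_number "..."` attributes, and that the count file has one tab-separated `geneID:exonNumber` and count per line, with summary rows starting with an underscore. The helper names and the sample IDs are made up for illustration; adjust the parsing if your files look different.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
EXON_NUM_RE = re.compile(r'exonic_part_number "([^"]+)"')


def gff_exon_ids(gff_lines):
    """Collect 'gene:exon' IDs from exonic_part lines of a flattened GFF."""
    ids = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exonic_part":
            continue
        gene = GENE_ID_RE.search(fields[8])
        num = EXON_NUM_RE.search(fields[8])
        if gene and num:
            ids.add(f"{gene.group(1)}:{num.group(1)}")
    return ids


def count_file_ids(count_lines):
    """Collect IDs from the first column of a count file."""
    ids = set()
    for line in count_lines:
        if not line.strip():
            continue
        name = line.split("\t", 1)[0].strip()
        if name.startswith("_"):  # assumed summary rows, e.g. _ambiguous
            continue
        ids.add(name)
    return ids


# Made-up example lines; in practice, pass open file handles instead.
gff_lines = [
    'chr1\tdexseq_prepare_annotation.py\taggregate_gene\t100\t500\t.\t+\t.\tgene_id "ENSG_X"',
    'chr1\tdexseq_prepare_annotation.py\texonic_part\t100\t200\t.\t+\t.\t'
    'transcripts "T1"; exonic_part_number "001"; gene_id "ENSG_X"',
    'chr1\tdexseq_prepare_annotation.py\texonic_part\t300\t500\t.\t+\t.\t'
    'transcripts "T1"; exonic_part_number "002"; gene_id "ENSG_X"',
]
count_lines = ["1234:001\t5", "1234:002\t7", "_ambiguous\t0"]

only_in_gff = gff_exon_ids(gff_lines) - count_file_ids(count_lines)
only_in_counts = count_file_ids(count_lines) - gff_exon_ids(gff_lines)
print(sorted(only_in_gff))
print(sorted(only_in_counts))
```

If both sets come back non-empty, as in this toy example with Ensembl-style IDs on one side and Entrez-style IDs on the other, the two files were most likely built from different annotations.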

Good luck.


Thanks a lot! I think we prefer to use Ensembl rather than UCSC for the annotation file. Simon pointed out one solution here: http://seqanswers.com/forums/showthread.php?p=108191#post108191

12.1 years ago by alittleboy ▴ 220

Perfect! I have never used the newExonCountSet function. I hope it generated the ECS without any problems.

12.1 years ago by venks ▴ 740