Question: Creation of ExonCountSet in DEXSeq
alittleboy (USA) wrote, 6.4 years ago:

Hi,

I am using DEXSeq to test for differential exon usage. In the vignette on producing an ExonCountSet object (http://bioconductor.org/packages/devel/data/experiment/vignettes/pasilla/inst/doc/create_objects.pdf), the author uses dexseq_prepare_annotation.py to convert the GTF file into a GFF file. In my GFF output, the gene_ids start with "ENSG", since the GTF was downloaded from Ensembl.

I know that the next step is to run dexseq_count.py on the GFF and SAM files to generate counts. However, we already have count data files and would prefer to use our own counts (i.e. in place of treated2fb.txt in the vignette example) for the analysis. The issue is that our count files contain Entrez Gene IDs, not Ensembl IDs, and the conversion between the two is not bijective (i.e. not 1-1). As a result, when I run the read.HTSeqCounts() function in R, I get the error message "Count files do not correspond to the flattened annotation file".
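To make the "not bijective" point concrete, here is a minimal Python sketch that splits an ID mapping into the pairs that are truly 1-1 and the Entrez IDs that are ambiguous in either direction. The function name and the IDs are hypothetical placeholders, and the sketch assumes the mapping is available as (Entrez ID, Ensembl ID) pairs, for example exported from an annotation table.

```python
from collections import defaultdict

def classify_mapping(pairs):
    """Split (entrez_id, ensembl_id) pairs into 1-1 pairs and ambiguous Entrez IDs."""
    entrez_to_ensembl = defaultdict(set)
    ensembl_to_entrez = defaultdict(set)
    for entrez_id, ensembl_id in pairs:
        entrez_to_ensembl[entrez_id].add(ensembl_id)
        ensembl_to_entrez[ensembl_id].add(entrez_id)

    one_to_one = {}
    ambiguous = set()
    for entrez_id, ensembl_ids in entrez_to_ensembl.items():
        if len(ensembl_ids) == 1:
            (ensembl_id,) = ensembl_ids
            # Only accept the pair if the Ensembl ID also maps back uniquely.
            if len(ensembl_to_entrez[ensembl_id]) == 1:
                one_to_one[entrez_id] = ensembl_id
                continue
        ambiguous.add(entrez_id)
    return one_to_one, ambiguous

# Hypothetical placeholder IDs, for illustration only.
pairs = [
    ("1001", "ENSG_A"),
    ("1002", "ENSG_B"),
    ("1002", "ENSG_C"),  # one Entrez ID maps to two Ensembl IDs
    ("1003", "ENSG_D"),
    ("1004", "ENSG_D"),  # two Entrez IDs map to one Ensembl ID
]
one_to_one, ambiguous = classify_mapping(pairs)
print("1-1 pairs:", one_to_one)
print("ambiguous Entrez IDs:", sorted(ambiguous))
```

Any gene that ends up in the ambiguous set cannot simply be renamed in the count files, which is why relabelling the IDs alone does not make them line up with the flattened GFF.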

Questions:

(1) Is an Ensembl GTF the only possible input for dexseq_prepare_annotation.py? The resulting GFF file seems to contain only Ensembl gene IDs.

(2) In my case, with non-Ensembl gene IDs, how can I adapt the code to generate an ExonCountSet object?

Thank you!

venkateshr89 (United States) wrote, 6.4 years ago:

Just to cross-check, see if you are following the same steps:

python dexseq_prepare_annotation.py hg19.gtf hg19.gff

You might want to try the UCSC reference genome annotation. Also check that you are using the right reference build, e.g. hg19.

Then try

~some_location/samtools view ~some_/location/file.bam | python /some_location_where_dexseq_py_is/dexseq_count.py --paired=no -s no -a 10 /location/reference37.gff - "countfile.txt"

This will give you the count table.

As far as I can tell, you are getting the error when building the ExonCountSet (ECS) because you may have picked the wrong reference genome.
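One way to see exactly where a count file and the flattened annotation disagree is to compare their feature IDs directly. The Python sketch below is only an illustration under stated assumptions: it assumes the flattened GFF has "exonic_part" lines carrying gene_id and exonic_part_number attributes, that each count line starts with "gene_id:exonic_part_number" followed by a tab, and that summary lines start with an underscore. The function names and sample lines are made up, so check them against your own files.

```python
import re

def gff_feature_ids(gff_lines):
    """Collect 'gene_id:exonic_part_number' keys from exonic_part lines."""
    ids = set()
    for line in gff_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exonic_part":
            continue  # skip aggregate_gene lines, comments, malformed lines
        gene = re.search(r'gene_id "([^"]+)"', fields[8])
        part = re.search(r'exonic_part_number "([^"]+)"', fields[8])
        if gene and part:
            ids.add(f"{gene.group(1)}:{part.group(1)}")
    return ids

def count_feature_ids(count_lines):
    """Collect feature keys from a count file, skipping '_'-prefixed summary lines."""
    ids = set()
    for line in count_lines:
        key = line.split("\t", 1)[0].strip()
        if key and not key.startswith("_"):
            ids.add(key)
    return ids

# Made-up sample lines (assumed layout, placeholder IDs).
src = "dexseq_prepare_annotation.py"
gff = [
    f'chr1\t{src}\taggregate_gene\t100\t500\t.\t+\t.\tgene_id "ENSG_A"',
    f'chr1\t{src}\texonic_part\t100\t200\t.\t+\t.\ttranscripts "T1"; exonic_part_number "001"; gene_id "ENSG_A"',
    f'chr1\t{src}\texonic_part\t300\t500\t.\t+\t.\ttranscripts "T1"; exonic_part_number "002"; gene_id "ENSG_A"',
]
counts = ["ENSG_A:001\t12", "1001:002\t7", "_ambiguous\t3"]

only_in_counts = count_feature_ids(counts) - gff_feature_ids(gff)
only_in_gff = gff_feature_ids(gff) - count_feature_ids(counts)
print("in counts but not in GFF:", sorted(only_in_counts))
print("in GFF but not in counts:", sorted(only_in_gff))
```

If both differences are empty, the IDs agree and the problem lies elsewhere; if they are not, the listed IDs show whether the mismatch comes from the ID scheme (e.g. Entrez vs. Ensembl) or from a different annotation build.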

Good luck.


Thanks a lot! I think we prefer to use Ensembl rather than UCSC for the annotation file. Simon pointed out one solution here: http://seqanswers.com/forums/showthread.php?p=108191#post108191

Reply written 6.4 years ago by alittleboy
Perfect! I have never used the newExonCountSet function. I hope it generated the ECS without any problems.

Reply written 6.4 years ago by venkateshr89