Hi All!
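I am relatively new to DEXSeq. I was trying to follow the tutorial given on Bioconductor here, and noticed that the flattened annotation produced by dexseq_prepare_annotation.py contains the following: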
chr16 dexseq_prepare_annotation.py aggregate_gene 69320140 69408571 . - . gene_id "ENSG00000259900.5+ENSG00000272617.3+ENSG00000260371.1+ENSG00000132604.11+ENSG00000258429.2+ENSG00000157315.5+ENSG00000213380.15"
chr16 dexseq_prepare_annotation.py exonic_part 69320140 69321046 . - . transcripts "ENST00000564419.1"; exonic_part_number "001"; gene_id "ENSG00000259900.5+ENSG00000272617.3+ENSG00000260371.1+ENSG00000132604.11+ENSG00000258429.2+ENSG00000157315.5+ENSG00000213380.15"
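To see exactly which genes were merged into this aggregate, the gene_id attribute can be unpacked. The lines above show that the merged IDs are joined with "+". The following is a minimal R sketch, assuming the attribute column has been copied as a string; `attrs`, `aggregateID` and `componentIDs` are illustrative names, not part of DEXSeq.

```r
# Attribute column (column 9) of the aggregate_gene line above
attrs <- 'gene_id "ENSG00000259900.5+ENSG00000272617.3+ENSG00000260371.1+ENSG00000132604.11+ENSG00000258429.2+ENSG00000157315.5+ENSG00000213380.15"'

# Extract the quoted value of gene_id (also works for exonic_part lines,
# where gene_id follows the transcripts and exonic_part_number attributes)
aggregateID <- sub('.*gene_id "([^"]+)".*', "\\1", attrs)

# The aggregated gene IDs are joined with "+", so split on a literal "+"
componentIDs <- strsplit(aggregateID, "+", fixed = TRUE)[[1]]

print(componentIDs)
print(length(componentIDs))
```

This makes it clear that the aggregate contains seven genes, one of which is ENSG00000132604.11.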
It's surprising to see TERF2 (chromosome 16: 69,355,567-69,408,571) aggregated with RP11-343C2.9 just because one exon partially overlaps, as shown here. Wouldn't this bias the exon usage estimates if we are only interested in the exon usage of a single gene?
How is the analysis affected if I handle the overlapping/shared exons separately by using the "-r no" option?
Also, in DEXSeq, I am unable to perform gene subsetting using dxd = dxd[geneIDs( dxd ) %in% genesForSubset,].
I am not sure how to make the "geneIDsinsubset.txt" file properly. I have a .csv file with a single Ensembl gene ID: ENSG00000132604.11. However, when I run the above command, DEXSeq throws the following error:
Error in `$<-.data.frame`(`*tmp*`, "dispersion", value = NA) : replacement has 1 row, data has 0
Please help...
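A plausible cause of the error, offered as an untested guess: with the default aggregation, geneIDs(dxd) for TERF2's exonic parts would likely be the full "+"-joined aggregate ID shown in the annotation, not ENSG00000132604.11 on its own. An exact %in% match would then select zero rows, which fits "data has 0". The sketch below simulates this with hypothetical IDs. `matchesComponent` is an illustrative helper, not a DEXSeq function, and "OTHER_GENE_ID" is a placeholder.

```r
# Hypothetical values standing in for geneIDs(dxd): one entry per exonic part
aggregateID <- "ENSG00000259900.5+ENSG00000272617.3+ENSG00000260371.1+ENSG00000132604.11+ENSG00000258429.2+ENSG00000157315.5+ENSG00000213380.15"
ids <- c(aggregateID, aggregateID, "OTHER_GENE_ID")
genesForSubset <- "ENSG00000132604.11"

# Exact matching: no element equals the single gene ID, so nothing is kept
exactMatch <- ids %in% genesForSubset

# Hypothetical helper: TRUE if any "+"-separated component is a wanted gene.
# as.character() guards against geneIDs() returning a factor.
matchesComponent <- function(ids, wanted) {
  vapply(strsplit(as.character(ids), "+", fixed = TRUE),
         function(parts) any(parts %in% wanted),
         logical(1))
}
keep <- matchesComponent(ids, genesForSubset)

print(exactMatch)  # FALSE FALSE FALSE
print(keep)        # TRUE TRUE FALSE

# With a real DEXSeqDataSet, the subset might then read (untested):
# dxd <- dxd[matchesComponent(geneIDs(dxd), genesForSubset), ]
```

Splitting on "+" is preferable to a substring search such as grepl(). A substring search could match a shorter ID that happens to be a prefix of a longer one, for example a ".1" version inside ".11".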
I think that you can try that and then report back here, if you have time. Thanks.
Regarding "geneIDsinsubset.txt", it should just be a single-column file of gene IDs that will be used to filter your data.
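Here is a minimal R sketch of such a single-column file, written and then read back. The reading step follows the read.table(..., stringsAsFactors = FALSE)[[1]] pattern that the DEXSeq vignette appears to use. Writing to tempdir() is an assumption made only to keep the example self-contained. The IDs in the file must match the strings returned by geneIDs(dxd) exactly.

```r
# Gene IDs to keep, one per line (the ID from the question)
genesOfInterest <- c("ENSG00000132604.11")

# Plain text, single column: no header, no quotes, no commas
subsetFile <- file.path(tempdir(), "geneIDsinsubset.txt")
writeLines(genesOfInterest, subsetFile)

# Read it back as a character vector
genesForSubset <- read.table(subsetFile, stringsAsFactors = FALSE)[[1]]
print(genesForSubset)

# A one-column .csv without a header could be read similarly:
# genesForSubset <- read.csv("genes.csv", header = FALSE,
#                            stringsAsFactors = FALSE)[[1]]
```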
Kevin