Question

DEXSeq - No Exon Counts

0

Entering edit mode

21 months ago

osiemen ▴ 30

Using DEXSeq I would like to identify the differential exon usage in my samples. I did my alignment on GRCh38 with fasta and .gtf file from Ensembl (gencode v29). The Alignment was done by STAR (not done by me). I use dexseq_count.py to quantify the reads on every exon in my .bam files, however i have 0 counts for each read. I tried to run the script with various options for paired end data and strandedness however i receive the same outcome. I believe that the alignment/mapping was done correct since it was previously used in another analysis with rMATS.

output after running python3.8 dexseq_count.py -p yes -r pos -s no -f bam gencodev29_DEXSeq.gff RNA1.Aligned.sortedByCoord.out.bam RNA1.out.txt :

""ENSG00000000003.14"":"001"    0
""ENSG00000000003.14"":"002"    0
""ENSG00000000003.14"":"003"    0
""ENSG00000000003.14"":"004"    0
""ENSG00000000003.14"":"005"    0
""ENSG00000000003.14"":"006"    0
""ENSG00000000003.14"":"007"    0
""ENSG00000000003.14"":"008"    0
""ENSG00000000003.14"":"009"    0
""ENSG00000000003.14"":"010"    0

not sure if it has something to do with sample file but ill post a example of the bam file output.

samtools view RNA1.Aligned.sortedByCoord.out.bam | head -n 5 :

HWI-1KL111:59:C1VWTACXX:4:2115:16063:46921      419     1       11634   1       74M     =       11692   132     ACAAAGTGTTTACTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCTTATTAGTGATTTGGGCT      @@BFADFDDFFBDGHHGBAAFGHBEG><A<FHHIGGGIB?<FH@HAFDFGC>BFFHIIEGGIC4@;@C=CGBC@        NH:i:4  HI:i:3  AS:i:146        nM:i:0
HWI-1KL111:59:C1VWTACXX:4:1307:11846:59737      419     1       11641   1       74M     =       11890   322     GTTTACTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCTTATTAGTGATTTGGGCTGGGGCCT      CBBFFFFFHGHHHIJJJJJJJJJIJIJJIJJIHIFIJJJJJJJHIAHIJJIGIIIIFHIJJJJJJJJJIJHHFF        NH:i:4  HI:i:3  AS:i:145        nM:i:0
HWI-1KL111:59:C1VWTACXX:4:2108:3156:75335       355     1       11675   1       73M     =       11873   272     AAGCCCTGGAGATTCTTATTAGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATG       @<?DBDDD?;B+<<EFI<?EHCC?AEHHB<@3;EFEFBDFCFGICFFD9*/88C<C;AA?=?ADBDDDDCCAA NH:i:4  HI:i:3  AS:i:145        nM:i:0
HWI-1KL111:59:C1VWTACXX:4:2310:6416:3302        419     1       11681   1       74M     =       11792   185     TGGAGATTCTTATTAGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGC      C@CFFDFFHHHHHJJIEHJJJJJJIJIIIJJJJIIJJJIIJIH?FGIIJJFHIJJJJJJIHHHHGGFFFFFFFE        NH:i:4  HI:i:2  AS:i:146        nM:i:0
HWI-1KL111:59:C1VWTACXX:4:2115:16063:46921      339     1       11692   1       74M     =       11634   -132    ATTAGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGG      CCC@CCCCC??BB@AHECDAIIHHBDGIGGB<CEEGEHDD>IIIHF<;A2EHEEIIHGCBC?BHFHDDDDD@@@        NH:i:4  HI:i:3  AS:i:146        nM:i:0

Does anyone know what could cause this error? All tips are welcome!

DEXSeq Exon RNA-Seq Expression Count • 1.5k views

ADD COMMENT • link 21 months ago by osiemen ▴ 30

0

Entering edit mode

Not very helpful, but I'd recommend running Salmon for gene counts with RNA-seq data rather than Star. STAR is a splice aware mapper but doesn't give you counts. As a bonus, there are a ton of resources out there to bring Salmon results into R for DEXseq analysis.

ADD REPLY • link 21 months ago by Trivas ★ 1.7k

1

Entering edit mode

You can use Salmon together with DEXSeq for differential transcript analysis, but not differential exon analysis, which was the original intended use of DEXSeq. If you want to do differential exon analysis, you need to align the reads to the genome using a splice aware mapper, like STAR or HiSat, and count reads aligning to each exon with something like featureCounts or HTSeq (which is what dexseq_count uses) like the OP is doing.

ADD REPLY • link 21 months ago by i.sudbery 19k

0

Entering edit mode

Thanks for the input. Unfortunately i can't change much about the mapper being STAR. Looking at previous post/questions etc I should be possible to get the exon counts using DEXSeq even-though i used STAR as my mapper.

ADD REPLY • link 21 months ago by osiemen ▴ 30

0

Entering edit mode

STAR can be configured to give gene counts. RSEM can be run on STAR output if it was configured to output a transcriptome bam. But it looks like OP doesn't have either.

ADD REPLY • link 21 months ago by swbarnes2 14k

score 3 · Accepted Answer · 2022-08-02

3

Entering edit mode

21 months ago

swbarnes2 14k

First thing to do is check that chromosome names in your bam match those in your gff.

ADD COMMENT • link 21 months ago by swbarnes2 14k

0

Entering edit mode

hmm using samtools idxstats sample.bam i received the chromosomes names :

    1       248956422       6161107 0
    10      133797422       2072304 0
    11      135086622       3392976 0
    12      133275309       3644413 0
    13      114364328       768297  0
    14      107043718       2009500 0
    15      101991189       2102502 0
    16      90338345        2864296 0
    17      83257441        4603825 0
    18      80373285        627395  0
    19      58617616        4256823 0
    2       242193529       3630209 0
    20      64444167        1482783 0
    21      46709983        926941  0
    22      50818468        1313749 0
    3       198295559       4151787 0
    4       190214555       1826560 0
    5       181538259       2584086 0
    6       170805979       2671860 0
    7       159345973       2502102 0
    8       145138636       1846538 0
    9       138394717       2260200 0
    MT      16569   6551210 0
    X       156040895       1784075 0
    Y       57227415        2814    0
    KI270728.1      1872759 948     0
    KI270727.1      448248  759     0
....
.....

Not sure if it looks okey, what are the chromosomes that come after the chrY --> KI270728.1 , KI270727.1

ADD REPLY • link 21 months ago by osiemen ▴ 30

0

Entering edit mode

Well the names the chromosome names used in the GFF file are in deed different than the ones used in the BAM file. "chr" is used in the gff file to indicate the chromosomes while the bam file only makes uses the numbers of the chromosomes :

cut -f 1 gencodev26_DEXSeq.gff | sort -u :

chr1
chr10
chr11
chr12
chr13
chr14

Bam:

samtools idxstats RNA1.Aligned.sortedByCoord.out.bam | cut -f 1 | sort -u :
*
1
10
11
12
13
14

Not sure if this could cause the problem?

ADD REPLY • link 21 months ago by osiemen ▴ 30

2

Entering edit mode

That absolutely could be the problem. You need to use the same gff or gtf that was used by STAR. Or, if you can't, edit the gff so the chromosome names match

ADD REPLY • link 21 months ago by swbarnes2 14k

0

Entering edit mode

Yes this seems to be the problem. For some reason the first BAM file I used was the only one that did not contain the "chr". So using a sample file with the chromosome names that match the gff solved the problem! Thanks a lot!

ADD REPLY • link 21 months ago by osiemen ▴ 30