Question: Some problems with DEXSEQ analyse
gravatar for darbinator
18 months ago by
darbinator220 wrote:

Hello everyone,

After perform a DEG (differential expression analysis) with DESEQ2 (Alignement with STAR, counts reads with featureCounts), I wanted to perform this time the same analysis but at the scale of isoforms with DEXSEQ.

I have 150bp paired-ends reads from two conditions (3 controls and 3 Arabidopsis mutants for one gene) that I previously aligned with STAR.

The first python script use by DEXSEQ to prepare a GFF file from my GTF run successfully

But the second script doesn't work well

When I give my bam files like this:

python2 -p yes -s yes -r pos -f bam Reference/reference.DEXSeq.gff alignement.sorted.bam > counts.tsv

I have the following error message:

"If you are using alignment files in a bam format, please update your HTSeq to 0.5.4p4 or higher"

I don't understand why I have this error message because when I import htseq in python to see the version, it's indicate version 0.11.2 ..

I have partly solved the problem by converting bam files to sam files on the fly and indicate the option -f sam, the script perform well but now I have strange results , I have many eons with 0 counts, for exemple for the first gene:

AT1G01010:001   0
AT1G01010:002   0
AT1G01010:003   0
AT1G01010:004   0
AT1G01010:005   0
AT1G01010:006   0

While for the gene level I have many count:

AT1G01010   204

Did someone have an explication to my problem ?

Thank's in advance

ADD COMMENTlink modified 9 months ago by kristoffer.vittingseerup3.4k • written 18 months ago by darbinator220

To your first question, by any chance HTSeq doesn't meet the version requirement in python2, but it does in python(3)?

ADD REPLYlink written 18 months ago by Eric Lim1.7k

Hum my pipeline run with python2.7 so I haven't test it, I will test with python3

ADD REPLYlink written 18 months ago by darbinator220

I checked by running the python script in python 3. There it gives syntax error. So that doesn't really solve the problem.

ADD REPLYlink written 4 months ago by rohitsatyam102190

I would first check that the reads mapping to AT1G01010 overlap with those exon coordinates. DESeq2 and HTSeq use reads mapping anywhere across the gene, so it's possible those are intronic reads.

I would generate a bed file with the coordinates for all six exons then intersect the bed with the bam and see what the output looks like:

intersectBed -a at1g01010.bed -b alignement.sorted.bam -c > outfile.bed

This should tell you what you should expect the output from to look like.

ADD REPLYlink written 18 months ago by shawn.w.foley1.2k

I have perform what you did and I have this result:

1   3631    3913    50
1   3996    4276    76
1   4486    4605    46
1   4706    5095    79
1   5174    5326    37

I have 0 for all the exons, with all of my samples, I can't figure why ..

Even if my my data are paired end, I have launch the script with -p no instead of -p yes, iwithout really believing in it

But surprisingly I have results more convenant:

 AT1G01010:001  32
AT1G01010:002   34
AT1G01010:003   24
AT1G01010:004   37
AT1G01010:005   15

Even if is a bit far than the output of the bed file

ADD REPLYlink written 18 months ago by darbinator220

You are saying that your samples are paired-end but that you only obtain expected results when specifying single-end in DEXSeq? How did you align the data with STAR? - please show the commands.

ADD REPLYlink written 18 months ago by Kevin Blighe65k

with STAR, I use this command line:

STAR --runThreadN {threads} --sjdbGTFfile {input.gtf} \       # GTF of A.thaliana
        --sjdbOverhang '+str(READ_LENGHT-1)+' --genomeDir {input.starref} \      # --sjdbOverhang 149 in this case
        --outFileNamePrefix Mapping/{wildcards.sample} --readFilesIn {input.r1} {input.r2} \. # outFileName for my 8 samples
        --readFilesCommand "gunzip -c" \    # My trimming data are in gz format, so I gunzip them
        --outSAMtype BAM SortedByCoordinate; \   # Sorting by coordinate, for feature counts
        mv Mapping/{wildcards.sample}Aligned.sortedByCoord.out.bam {output};\
        mv Mapping/{wildcards.sample}*out* Mapping/Out'
ADD REPLYlink modified 18 months ago • written 18 months ago by darbinator220
gravatar for kristoffer.vittingseerup
9 months ago by
European Union
kristoffer.vittingseerup3.4k wrote:

If you want to do differential transcript analysis with DEXSeq you need transcript level quantification - which you cannot reliably get with HTSeq (or featureCounts) for that matter. You can read more about such considerations as well as suggested tools here.

Also if you are interested in isoform switches (when there between conditions is a change in which isoforms are used within a gene) with predicted functional consequences you could consider my R package IsoformSwitchAnalyzeR which can faciliate the entire analysis (including the DEXSeq part) both for individual genes and for creating genome-wide summaries. You can find examples of the results produced in this section of the vignette.

ADD COMMENTlink written 9 months ago by kristoffer.vittingseerup3.4k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1877 users visited in the last hour