Question: Some problems with DEXSEQ analyse
0
gravatar for darbinator
9 months ago by
darbinator190
darbinator190 wrote:

Hello everyone,

After perform a DEG (differential expression analysis) with DESEQ2 (Alignement with STAR, counts reads with featureCounts), I wanted to perform this time the same analysis but at the scale of isoforms with DEXSEQ.

I have 150bp paired-ends reads from two conditions (3 controls and 3 Arabidopsis mutants for one gene) that I previously aligned with STAR.

The first python script use by DEXSEQ dexseq_prepare_annotation.py) to prepare a GFF file from my GTF run successfully

But the second script doesn't work well

When I give my bam files like this:

python2 dexseq_count.py -p yes -s yes -r pos -f bam Reference/reference.DEXSeq.gff alignement.sorted.bam > counts.tsv

I have the following error message:

"If you are using alignment files in a bam format, please update your HTSeq to 0.5.4p4 or higher"

I don't understand why I have this error message because when I import htseq in python to see the version, it's indicate version 0.11.2 ..

I have partly solved the problem by converting bam files to sam files on the fly and indicate the option -f sam, the script perform well but now I have strange results , I have many eons with 0 counts, for exemple for the first gene:

AT1G01010:001   0
AT1G01010:002   0
AT1G01010:003   0
AT1G01010:004   0
AT1G01010:005   0
AT1G01010:006   0

While for the gene level I have many count:

AT1G01010   204

Did someone have an explication to my problem ?

Thank's in advance

ADD COMMENTlink modified 13 days ago by kristoffer.vittingseerup2.9k • written 9 months ago by darbinator190
1

To your first question, by any chance HTSeq doesn't meet the version requirement in python2, but it does in python(3)?

ADD REPLYlink written 9 months ago by Eric Lim1.6k

Hum my pipeline run with python2.7 so I haven't test it, I will test with python3

ADD REPLYlink written 9 months ago by darbinator190

I would first check that the reads mapping to AT1G01010 overlap with those exon coordinates. DESeq2 and HTSeq use reads mapping anywhere across the gene, so it's possible those are intronic reads.

I would generate a bed file with the coordinates for all six exons then intersect the bed with the bam and see what the output looks like:

intersectBed -a at1g01010.bed -b alignement.sorted.bam -c > outfile.bed

This should tell you what you should expect the output from dexseq_count.py to look like.

ADD REPLYlink written 9 months ago by shawn.w.foley1.1k

I have perform what you did and I have this result:

1   3631    3913    50
1   3996    4276    76
1   4486    4605    46
1   4706    5095    79
1   5174    5326    37

I have 0 for all the exons, with all of my samples, I can't figure why ..

Even if my my data are paired end, I have launch the script with -p no instead of -p yes, iwithout really believing in it

But surprisingly I have results more convenant:

 AT1G01010:001  32
AT1G01010:002   34
AT1G01010:003   24
AT1G01010:004   37
AT1G01010:005   15

Even if is a bit far than the output of the bed file

ADD REPLYlink written 9 months ago by darbinator190

You are saying that your samples are paired-end but that you only obtain expected results when specifying single-end in DEXSeq? How did you align the data with STAR? - please show the commands.

ADD REPLYlink written 9 months ago by Kevin Blighe52k

with STAR, I use this command line:

STAR --runThreadN {threads} --sjdbGTFfile {input.gtf} \       # GTF of A.thaliana
        --sjdbOverhang '+str(READ_LENGHT-1)+' --genomeDir {input.starref} \      # --sjdbOverhang 149 in this case
        --outFileNamePrefix Mapping/{wildcards.sample} --readFilesIn {input.r1} {input.r2} \. # outFileName for my 8 samples
        --readFilesCommand "gunzip -c" \    # My trimming data are in gz format, so I gunzip them
        --outSAMtype BAM SortedByCoordinate; \   # Sorting by coordinate, for feature counts
        mv Mapping/{wildcards.sample}Aligned.sortedByCoord.out.bam {output};\
        mv Mapping/{wildcards.sample}*out* Mapping/Out'
ADD REPLYlink modified 9 months ago • written 9 months ago by darbinator190
0
gravatar for kristoffer.vittingseerup
13 days ago by
European Union
kristoffer.vittingseerup2.9k wrote:

If you want to do differential transcript analysis with DEXSeq you need transcript level quantification - which you cannot reliably get with HTSeq (or featureCounts) for that matter. You can read more about such considerations as well as suggested tools here.

Also if you are interested in isoform switches (when there between conditions is a change in which isoforms are used within a gene) with predicted functional consequences you could consider my R package IsoformSwitchAnalyzeR which can faciliate the entire analysis (including the DEXSeq part) both for individual genes and for creating genome-wide summaries. You can find examples of the results produced in this section of the vignette.

ADD COMMENTlink written 13 days ago by kristoffer.vittingseerup2.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2022 users visited in the last hour