Some problems with DEXSEQ analyse
1
1
Entering edit mode
5.5 years ago
vin.darb ▴ 300

Hello everyone,

After perform a DEG (differential expression analysis) with DESEQ2 (Alignement with STAR, counts reads with featureCounts), I wanted to perform this time the same analysis but at the scale of isoforms with DEXSEQ.

I have 150bp paired-ends reads from two conditions (3 controls and 3 Arabidopsis mutants for one gene) that I previously aligned with STAR.

The first python script use by DEXSEQ dexseq_prepare_annotation.py) to prepare a GFF file from my GTF run successfully

But the second script doesn't work well

When I give my bam files like this:

python2 dexseq_count.py -p yes -s yes -r pos -f bam Reference/reference.DEXSeq.gff alignement.sorted.bam > counts.tsv

I have the following error message:

"If you are using alignment files in a bam format, please update your HTSeq to 0.5.4p4 or higher"

I don't understand why I have this error message because when I import htseq in python to see the version, it's indicate version 0.11.2 ..

I have partly solved the problem by converting bam files to sam files on the fly and indicate the option -f sam, the script perform well but now I have strange results , I have many eons with 0 counts, for exemple for the first gene:

AT1G01010:001   0
AT1G01010:002   0
AT1G01010:003   0
AT1G01010:004   0
AT1G01010:005   0
AT1G01010:006   0

While for the gene level I have many count:

AT1G01010   204

Did someone have an explication to my problem ?

Thank's in advance

RNA-Seq isoform expression DEXSEQ • 3.2k views
ADD COMMENT
1
Entering edit mode

To your first question, by any chance HTSeq doesn't meet the version requirement in python2, but it does in python(3)?

ADD REPLY
0
Entering edit mode

Hum my pipeline run with python2.7 so I haven't test it, I will test with python3

ADD REPLY
0
Entering edit mode

I checked by running the python script in python 3. There it gives syntax error. So that doesn't really solve the problem.

ADD REPLY
0
Entering edit mode

I would first check that the reads mapping to AT1G01010 overlap with those exon coordinates. DESeq2 and HTSeq use reads mapping anywhere across the gene, so it's possible those are intronic reads.

I would generate a bed file with the coordinates for all six exons then intersect the bed with the bam and see what the output looks like:

intersectBed -a at1g01010.bed -b alignement.sorted.bam -c > outfile.bed

This should tell you what you should expect the output from dexseq_count.py to look like.

ADD REPLY
0
Entering edit mode

I have perform what you did and I have this result:

1   3631    3913    50
1   3996    4276    76
1   4486    4605    46
1   4706    5095    79
1   5174    5326    37

I have 0 for all the exons, with all of my samples, I can't figure why ..

Even if my my data are paired end, I have launch the script with -p no instead of -p yes, iwithout really believing in it

But surprisingly I have results more convenant:

 AT1G01010:001  32
AT1G01010:002   34
AT1G01010:003   24
AT1G01010:004   37
AT1G01010:005   15

Even if is a bit far than the output of the bed file

ADD REPLY
0
Entering edit mode

You are saying that your samples are paired-end but that you only obtain expected results when specifying single-end in DEXSeq? How did you align the data with STAR? - please show the commands.

ADD REPLY
0
Entering edit mode

with STAR, I use this command line:

STAR --runThreadN {threads} --sjdbGTFfile {input.gtf} \       # GTF of A.thaliana
        --sjdbOverhang '+str(READ_LENGHT-1)+' --genomeDir {input.starref} \      # --sjdbOverhang 149 in this case
        --outFileNamePrefix Mapping/{wildcards.sample} --readFilesIn {input.r1} {input.r2} \. # outFileName for my 8 samples
        --readFilesCommand "gunzip -c" \    # My trimming data are in gz format, so I gunzip them
        --outSAMtype BAM SortedByCoordinate; \   # Sorting by coordinate, for feature counts
        mv Mapping/{wildcards.sample}Aligned.sortedByCoord.out.bam {output};\
        mv Mapping/{wildcards.sample}*out* Mapping/Out'
ADD REPLY
0
Entering edit mode
4.8 years ago

If you want to do differential transcript analysis with DEXSeq you need transcript level quantification - which you cannot reliably get with HTSeq (or featureCounts) for that matter. You can read more about such considerations as well as suggested tools here.

Also if you are interested in isoform switches (when there between conditions is a change in which isoforms are used within a gene) with predicted functional consequences you could consider my R package IsoformSwitchAnalyzeR which can faciliate the entire analysis (including the DEXSeq part) both for individual genes and for creating genome-wide summaries. You can find examples of the results produced in this section of the vignette.

ADD COMMENT

Login before adding your answer.

Traffic: 1516 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6