Question: Dexseq_Count And Bwa
1
gravatar for K. Wyatt Mcmahon
8.6 years ago by
Virginia Bioinformatics Institute, Virginia Tech
K. Wyatt Mcmahon50 wrote:

Hello BioStar friends!

I've gotten so much help here previously and I'm looking forward to help here again!

I'm trying to use DEXSeq (a Bioconductor package) to identify and quantitate changes in splicing following expression of a transgene. I've performed the mapping of my Illumina paired-end data using BWA and now have SAM files. No problems up to this point.

The problem occurs when I run dexseq_count.py, a Python script associated with the DEXSeq package which is dependent on HTSeq in Python. I keep getting the following messages:

"UserWarning: Read HWUSI-EAS381R:1:10:12325:4848#CGATGT claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)"

After a couple of searches I've found that HTSeq seems not to like BWA-generated SAM files. What I have not found was a work-around for this (although some people write things like "I wrote a perl script to get around this problem" with no further details).

I can use Bowtie (which is recommended), but we have a BWA Convey system so the speed up would be phenomenal, and I have quite a few of these.

Can someone help me with this?

Thanks in advance,

Wyatt

htseq bwa • 3.0k views
ADD COMMENTlink written 8.6 years ago by K. Wyatt Mcmahon50
1
gravatar for Damian Kao
8.6 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

From this [?]old seqanswers post[?], the bottom two posts by Simon Anders who wrote HTSeq:

If you have paired-end data, htseq-count expects that the SAM file is sorted by read name. This is because the SAM format prescribes that paired reads have the same read ID, and hence, sorting causes the read pairs to appear in adjacent line.pairs to appear in adjacent line.

Your error suggest that HTSeq cannot find the aligned mate data. It might be because your SAM file is not sorted by read name.

ADD COMMENTlink written 8.6 years ago by Damian Kao15k

Thanks so much for your help, DK. I've tried this and have thus far not been able to get the script to work. I'm trying the sampe step again in hopes that something went wrong there and that's why I'm not getting the script to work.

ADD REPLYlink written 8.6 years ago by K. Wyatt Mcmahon50
1
gravatar for Michael Dondrup
8.6 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

Did you try to run your output through SAMtools sort command?

ADD COMMENTlink written 8.6 years ago by Michael Dondrup47k

Yes, I've tried SAMTools, and although the sorting worked, the script still found problems. I've even used a linux one-liner. Neither has worked. I'm redoing the sampe step now just hoping that something went wrong there. Thanks for your help!

ADD REPLYlink written 8.6 years ago by K. Wyatt Mcmahon50
0
gravatar for K. Wyatt Mcmahon
8.6 years ago by
Virginia Bioinformatics Institute, Virginia Tech
K. Wyatt Mcmahon50 wrote:

I wanted everyone to know what ended up happening with this:

I had used SAMtools sort, and was still getting the same problem.

It seems that when I demultiplexed my Illumina run, the same number of reads did not end up in each paired fastq file. So I had around 15 million reads in my read1 file, but only 11 million in my read2 file.

I wrote a perl script to make sure the same names are in each file. If you'd like a copy of this perl script comment on my answer here and I'll send it to you. Or if there's a way to share easily I'd be happy to use that too.

Thanks for your help!

Wyatt

ADD COMMENTlink written 8.6 years ago by K. Wyatt Mcmahon50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1481 users visited in the last hour