Errors with RSEM/bowtie2
Olivia • 4 days ago

Hi,

I keep getting an error message after running the align_and_estimate_abundance step for RNA-seq, specifically using RSEM and bowtie2. I had previously run Trinity with the Trimmomatic step, as well as CD-HIT-EST.

The error message is this:

CMD: set -o pipefail && bowtie2 --no-mixed --no-discordant --gbar 1000 --end-to-end -k 200  -q -X 800 -x /fs/ess/PAS1182/Olivia/NYCHA/cd_hit_est/CD_HIT_EST_NYCHA_1.fasta.bowtie2 -1 /tmp/slurmtmp.37615063/trinity_output37615063/unfixrm_1048_bed_260_317_S183_L006_R1_001.cor.fq.P.qtrim.gz -2 /tmp/slurmtmp.37615063/trinity_output37615063/unfixrm_1048_bed_260_317_S183_L006_R2_001.cor.fq.P.qtrim.gz -p 80 | samtools view -@ 80 -F 4 -S -b | samtools sort -@ 80 -n -o bowtie2.bam 
Error, fewer reads in file specified with -2 than in file specified with -1
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)
Error, cmd: set -o pipefail && bowtie2 --no-mixed --no-discordant --gbar 1000 --end-to-end -k 200  -q -X 800 -x /fs/ess/PAS1182/Olivia/NYCHA/cd_hit_est/CD_HIT_EST_NYCHA_1.fasta.bowtie2 -1 /tmp/slurmtmp.37615063/trinity_output37615063/unfixrm_1048_bed_260_317_S183_L006_R1_001.cor.fq.P.qtrim.gz -2 /tmp/slurmtmp.37615063/trinity_output37615063/unfixrm_1048_bed_260_317_S183_L006_R2_001.cor.fq.P.qtrim.gz -p 80 | samtools view -@ 80 -F 4 -S -b | samtools sort -@ 80 -n -o bowtie2.bam  died with ret: 256 at /users/PAS1182/ofarinas621/local/src/trinityrnaseq-v2.15.2/util/align_and_estimate_abundance.pl line 729.

Some other posts have said that it could be mismatched reads, but I don't know how to fix this, given that I already did the trimming steps.

Any help would be greatly appreciated.

Thanks,

Olivia

RNA-seq bowtie2 RSEM Trinity

I have previously run Trinity with the Trimmomatic step as well as CD-HIT-EST.

I assume you did the above to generate the transcriptome, which you are then using with bowtie2 after creating an index from it?


Yes, I already made the transcriptome with these tools, and the CD-HIT-EST fasta file is the transcript file that I am using for RSEM. Then, I used samtools and bowtie2 to make the index from the CD-HIT-EST fasta file through this step:

module load bowtie2/2.5.1

bowtie2-build CD-HIT-EST.fasta CD-HIT-EST_INDEX.fasta 

But, from my understanding, I am supposed to use the original CD-HIT-EST.fasta file (as shown above) as the input for RSEM and bowtie2, not the index?


That is good confirmation. Back to the error message.

I don't know how to fix this, given that I already did the trimming steps.

Did you scan/trim your FASTQ data files independently? If so, they could be out of sync. Paired-end data files need to be processed together when scanning/trimming, so that any read eliminated from one file has its mate removed from the other file, keeping the data in sync. You can use repair.sh from the BBMap tools to bring your files back in sync, or scan/trim them as a pair to keep them in sync.
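As a quick sanity check, you can compare the number of reads in each file of a pair (a FASTQ record is 4 lines, so reads = lines / 4). A minimal sketch, using tiny synthetic files as stand-ins for your own R1/R2 pair:

```shell
# Build two tiny gzipped FASTQ files: R1 has 2 reads, R2 has 1 (placeholders).
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n' | gzip > R1.fq.gz
printf '@r1\nACGT\n+\nIIII\n' | gzip > R2.fq.gz

# Count lines, then divide by 4 to get read counts.
r1=$(zcat R1.fq.gz | wc -l)
r2=$(zcat R2.fq.gz | wc -l)
echo "R1 reads: $((r1 / 4)), R2 reads: $((r2 / 4))"
[ "$r1" -eq "$r2" ] || echo "pair out of sync"
```

Any difference between the two counts means bowtie2 will abort exactly as in your error message.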


I am not sure what you mean by this, but I believe that is what I did. With Trinity, I matched all the paired reads together, so they should have been done concurrently.

I just ran the commands below to check whether the read counts match, and so far, for three pairs of samples, they do not. What is interesting is that the count for the reverse file of one sample matches the forward file of a different sample. Could this be an issue caused by the sequencing facility, or are the files simply out of sync?

zcat unfixrm_1048_bed_260_317_S183_L006_R1_001.cor.fq.P.qtrim.gz | wc -l
zcat unfixrm_1048_bed_260_317_S183_L006_R2_001.cor.fq.P.qtrim.gz | wc -l
50368840
49042836

zcat unfixrm_1048_bed_260_317_S35_L007_R1_001.cor.fq.P.qtrim.gz | wc -l
zcat unfixrm_1048_bed_260_317_S35_L007_R2_001.cor.fq.P.qtrim.gz | wc -l
49042836
42063144

zcat unfixrm_1048_hall_252_325_S166_L006_R1_001.cor.fq.P.qtrim.gz | wc -l
zcat unfixrm_1048_hall_252_325_S166_L006_R2_001.cor.fq.P.qtrim.gz | wc -l
42063144
42065588

Could this be an issue caused by the sequencing facility or just that they are in fact out of sync?

We have no way to know. Did the sequencing facility already scan/trim the data to remove the adapters/extraneous sequence?

Having mismatched read counts in the files clearly indicates a problem with the data. If it also includes mislabeling of files, that will compound the problem. If you received the data in this form, you should go back and check with the provider.

If you did the trimming yourself, which program did you use (and did you use both paired-end files as input)?


Yes, they removed the adapters, so I did not have to do extra steps for this (other than Trimmomatic within Trinity to get rid of short and poor-quality reads).

However, I also ran the zcat command on the raw/original fastq.gz files (not the P.qtrim.gz files; see below), and the line counts match perfectly between forward and reverse, so I imagine whatever issue I am having happened further downstream.

zcat 1048_bed_260_317_S183_L006_R1_001.fastq.gz | wc -l 124629748
zcat 1048_bed_260_317_S183_L006_R2_001.fastq.gz | wc -l 124629748
zcat 1048_bed_260_317_S35_L007_R1_001.fastq.gz | wc -l 100004704
zcat 1048_bed_260_317_S35_L007_R2_001.fastq.gz | wc -l 100004704

I am not sure how this could happen at this point in the pipeline, since all of my paired reads were lined up for previous steps such as Trinity, and the output was the P.qtrim.gz files. Regardless, it sounds like repair.sh from BBTools/BBMap is the solution, though I am not sure how this works for P.qtrim.gz files, or whether I can still do this this far down the pipeline.


Is it possible that mismatched R1/R2 files were used in one of the steps?

If you are sure that has not happened, then you could try repair from BBMap (https://bbmap.org/tools/repair) to sync the files back together. It can be used with any paired set of FASTQ files, including the gzipped P.qtrim.gz output.
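For reference, a typical repair.sh invocation looks something like the sketch below (the file names are placeholders for your own pair; reads whose mates were lost are written to the `outs` file rather than silently dropped):

```shell
# Re-pair reads by name across the two files; orphans go to singletons.fq.gz.
# File names below are placeholders -- substitute your own R1/R2 pair.
repair.sh in1=sample_R1.cor.fq.P.qtrim.gz in2=sample_R2.cor.fq.P.qtrim.gz \
          out1=fixed_R1.fq.gz out2=fixed_R2.fq.gz outs=singletons.fq.gz
```

The two `fixed_*` files will then have identical read counts and ordering, which is what bowtie2 requires for -1/-2 input.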

If there is even a reasonable doubt, then start the analysis over with the original files. It is not worth proceeding further with questionable data.
