Problem with STAR aligner
5.1 years ago
h1midhat • 0

Dear all, I am relatively new to RNA-seq analysis, so I am hoping someone can help me with this issue. I am using the STAR aligner to map my paired-end RNA-seq reads. For the first 10 samples this worked seamlessly, but for the last 2 I keep getting the following error message:

> *EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
> @GWNJ-0957:375:GW1902221898:2:2211:26494:23442
> CGAAGACAGACCGAAGATGATCCAAGTAGCTAAGGAACTCAAGCGGATTGAAACATCACTGAAAGGTGTTAGCTAAATACCTCTTCTCTGTTCTTGGACTG
> AAAFFJJFFJJFFF-AAFAAAJFJFFFJJF<<AJ-
> SOLUTION: fix your fastq file*

The sequence length is 100 bp and I am running the alignment on trimmed reads. I would really appreciate any help you can provide to fix this issue. Many thanks, Midhat

RNA-Seq STAR • 4.2k views

> quality string length is not equal to sequence length

Looks like you may have mangled fastq record(s) in some way. What pre-processing did you do with these files?


The fastq files had adapter contamination, so I trimmed them using Trimmomatic with CROP:101 before using them for alignment. The other samples that STAR aligned successfully were trimmed in exactly the same way.


Inspect the read with this command (use plain grep if the files are not compressed) to see whether the record is indeed malformed:

zgrep -A 3 '@GWNJ-0957:375:GW1902221898:2:2211:26494:23442' file.fq.gz

I am not sure what Trimmomatic CROP:101 did if your reads are only 100 bp long.
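If looking up a single read ID is not enough, the check that STAR is complaining about can also be run over the whole file. Below is a minimal Python sketch, assuming a standard four-lines-per-record FASTQ with no wrapped sequence lines. The function names `open_fastq` and `find_bad_records` are made up for this example and are not part of STAR or Trimmomatic.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a FASTQ file as text, reading through gzip if the name ends in .gz."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def find_bad_records(path, limit=10):
    """Return up to `limit` tuples of (record_number, header, reason)."""
    problems = []
    with open_fastq(path) as handle:
        lines = (line.rstrip("\r\n") for line in handle)
        # Group the stream into 4-line records; a short final record is padded with None.
        records = zip_longest(*[lines] * 4)
        for number, (header, seq, plus, qual) in enumerate(records, start=1):
            if None in (seq, plus, qual):
                reason = "incomplete record (file may be truncated)"
            elif not header.startswith("@") or not plus.startswith("+"):
                reason = "unexpected header or separator line"
            elif len(seq) != len(qual):
                reason = f"sequence length {len(seq)} != quality length {len(qual)}"
            else:
                continue  # record looks fine
            problems.append((number, header, reason))
            if len(problems) >= limit:
                break
    return problems
```

Calling `find_bad_records("file.fq.gz")` on each of the two failing samples (both mates) should show whether the problem is one damaged record or an incomplete final record. Comparing the raw and trimmed files the same way would tell you whether the damage was already present before trimming.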


I'll check the files as per your suggestion. The reads were originally 150 bp long. They were about 100 bp after cropping with Trimmomatic.


Unless you had a reason to do so, you may have thrown away about 33% of your good data by doing a hard crop like that.
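For reference, the figure comes from simple arithmetic on the numbers in this thread: 150 bp reads cropped to 101 bp. A small Python sketch, where `fraction_discarded` is just an illustrative helper name:

```python
def fraction_discarded(original_length, cropped_length):
    """Fraction of each read's bases removed by a fixed-length crop."""
    if not 0 <= cropped_length <= original_length:
        raise ValueError("cropped length must be between 0 and the original length")
    return (original_length - cropped_length) / original_length


# 150 bp reads cropped with CROP:101 lose 49 of every 150 bases.
print(f"{fraction_discarded(150, 101):.1%}")  # prints 32.7%
```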

5.1 years ago

A few points: STAR's genomic alignment is very good at making an alignment work even if the ends of the read don't match. (The transcriptome alignment it can also produce is, however, very strict and throws away reads that require clipping to align.)

The lab running the libraries is spending a lot of money to get reads that are 150 bases long. If adapter contamination is so bad that you have to trim 50 bases off every read, you should tell the lab they are wasting money by sequencing that many cycles. A 75-cycle kit might give them almost as much information and be much cheaper.

But if you have adapter contamination, you'd be better off trimming the adapters away specifically, not cropping every read an arbitrary amount.
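To illustrate the difference, here is a rough Python sketch contrasting a fixed crop with adapter-specific trimming. It is not how Trimmomatic or any other trimmer is implemented: it uses exact matching only, whereas real tools tolerate mismatches and use base qualities. The adapter string is an assumption for the example (a commonly seen Illumina adapter prefix), so substitute the sequence for your library kit. The function names are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumption: replace with the adapter used in your library prep


def hard_crop(seq, qual, length):
    """Keep the first `length` bases of every read, adapter or not."""
    return seq[:length], qual[:length]


def trim_adapter(seq, qual, adapter=ADAPTER, min_overlap=5):
    """Cut the read where the adapter starts; leave adapter-free reads intact."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    start = seq.find(adapter)
    if start == -1:
        # Look for a partial adapter hanging off the 3' end, longest overlap first.
        for overlap in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:overlap]):
                start = len(seq) - overlap
                break
    if start == -1:
        return seq, qual  # no adapter found, so keep the full read
    # Slice sequence and quality at the same position so their lengths stay equal.
    return seq[:start], qual[:start]
```

With this approach a clean 150 bp read stays 150 bp, and only reads that actually run into the adapter are shortened. Note that both functions cut the sequence and the quality string at the same index, which is the invariant STAR checks when it reads a fastq record.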
