Question: Problem with STAR aligner
h1midhat wrote (6 weeks ago):

Dear all, I am relatively new to RNA-seq analysis, so I am really hoping someone can help me with this issue. I am using the STAR aligner to map my paired-end RNA-seq reads. For the first 10 samples this worked seamlessly, but for the last 2 I keep getting the following error message:

> EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
> @GWNJ-0957:375:GW1902221898:2:2211:26494:23442
> CGAAGACAGACCGAAGATGATCCAAGTAGCTAAGGAACTCAAGCGGATTGAAACATCACTGAAAGGTGTTAGCTAAATACCTCTTCTCTGTTCTTGGACTG
> AAAFFJJFFJJFFF-AAFAAAJFJFFFJJF<<AJ-
> SOLUTION: fix your fastq file

The sequence length is 100 bp and I am aligning trimmed reads. I would really appreciate any help you can provide to fix this issue. Many thanks, Midhat
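STAR expects four-line FASTQ records (header, sequence, "+", quality) and exits at the first record whose quality string is not the same length as its sequence. In the record quoted in the error, the sequence line has 101 bases but the quality line has only 35 characters. To check whether other records in the same files are affected, a small scan like the one below may help. It is a minimal sketch assuming standard, unwrapped four-line FASTQ; the function names and the script name are made up for illustration and are not part of STAR or Trimmomatic.

```python
import gzip
import sys
from typing import Iterable, List, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, handling .gz compression transparently."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_malformed_records(lines: Iterable[str]) -> List[Tuple[int, str, int, int]]:
    """Read FASTQ lines four at a time and report suspicious records.

    A record is reported if its header does not start with '@', its third
    line does not start with '+', or its sequence and quality lengths differ
    (the condition STAR complains about). Returns a list of
    (record_number, header, sequence_length, quality_length) tuples.
    """
    problems = []
    it = iter(lines)
    for record_number, header in enumerate(it, start=1):
        header = header.rstrip("\r\n")
        # next(it, "") returns "" if the file ends in the middle of a record.
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "")
        qual = next(it, "").rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            problems.append((record_number, header, len(seq), len(qual)))
    return problems


if __name__ == "__main__":
    # Usage (file names are placeholders): python check_fastq.py R1.fq.gz R2.fq.gz
    for path in sys.argv[1:]:
        with open_fastq(path) as fh:
            for problem in find_malformed_records(fh):
                print(path, *problem, sep="\t")
```

Run it on both mates (R1 and R2) of each failing sample. Once a line is missing, every later record is read out of step, so the first record reported is usually the one that matters.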


> quality string length is not equal to sequence length

It looks like you may have mangled one or more FASTQ records in some way. What pre-processing did you do on these files?

(reply by genomax, 6 weeks ago)

The FASTQ files had adapter contamination, so I trimmed them with Trimmomatic (CROP:101) before alignment. The other samples, which STAR aligned successfully, were trimmed in exactly the same way.

(reply by h1midhat, 6 weeks ago)

Inspect the read with this command (or plain grep if the files are not compressed) to see whether the record is indeed malformed:

    zgrep -A 3 '@GWNJ-0957:375:GW1902221898:2:2211:26494:23442' file.fq.gz
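If zgrep is not available, the same check can be approximated in Python. This sketch is a hypothetical helper, not part of any tool mentioned here. Unlike the grep pattern, it matches the read ID exactly (the first word of the header, passed without the leading '@'), so an ID that merely starts with the same characters is not picked up.

```python
import gzip
import sys
from typing import Iterable, List, Optional


def get_record(lines: Iterable[str], read_id: str) -> Optional[List[str]]:
    """Return the header line plus the next three lines for `read_id`.

    Mirrors `zgrep -A 3 '@<id>'`, except that it requires an exact match on
    the first word of the header instead of a substring match. The following
    lines are returned as-is, so a malformed record stays visible.
    """
    it = iter(lines)
    for line in it:
        fields = line[1:].split()
        if line.startswith("@") and fields and fields[0] == read_id:
            return [line.rstrip("\r\n")] + [next(it, "").rstrip("\r\n") for _ in range(3)]
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file name is a placeholder): python get_read.py file.fq.gz READ_ID
    path, read_id = sys.argv[1], sys.argv[2]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        record = get_record(fh, read_id)
    print("\n".join(record) if record else "read not found")
```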

I am not sure what Trimmomatic CROP:101 would have done if your reads are only 100 bp long.

(reply by genomax, 6 weeks ago)

I'll check the files as you suggest. The reads were originally 150 bp long; after cropping with Trimmomatic they were about 100 bp.
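Trimmomatic's CROP:N keeps at most N bases from the start of each read and removes the rest from the end. The sketch below imitates that step (it is not Trimmomatic's code, and the `crop` function is hypothetical) to illustrate two points from this exchange: a read already shorter than N passes through unchanged, and a correct crop cuts the sequence and quality strings at the same position. So if the cropping step ran to completion, it should not by itself produce the length mismatch STAR reports; inspecting the record as suggested above would show what actually happened.

```python
from typing import Tuple


def crop(seq: str, qual: str, length: int) -> Tuple[str, str]:
    """Keep at most the first `length` bases of a read.

    Simplified imitation of Trimmomatic's CROP step (which removes bases from
    the end of the read); not Trimmomatic's own code. Sequence and quality
    are cut at the same position, so their lengths stay equal.
    """
    if len(seq) != len(qual):
        raise ValueError("record is already malformed: sequence and quality lengths differ")
    return seq[:length], qual[:length]


# A 150-base read cropped to 101 bases: 49 bases are removed from the 3' end.
seq150, qual150 = "A" * 150, "F" * 150
cropped_seq, cropped_qual = crop(seq150, qual150, 101)
print(len(cropped_seq), len(cropped_qual))  # 101 101

# A read that is already 100 bases long passes through CROP:101 unchanged.
short_seq, short_qual = crop("C" * 100, "F" * 100, 101)
print(len(short_seq), len(short_qual))  # 100 100
```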

(reply by h1midhat, 6 weeks ago)

Unless you had a specific reason to do so, you may have thrown away about 33% of good data by doing a hard crop like that.
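For the numbers given in this thread (150-base reads cropped with CROP:101), the share of each full-length read that is discarded works out as follows, i.e. just under a third:

$$\frac{150 - 101}{150} = \frac{49}{150} \approx 0.327$$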

(reply by genomax, 6 weeks ago)
Answer by swbarnes2 (United States), 6 weeks ago:

A few points: STAR's genomic alignment is very good at making an alignment work even if the ends of the read don't match, because it can soft-clip the mismatching ends. (The transcriptome alignment it can also produce, however, is very strict and throws away reads that would require clipping to align.)

The lab running the libraries is spending a lot of money to get reads that are 150 bases long. If adapter contamination is so bad that you have to trim 50 bases off every read, you should tell the lab they are wasting their money by sequencing that long. A 75-cycle kit might give them almost as much information and be much cheaper.

But if you have adapter contamination, you'd be better off trimming the adapters away specifically rather than cropping every read by an arbitrary amount.
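The difference between adapter trimming and a fixed crop can be sketched as follows: each read is cut only where adapter sequence starts, so clean reads keep their full length. This is a simplified exact-match illustration with a hypothetical `trim_adapter` function, not how Trimmomatic's ILLUMINACLIP or other trimmers work internally (they tolerate sequencing errors and, for paired-end data, can use read-pair overlap). The adapter sequence AGATCGGAAGAGC, the common start of Illumina TruSeq adapters, is an assumption; use the one that matches your library prep.

```python
from typing import Tuple

# Assumption: the common 5' portion of Illumina TruSeq adapters. Replace it
# with the adapter sequence that actually matches your library prep.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER,
                 min_overlap: int = 3) -> Tuple[str, str]:
    """Cut the read where the adapter starts, trimming quality in lockstep.

    First looks for a full adapter match anywhere in the read, then for a
    partial adapter (a prefix of at least `min_overlap` bases) at the 3' end.
    Exact matching only: real trimmers also allow mismatches.
    """
    cut = seq.find(adapter)
    if cut == -1:
        cut = len(seq)
        # Longest adapter prefix that the read ends with, if any.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                cut = len(seq) - k
                break
    return seq[:cut], qual[:cut]


# Full adapter read-through: everything from the adapter onward is removed.
read = "ACGTACGT" + ADAPTER + "TTTT"
print(trim_adapter(read, "I" * len(read)))  # ('ACGTACGT', 'IIIIIIII')

# No adapter: the read keeps its full length, unlike with a fixed crop.
print(trim_adapter("CCCCCCCC", "F" * 8))  # ('CCCCCCCC', 'FFFFFFFF')
```

In practice this is the job of an adapter-aware trimming step (for example Trimmomatic's ILLUMINACLIP with an adapter FASTA file) rather than CROP.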

Powered by Biostar version 2.3.0