Question

Match read length and quality string length

7.9 years ago

Jason ▴ 20

Hi,

I trimmed the first 9 bases from my paired-end RNA-seq FASTQ files and used FastQC to compare the results with the untrimmed *.fastq files. Now I am trying to align the trimmed files with MapSplice, but the program terminates at its first check, "Checking read format". For some reads, the length of the sequence string is not equal to the length of the quality string. Also, the total number of lines in the FASTQ file is not divisible by 4, so something is definitely wrong with the file. I have pasted the program output below:

-----------------------------------------------
[Tue Jun 28 12:13:00 2016] Beginning Mapsplice run (MapSplice v2.2.1)
[Tue Jun 28 12:13:00 2016] Bin directory: /mnt/lustre/users/k1338910/MapSplice-v2.2.1/bin/ 
[Tue Jun 28 12:13:00 2016] Preparing output location mapsplice_out/
[Tue Jun 28 12:13:00 2016] Checking files or directory: WTCHG_280303_282_1_trimmed.fastq
[Tue Jun 28 12:13:00 2016] Checking files or directory: WTCHG_280303_282_2_trimmed.fastq
[Tue Jun 28 12:13:00 2016] Checking files or directory: /mnt/lustre/users/k1338910/MapSplice-v2.2.1/ref/mm10/
[Tue Jun 28 12:13:00 2016] Checking Bowtie index files
[Tue Jun 28 12:13:00 2016] Building Bowtie index for reference sequence
[Tue Jun 28 13:47:13 2016] Inspecting Bowtie index files
[Tue Jun 28 13:47:14 2016] Checking reference sequence length
[Tue Jun 28 13:47:23 2016] Checking consistency of Bowtie index and reference sequence
[Tue Jun 28 13:47:23 2016] Checking read format
-----[Read Format: FASTQ]
-----[Read Type: Pair End]
Read length and quality string length not consistent
The 11271068th read in WTCHG_280303_282_1_trimmed.fastq
@K00198:69:HCFFNBBXX:8:1210:21115:35972 1:N:0:TATCTCAG
GATCCAGGGGTACTTCACCTCTACAAAACAAGGCCAAGGGATCCAAACACAGCAAGAGGTACAAGG
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
[MapSplice Running Failed]
Error: Checking read format failed

Is there a tool that can check FASTQ files for such errors and remove the erroneous reads?

Many thanks

Jason

RNA-Seq sequencing fastq • 4.0k views


score 1 · Answer 1 · 2016-06-28

7.9 years ago

Pierre Lindenbaum 161k

Using awk (the tab field separator matters, because the read headers contain spaces):

 gunzip -c input.fq.gz | paste - - - - | awk -F '\t' '(length($2)==length($4))' | tr "\t" "\n"
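
For paired-end data, dropping a record from only one file leaves the two mates out of step, which aligners generally reject as well. Below is a minimal Python sketch of the same length check that reads both files in lockstep and keeps a pair only when both mates are well formed. The function names (`read_records`, `is_well_formed`, `filter_pairs`) are hypothetical, and the sketch assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines).

```python
from itertools import zip_longest


def read_records(handle):
    """Yield (header, sequence, separator, quality) tuples from a 4-line FASTQ stream.

    A truncated final record is padded with empty strings, so it fails the check below.
    """
    lines = (line.rstrip("\r\n") for line in handle)
    # Passing the same iterator four times groups the lines in fours.
    yield from zip_longest(lines, lines, lines, lines, fillvalue="")


def is_well_formed(record):
    """Check the record markers and that sequence and quality have equal length."""
    header, sequence, separator, quality = record
    return (
        header.startswith("@")
        and separator.startswith("+")
        and len(sequence) == len(quality)
    )


def filter_pairs(in1, in2, out1, out2):
    """Write only pairs where both mates pass; return (kept, dropped) counts."""
    kept = dropped = 0
    empty = ("", "", "", "")  # stands in for a missing mate if one file is shorter
    with open(in1) as f1, open(in2) as f2, open(out1, "w") as o1, open(out2, "w") as o2:
        pairs = zip_longest(read_records(f1), read_records(f2), fillvalue=empty)
        for rec1, rec2 in pairs:
            if is_well_formed(rec1) and is_well_formed(rec2):
                o1.write("\n".join(rec1) + "\n")
                o2.write("\n".join(rec2) + "\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

A call such as `filter_pairs("R1.fastq", "R2.fastq", "R1.clean.fastq", "R2.clean.fastq")` (placeholder file names) returns the number of pairs kept and dropped. If a line is missing in the middle of a file rather than at the end, every later record in that file is out of frame and will be dropped, so a large dropped count is a sign the file should be regenerated rather than filtered.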


score 0 · Answer 2 · 2016-06-28

Rather than removing the broken reads, I recommend you trim with a tool that does not break reads in the first place, such as BBDuk. Also, why are you trimming the first 9 bases? You can certainly do that with BBDuk, but I would highly recommend you NOT do so unless you have a really good reason. FastQC indicating that there is sequence bias for specific bases or kmers at the beginning of the read, for example, is not a good reason to trim - this is a known outcome of certain library-construction methods; the bases are genomic, and should not be trimmed.
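
If trimming a fixed number of leading bases really is warranted, "not breaking reads" comes down to cutting the sequence and the quality string at the same position. The following is a minimal Python sketch of that invariant only; it is not how BBDuk or any other trimmer is implemented, and `trim_left` is a hypothetical helper.

```python
def trim_left(sequence, quality, n):
    """Remove the first n bases and the matching n quality characters."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ before trimming")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Applying the same slice to both strings keeps them the same length.
    return sequence[n:], quality[n:]


read = "GATCCAGGGGTACTTCAC"
seq, qual = trim_left(read, "J" * len(read), 9)
assert len(seq) == len(qual)
```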

score 0 · Answer 3 · 2018-04-16

6.1 years ago

peterhuang1108 • 0

I ran into the same problem and fixed it simply by running Trimmomatic again.
