Match read length and quality string length
7.9 years ago
Jason

Hi,

I trimmed the first 9 bases of the reads in my paired-end RNA-seq FASTQ files and used FastQC to compare the results with the untrimmed *.fastq files. Now I am trying to align the trimmed files with MapSplice, but the program fails at the "Checking read format" step. For some reads, the length of the sequence string is not equal to the length of the quality string. Also, the total number of lines in the FASTQ file is not a multiple of 4, which means there is definitely something wrong. I have pasted the output from the program below:

-----------------------------------------------
[Tue Jun 28 12:13:00 2016] Beginning Mapsplice run (MapSplice v2.2.1)
[Tue Jun 28 12:13:00 2016] Bin directory: /mnt/lustre/users/k1338910/MapSplice-v2.2.1/bin/ 
[Tue Jun 28 12:13:00 2016] Preparing output location mapsplice_out/
[Tue Jun 28 12:13:00 2016] Checking files or directory: WTCHG_280303_282_1_trimmed.fastq
[Tue Jun 28 12:13:00 2016] Checking files or directory: WTCHG_280303_282_2_trimmed.fastq
[Tue Jun 28 12:13:00 2016] Checking files or directory: /mnt/lustre/users/k1338910/MapSplice-v2.2.1/ref/mm10/
[Tue Jun 28 12:13:00 2016] Checking Bowtie index files
[Tue Jun 28 12:13:00 2016] Building Bowtie index for reference sequence
[Tue Jun 28 13:47:13 2016] Inspecting Bowtie index files
[Tue Jun 28 13:47:14 2016] Checking reference sequence length
[Tue Jun 28 13:47:23 2016] Checking consistency of Bowtie index and reference sequence
[Tue Jun 28 13:47:23 2016] Checking read format
-----[Read Format: FASTQ]
-----[Read Type: Pair End]
Read length and quality string length not consistent
The 11271068th read in WTCHG_280303_282_1_trimmed.fastq
@K00198:69:HCFFNBBXX:8:1210:21115:35972 1:N:0:TATCTCAG
GATCCAGGGGTACTTCACCTCTACAAAACAAGGCCAAGGGATCCAAACACAGCAAGAGGTACAAGG
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
[MapSplice Running Failed]
Error: Checking read format failed

My question is: is there a tool that can check FASTQ files for such errors and remove the erroneous reads?
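A minimal sketch of such a check in shell/awk, assuming uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines). `check_fastq` is a hypothetical helper, not an existing tool. It reports every record whose header does not start with @, whose separator does not start with +, or whose sequence and quality lengths differ, and it flags a total line count that is not a multiple of 4.

```shell
# check_fastq: hypothetical helper (not an existing tool) that scans an
# uncompressed FASTQ stream on stdin and reports structural problems.
# Assumes exactly 4 lines per record (no wrapped sequence/quality lines).
# Exit status: 0 if no problems were found, 1 otherwise.
check_fastq() {
  awk '
    BEGIN { bad = 0 }
    {
      pos = (NR - 1) % 4            # 0 header, 1 sequence, 2 separator, 3 quality
      rec = int((NR - 1) / 4) + 1   # 1-based record number
      if (pos == 0) {
        hdr = $0
        if (substr($0, 1, 1) != "@") { printf "record %d: header does not start with @\n", rec; bad++ }
      } else if (pos == 1) {
        seq = $0
      } else if (pos == 2) {
        if (substr($0, 1, 1) != "+") { printf "record %d: separator does not start with +\n", rec; bad++ }
      } else if (length($0) != length(seq)) {
        printf "record %d (%s): sequence length %d, quality length %d\n", rec, hdr, length(seq), length($0)
        bad++
      }
    }
    END {
      # A truncated file leaves a partial record at the end
      if (NR % 4 != 0) { printf "total lines %d is not a multiple of 4\n", NR; bad++ }
      printf "%d problem(s) found\n", bad
      exit (bad > 0)
    }
  '
}
```

For example, `check_fastq < WTCHG_280303_282_1_trimmed.fastq` would list every offending record instead of only the first one MapSplice reports.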

Many thanks

Jason

RNA-Seq sequencing fastq

7.9 years ago

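Using awk. Setting -F '\t' matters here: Illumina read headers such as the one above contain a space, so awk's default whitespace splitting would shift the fields and compare the wrong columns. For uncompressed input, replace gunzip -c input.fq.gz with cat input.fq.

 gunzip -c input.fq.gz | paste - - - - | awk -F '\t' 'length($2) == length($4)' | tr '\t' '\n' > filtered.fq
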
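Running a filter like this on each file of a pair separately can remove a read from R1 while keeping its mate in R2, which leaves the two files out of step; paired-end aligners generally expect the n-th record of both files to be mates. A minimal sketch of a pair-aware variant, assuming uncompressed input with four lines per record. `filter_pairs` and the output file names are hypothetical. It reads both files in lockstep and writes a pair only if both mates pass the checks (header starts with @, separator starts with +, equal sequence and quality lengths).

```shell
# filter_pairs: hypothetical helper (not an existing tool). Reads two
# uncompressed FASTQ files in lockstep and keeps a pair only if BOTH mates
# pass the checks, so R1 and R2 stay in sync.
# Usage: filter_pairs in_R1.fastq in_R2.fastq out_R1.fastq out_R2.fastq
# Assumes 4 lines per record; stops at the end of the shorter file and
# discards a trailing incomplete record.
filter_pairs() {
  awk -v r1="$1" -v r2="$2" -v o1="$3" -v o2="$4" '
    # Read the next 4-line record from file f into rec[1..4]; 0 on EOF/truncation.
    function read_rec(f, rec,    i) {
      for (i = 1; i <= 4; i++)
        if ((getline rec[i] < f) <= 0) return 0
      return 1
    }
    # Keep a record only if: @ header, + separator, equal sequence/quality lengths.
    function valid(rec) {
      return substr(rec[1], 1, 1) == "@" && substr(rec[3], 1, 1) == "+" &&
             length(rec[2]) == length(rec[4])
    }
    BEGIN {
      split("", a); split("", b)   # declare a and b as arrays
      kept = 0; dropped = 0
      while (read_rec(r1, a) && read_rec(r2, b)) {
        if (valid(a) && valid(b)) {
          printf "%s\n%s\n%s\n%s\n", a[1], a[2], a[3], a[4] > o1
          printf "%s\n%s\n%s\n%s\n", b[1], b[2], b[3], b[4] > o2
          kept++
        } else {
          dropped++
        }
      }
      printf "kept %d pairs, dropped %d pairs\n", kept, dropped
    }
  '
}
```

For example: `filter_pairs WTCHG_280303_282_1_trimmed.fastq WTCHG_280303_282_2_trimmed.fastq R1.fixed.fastq R2.fixed.fastq`. Reading both files in one awk process keeps the pairing decision in a single place, at the cost of only handling uncompressed input.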
7.9 years ago

Rather than removing the broken reads, I recommend trimming with a tool that does not break reads in the first place, such as BBDuk. Also, why are you trimming the first 9 bases? You can certainly do that with BBDuk, but I would strongly recommend NOT doing so unless you have a really good reason. For example, FastQC reporting sequence bias for specific bases or k-mers at the start of reads is not a good reason to trim: this is a known outcome of certain library-construction methods. Those bases are genomic and should not be trimmed.

6.1 years ago

I had the same problem and fixed it by simply running Trimmomatic again.
