Question: Match read length and quality string length
Jason wrote (2.0 years ago):

Hi,

I trimmed the first 9 bases from my paired-end RNA-seq FASTQ files and used FastQC to compare the results with the original (pre-trimming) *.fastq files. I am now trying to align the trimmed files with MapSplice, but the program terminates at its first check, "Checking read format". For some reads, the length of the sequence string does not equal the length of the quality string. In addition, the total number of lines in the FASTQ file is not divisible by 4, so something is definitely wrong with the file. The program output is pasted below:

-----------------------------------------------
[Tue Jun 28 12:13:00 2016] Beginning Mapsplice run (MapSplice v2.2.1)
[Tue Jun 28 12:13:00 2016] Bin directory: /mnt/lustre/users/k1338910/MapSplice-v2.2.1/bin/ 
[Tue Jun 28 12:13:00 2016] Preparing output location mapsplice_out/
[Tue Jun 28 12:13:00 2016] Checking files or directory: WTCHG_280303_282_1_trimmed.fastq
[Tue Jun 28 12:13:00 2016] Checking files or directory: WTCHG_280303_282_2_trimmed.fastq
[Tue Jun 28 12:13:00 2016] Checking files or directory: /mnt/lustre/users/k1338910/MapSplice-v2.2.1/ref/mm10/
[Tue Jun 28 12:13:00 2016] Checking Bowtie index files
[Tue Jun 28 12:13:00 2016] Building Bowtie index for reference sequence
[Tue Jun 28 13:47:13 2016] Inspecting Bowtie index files
[Tue Jun 28 13:47:14 2016] Checking reference sequence length
[Tue Jun 28 13:47:23 2016] Checking consistency of Bowtie index and reference sequence
[Tue Jun 28 13:47:23 2016] Checking read format
-----[Read Format: FASTQ]
-----[Read Type: Pair End]
Read length and quality string length not consistent
The 11271068th read in WTCHG_280303_282_1_trimmed.fastq
@K00198:69:HCFFNBBXX:8:1210:21115:35972 1:N:0:TATCTCAG
GATCCAGGGGTACTTCACCTCTACAAAACAAGGCCAAGGGATCCAAACACAGCAAGAGGTACAAGG
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
[MapSplice Running Failed]
Error: Checking read format failed

Is there a tool that can check FASTQ files for such errors and remove the erroneous reads?

Many thanks

Jason

Tags: sequencing, rna-seq, fastq
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (2.0 years ago):

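Using awk:

 # One record per line (tab-separated); -F '\t' keeps headers that contain spaces in a single field.
 gunzip -c input.fq.gz | paste - - - - | awk -F '\t' '(length($2)==length($4))' | tr "\t" "\n"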
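
Because the data here are paired-end, filtering each file on its own can leave the two files with different read counts, and paired-end aligners generally expect mates in the same order in both files. The following is a minimal sketch of the same awk idea applied to both mates at once. It assumes uncompressed input, strict four-line records, and mates listed in the same order; the function name `filter_pairs` and all file names are hypothetical. If a line is missing in the middle of a file (as a line count not divisible by 4 suggests), every record after that point is shifted, and this approach will not recover them.

```shell
# Keep only read pairs in which BOTH mates have a non-empty sequence
# whose length equals the length of its quality string.
# Usage (illustrative names): filter_pairs in_1.fq in_2.fq out_1.fq out_2.fq
filter_pairs() {
    r1=$1; r2=$2; out1=$3; out2=$4

    # Start with empty outputs so they exist even if no pair passes.
    : > "$out1"
    : > "$out2"

    # Collapse each 4-line record into one tab-separated line per file.
    paste - - - - < "$r1" > "$out1.tmp"
    paste - - - - < "$r2" > "$out2.tmp"

    # Place mate 1 (fields 1-4) and mate 2 (fields 5-8) on the same line,
    # then write surviving pairs back out as 4-line records.
    paste "$out1.tmp" "$out2.tmp" |
    awk -F '\t' -v o1="$out1" -v o2="$out2" '
        length($2) > 0 && length($2) == length($4) &&
        length($6) > 0 && length($6) == length($8) {
            print $1 "\n" $2 "\n" $3 "\n" $4 > o1
            print $5 "\n" $6 "\n" $7 "\n" $8 > o2
        }'

    rm -f "$out1.tmp" "$out2.tmp"
}
```

Calling `filter_pairs WTCHG_280303_282_1_trimmed.fastq WTCHG_280303_282_2_trimmed.fastq clean_1.fastq clean_2.fastq` would write the surviving pairs to the two `clean_*` files (output names chosen here only for illustration).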
Brian Bushnell (Walnut Creek, USA) wrote (2.0 years ago):

Rather than removing the broken reads, I recommend you trim with a tool that does not break reads in the first place, such as BBDuk. Also, why are you trimming the first 9 bases? You can certainly do that with BBDuk, but I would highly recommend you NOT do so unless you have a really good reason. FastQC indicating that there is sequence bias for specific bases or kmers at the beginning of the read, for example, is not a good reason to trim - this is a known outcome of certain library-construction methods; the bases are genomic, and should not be trimmed.
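
If trimming the first 9 bases does turn out to be justified, a BBDuk call for paired files might look like the sketch below. It assumes BBTools is installed with `bbduk.sh` on the PATH and uses BBDuk's `ftl` (force-trim-left, 0-based) option, so `ftl=9` drops positions 0 through 8. The wrapper name `trim_first9` and the file names are hypothetical.

```shell
# Sketch only: trim the first 9 bases of both mates in one BBDuk run,
# so the two output files stay in sync.
# Usage (illustrative names): trim_first9 in_1.fq in_2.fq out_1.fq out_2.fq
trim_first9() {
    bbduk.sh in1="$1" in2="$2" out1="$3" out2="$4" ftl=9
}
```

Processing both mates in a single run is what keeps the pairs consistent.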

peterhuang1108 wrote (3 months ago):

I ran into the same problem and fixed it simply by running Trimmomatic again.
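
Whichever fix is used, a quick structural check of the trimmed file before starting the aligner can confirm that it worked. The sketch below tests the two symptoms described in the question: a line count that is not a multiple of 4, and records whose sequence and quality lengths differ. It assumes an uncompressed file with four-line records; the function name `fastq_check` is hypothetical.

```shell
# Print a one-line structural summary of an uncompressed FASTQ file.
# Usage (illustrative): fastq_check reads_trimmed.fastq
fastq_check() {
    awk '
        # Line 1 of each record should be a header starting with "@".
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad_header++ }
        # Line 2 is the sequence: remember its length.
        NR % 4 == 2 { seq_len = length($0) }
        # Line 4 is the quality string: compare with the sequence length.
        NR % 4 == 0 && length($0) != seq_len { mismatch++ }
        END {
            printf "lines=%d remainder=%d bad_headers=%d length_mismatches=%d\n",
                   NR, NR % 4, bad_header, mismatch
        }' "$1"
}
```

A well-formed file should report `remainder=0`, `bad_headers=0` and `length_mismatches=0`. Once a line is missing mid-file, the later counts are no longer meaningful, because every subsequent record is shifted.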
