Remove quotes in fastq files
2.2 years ago
Philipp ▴ 10

Hi, I'm new to this area, so thanks a lot in advance for any help.

I have some FASTQ files in which, on some lines, extra double quotes (" ") have been added at the beginning and end of the quality string. I want to remove them.

For example:

    @NGSNJ-086:647:GW2112051649th:1:1101:6506:1016 1:N:0:CTGAAGCT+ATAGCCTT
AAACTAAGTCAATTCTAATACGACTCACTATAGGAGCTCAGCCTTCACTGCTTCTTAAAGATGCGCACACAACACTCTTTACGTATGTACCGGCACCACGGTCGGATCCTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCTGAAG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@NGSNJ-086:647:GW2112051649th:1:1101:7428:1078 1:N:0:CTGAAGCT+ATAGCCTT
AAACTAAGTCAATTCTAATACGACTCACTATAGGAGCTCAGCCTTCACTGCGACAAAATTGGCCATCTTTCCGACAAACAACATGCCCCACGGCACCACGGTCGGATCCTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCTGAAG
+
"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"

So I want to remove the " " in the last line. Is there an efficient way to do this? Thanks a lot.

fastq • 1.5k views

How did this come to pass? These should not even be in there.


I did some preprocessing in R, and the writeFastq command added them because R prints strings with quotes. I want to avoid redoing the calculations if the quotes can be removed easily.


Which package is that function from? No sane bioinformatician would write a FASTQ with quotes.


It was the writeFastq function from the microseq package. Yes, I didn't expect that either. Also strange that it only occurred in some lines.


Sounds either like a crap package or like something else is wrong. I would start over from scratch. The double quote is a valid quality character, so simply gsubbing it away may break the FASTQs as well.
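To make the point about the quote being a valid quality value concrete, here is a minimal sketch. It assumes the files use the common Phred+33 encoding, which the thread does not state; under that assumption the quality is simply the character's ASCII code minus 33.

```shell
# ASCII code of the double quote character.
# A leading single quote in a printf argument yields the code of the next char.
code=$(printf '%d' "'\"")

# Assumption: Phred+33 encoding, so quality = ASCII code - 33.
echo "ASCII $code, Phred+33 quality $((code - 33))"
```

A read with a genuinely poor first or last base can therefore start or end with a legitimate `"`, which is why blanket removal is risky.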


The package author fixed this in Apr 2021: https://github.com/larssnip/microseq/commit/b5f1c824605290c6b60df402b1e5de31e242811e

It looks like OP is using an older version of the package. The fact that this issue even existed speaks to how untested the package was when submitted to CRAN.


Thanks, I must check why such an old version is installed / was not updated


Thanks for the suggestion. I guess I will try

sed -i 's/^"//' test.fastq

sed -i 's/"$//' test.fastq

and if that does not work I could do it in R by checking the length (provided it doesn't happen again when writing). Doing it from scratch would take some days.

Do you know a safe writeFastq function?
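The length check mentioned here can also be done outside R. The following is only a sketch under stated assumptions: records are strictly four lines each (no wrapped sequences), and a quality line counts as damaged only if it both starts and ends with `"` and is exactly two characters longer than its sequence. The function name `strip_wrapping_quotes` is made up for illustration.

```shell
# Hypothetical helper: strip a wrapping pair of double quotes from a quality
# line only when that line is exactly two characters longer than its sequence.
# Assumes strict 4-line FASTQ records.
strip_wrapping_quotes() {
  awk '
    NR % 4 == 2 { seqlen = length($0) }    # remember the read length
    NR % 4 == 0 && length($0) == seqlen + 2 && /^".*"$/ {
      $0 = substr($0, 2, seqlen)           # drop the first and last character
    }
    { print }
  ' "$@"
}

# Usage (writes a new file rather than editing in place):
# strip_wrapping_quotes test.fastq > test.fixed.fastq
```

Because the quotes are removed only when the lengths disagree by exactly two, a quality string that legitimately begins or ends with `"` is left untouched.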


Use cpad's 0~4 line-step sed instead of your all-line sed.

2.2 years ago
$ sed '0~4 s/"//g' test.fq

Thanks a lot


This will delete all quotes in each fourth line, won't it?


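Yes. If you want to be more careful, strip only a leading and a trailing quote, grouping both substitutions in braces so that the 0~4 address applies to each of them (without the braces, the second substitution runs on every line):

$ sed -r '0~4 {s/^"//;s/"$//}' test.fq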
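Whichever command is used, it may be worth checking the result afterwards. A small sketch, again assuming strict four-line records; the function name `report_length_mismatches` is hypothetical. It prints the header of every record whose sequence and quality lengths differ, so empty output means every record is consistent.

```shell
# Hypothetical helper: list records whose sequence and quality lengths differ.
# Assumes strict 4-line FASTQ records; no output means all lengths match.
report_length_mismatches() {
  awk '
    NR % 4 == 1 { header = $0 }            # keep the read header
    NR % 4 == 2 { seqlen = length($0) }    # remember the read length
    NR % 4 == 0 && length($0) != seqlen { print header }
  ' "$@"
}

# Usage:
# report_length_mismatches test.fq
```

Run on the original file, the same function also shows which reads were affected in the first place.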
2.2 years ago

Generally in R you work with FASTA and FASTQ files as Biostrings objects. If you have a Biostrings object of your FASTQ file, writeXStringSet would be the correct function to save it as a file.


Thanks, I was using Biostrings anyway, I just wasn't aware it has read/write functionality.
