remove white space in fastq file
4 months ago
gkarere • 0

Please help me remove white space in a FASTQ file.

fastq • 1.1k views

Note: FASTQ headers might need those spaces. DO NOT REMOVE them unless you fully know what you're doing and why you're doing it.


Please help me remove white space in a FASTQ file on macOS 12.6.9.


I am running mapper.pl from mirDeep2-0.1.0, and some sequence files are rejected with messages like "Second line of FASTQ reads file contains whitespace in sequence" and "Please make sure your file is in accordance with the FASTQ format specifications".


Why did you add this as an answer? Please learn how to use the forum properly. Read these posts: https://www.biostars.org/tag/how-to/


FASTQ should NOT have white spaces in the read sequence line. Do not just remove them; go back to the source and figure out why they are there.
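
Before changing anything, it may help to see which reads are affected. The following is a minimal sketch, not part of any tool mentioned here. It assumes strict four-line FASTQ records with no blank lines, and the function name `find_whitespace_records` is made up for illustration. It only reports; it does not modify the file.

```python
import re
from typing import Iterable, List, Tuple


def find_whitespace_records(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (record_number, sequence) for reads whose sequence line has whitespace.

    Assumes every record is exactly four lines: header, sequence, '+', quality.
    """
    hits = []
    for line_number, line in enumerate(lines, start=1):
        if line_number % 4 == 2:  # second line of each record is the sequence
            seq = line.rstrip("\n")
            if re.search(r"\s", seq):  # matches spaces, tabs and similar characters
                record_number = (line_number + 2) // 4
                hits.append((record_number, seq))
    return hits


# Example usage (the file name is a placeholder):
# with open("input.fastq") as handle:
#     for record_number, seq in find_whitespace_records(handle):
#         print(record_number, repr(seq))
```

If only a few records show up, or they cluster in one part of the file, that can hint at where the problem was introduced.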


Do you mean empty lines between sequences? Or spaces within the sequences?

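To remove empty lines between sequences I've used sed before (this is GNU sed syntax; the BSD sed shipped with macOS needs an explicit backup suffix after -i, e.g. sed -i '' '/^$/d' your_fastq_file.fastq):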

sed -i '/^$/d' your_fastq_file.fastq
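
For anyone who prefers not to edit the file in place, the same idea can be sketched in Python. This is an illustration only; the function name `drop_blank_lines` is hypothetical. Unlike the sed pattern `/^$/`, which matches only completely empty lines, this version also drops lines that contain nothing but whitespace.

```python
from typing import Iterable, Iterator


def drop_blank_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that contain at least one non-whitespace character."""
    for line in lines:
        if line.strip():
            yield line


# Example usage (file names are placeholders); the input file is left untouched:
# with open("your_fastq_file.fastq") as src, open("no_blank_lines.fastq", "w") as dst:
#     dst.writelines(drop_blank_lines(src))
```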

No, it is spaces within sequences, so this answer does not apply.

I am running mapper.pl from mirDeep2-0.1.0, and some sequence files are rejected with messages like "Second line of FASTQ reads file contains whitespace in sequence".


I've moved your post to a comment as it does not answer the top level question.


I have not been able to remove the white spaces. I would appreciate any other suggestions.


Try the following

awk 'NR%4==2 {gsub(" ", "");} {print;}' input.fastq > output.fastq

Replace the names of the files as appropriate.


ChatGPT also suggests this Python script:

def remove_spaces_from_fastq(input_file, output_file):
    with open(input_file, 'r') as infile, open(output_file, 'w') as outfile:
        for line_number, line in enumerate(infile, start=1):
            # Modify the second line (sequence line) by removing spaces
            if line_number % 4 == 2:
                line = line.replace(" ", "")

            # Write the modified or unmodified line to the output file
            outfile.write(line)

if __name__ == "__main__":
    input_fastq = "input.fastq"  # Replace with your input FASTQ file
    output_fastq = "output.fastq"  # Replace with the desired output file name

    remove_spaces_from_fastq(input_fastq, output_fastq)

Save as remove_spaces.py and run.

python remove_spaces.py

NOTE: Pay attention to the point @Ram has made a couple of times. If the data is corrupt, you are best off procuring a new copy.
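
One way to check whether the cleaned file is still plausible is to compare sequence and quality lengths after stripping. The sketch below is only an illustration under stated assumptions: strict four-line records, a made-up function name `strip_sequence_whitespace`, and everything held in memory for brevity (a large file would call for streaming instead). It removes any whitespace from the sequence line, not just spaces, and lists the records whose quality string no longer matches the sequence length.

```python
import re
from typing import Iterable, List, Tuple


def strip_sequence_whitespace(lines: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Strip whitespace from sequence lines and flag length mismatches.

    Returns (cleaned_lines, mismatched_records), where mismatched_records holds
    the 1-based record numbers whose quality line length differs from the
    cleaned sequence length. Line endings are normalised to '\n'.
    """
    cleaned: List[str] = []
    mismatched: List[int] = []
    seq_len = 0
    for line_number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        position = line_number % 4
        if position == 2:  # sequence line
            text = re.sub(r"\s+", "", text)
            seq_len = len(text)
        elif position == 0:  # quality line, the last line of the record
            if len(text) != seq_len:
                mismatched.append(line_number // 4)
        cleaned.append(text + "\n")
    return cleaned, mismatched


# Example usage (file names are placeholders):
# with open("input.fastq") as src:
#     cleaned, mismatched = strip_sequence_whitespace(src)
# with open("output.fastq", "w") as dst:
#     dst.writelines(cleaned)
# print("records with sequence/quality length mismatch:", mismatched)
```

Any record reported as mismatched is a sign that removing the whitespace did not restore a valid read, which supports getting a fresh copy of the data.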


ChatGPT hard-coding inputs - looks like it's gone from being a good programmer to being a beginner. Weird AI evolution?


Did you figure out why those white spaces were introduced? Your data is possibly corrupt and you're working to silence the evidence instead of preserving it.
