Splitting Individual FASTA/FASTQ reads from NGS data
8.3 years ago
NGS-Newbie ▴ 20

Hi All

I have subsets of 100 and 500 reads in FASTA and FASTQ formats. How can I split this one FASTA/FASTQ file with 100 reads into 100 FASTA files containing one sequence read each?

Thank you all!

Sorting Individual FASTA/FASTQ Reads • 3.1k views

It's very likely that what you are looking for already exists, but rolling your own code (for example in Python) would be trivial. I guess it would take me longer to search the internet for something than to just write it myself. Let me know if you need help with that (but for your own good, it's best to first try getting something working on your own...)
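Along those lines, here is a minimal Python sketch (standard library only). It splits a FASTA file into one file per record and handles sequences that are wrapped over several lines by joining them onto one line. The function names `read_fasta` and `split_fasta`, the `prefix0001.fa` naming scheme, and the default prefix are illustrative choices, not an existing tool. Files are numbered rather than named after read IDs, because IDs can contain characters that are awkward in file names.

```python
from pathlib import Path


def read_fasta(handle):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:  # skip blank lines
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(fasta_path, out_dir, prefix="read"):
    """Write each FASTA record to its own file, e.g. read0001.fa, read0002.fa.

    Returns the list of paths written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(fasta_path) as handle:
        for i, (header, seq) in enumerate(read_fasta(handle), start=1):
            path = out_dir / f"{prefix}{i:04d}.fa"
            path.write_text(f"{header}\n{seq}\n")
            written.append(path)
    return written
```

For example, `split_fasta("File.fa", "reads_out", prefix="S1Seq")` would write `reads_out/S1Seq0001.fa`, `reads_out/S1Seq0002.fa`, and so on. Four digits comfortably cover the 100- and 500-read subsets mentioned in the question.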


Although similar, FASTA and FASTQ are different file formats. FASTQ contains base quality information in addition to the sequence information. If you're splitting a FASTQ into many FASTA files, you will be discarding the base quality information. Is this really what you want to do?
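To make the difference concrete, here is a small Python sketch of what a FASTQ-to-FASTA conversion keeps. It assumes the usual unwrapped four-line FASTQ layout (`@header`, sequence, `+`, qualities), and `fastq_to_fasta` is a hypothetical helper. The header and sequence survive the conversion, but the quality line has nowhere to go.

```python
def fastq_to_fasta(fastq_text):
    """Convert 4-line-per-record FASTQ text to FASTA text.

    The quality line of each record is dropped, because FASTA has no
    field for per-base qualities.
    """
    lines = [line for line in fastq_text.splitlines() if line]
    if len(lines) % 4 != 0:
        raise ValueError("expected exactly 4 lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        record_no = i // 4 + 1
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {record_no} is not a valid FASTQ record")
        if len(seq) != len(qual):
            raise ValueError(f"record {record_no}: sequence/quality length mismatch")
        out.append(">" + header[1:])  # '@' header becomes a '>' header
        out.append(seq)
        # qual is intentionally not written: this is the information lost.
    return "".join(f"{line}\n" for line in out)
```

For instance, the record `@read1 / ACGT / + / IIII` comes out as `>read1 / ACGT`, and the `IIII` qualities are gone.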

8.3 years ago
GenoMax 153k

faSplit from the UCSC Kent utilities (Linux version linked; a macOS version is also available) will take care of splitting the FASTA file.

Instead of "sorting" you may want to change the title to "splitting".

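For FASTQ files you could do: split -l 4 -d -a 3 your_file.fq SEQ. Use a different word instead of SEQ to use that as the file name PREFIX. Here -l 4 puts each four-line FASTQ record in its own file, -d gives numeric suffixes, and -a sets the suffix length (3 digits allow up to 1000 files). This assumes the sequence and quality lines are not wrapped.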
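The `split -l 4` approach works because each FASTQ record usually occupies exactly four lines. If you would rather have that assumption checked than trusted, the following Python sketch does the same split but stops with an error on malformed or truncated records. `split_fastq` and its `prefix`/`width` parameters are hypothetical names. The output names imitate what `split -d -a 3` produces (`SEQ000`, `SEQ001`, ...), and all four lines are kept, so base qualities are preserved.

```python
from itertools import islice
from pathlib import Path


def split_fastq(fastq_path, out_dir, prefix="SEQ", width=3):
    """Write each 4-line FASTQ record to its own file (SEQ000, SEQ001, ...).

    Raises ValueError instead of silently cutting a wrapped or truncated
    file in the wrong places.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(fastq_path) as handle:
        while True:
            record = list(islice(handle, 4))  # next 4 lines (fewer at EOF)
            if all(not line.strip() for line in record):
                break  # end of file, or only trailing blank lines left
            n = len(written)
            if len(record) != 4:
                raise ValueError(f"record {n + 1} is truncated")
            header, seq, plus, qual = (line.rstrip("\n") for line in record)
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"record {n + 1} is not a 4-line FASTQ record")
            if len(seq) != len(qual):
                raise ValueError(f"record {n + 1}: sequence/quality length mismatch")
            if n >= 10 ** width:
                raise ValueError(f"more than {10 ** width} records; increase width")
            path = out_dir / f"{prefix}{n:0{width}d}"
            path.write_text(f"{header}\n{seq}\n{plus}\n{qual}\n")
            written.append(path)
    return written
```

Like `split`, this streams the input four lines at a time, so it does not need to load the whole file into memory.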


Thanks, GenoMax!

Which program do I need to install to run this? faSplit?


Nothing to install with faSplit. Download the file I linked (add execute permissions if needed) and run it.


Awesome! Thanks, GenoMax!

I just modified it a bit, as follows:

split -l 2 -a 15 File.fa S1Seq

Thank you all!
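The `split -l 2` variant above gives one read per file only when every sequence sits on a single line. Many FASTA files wrap sequences at a fixed width, and a wrapped file would be cut mid-record. A quick pre-check, sketched in Python under the assumption of a plain, uncompressed FASTA file (`is_two_line_fasta` is a hypothetical name), could look like this:

```python
def is_two_line_fasta(fasta_path):
    """Return True if every record is exactly one '>' header line followed
    by one non-empty sequence line, i.e. `split -l 2` yields one read per file."""
    with open(fasta_path) as handle:
        lines = [line.rstrip("\n") for line in handle]
    if not lines or len(lines) % 2 != 0:
        return False  # empty file, or an odd line count (wrapping or blank lines)
    headers = lines[0::2]  # lines 1, 3, 5, ... must be headers
    seqs = lines[1::2]     # lines 2, 4, 6, ... must be sequences
    return all(h.startswith(">") for h in headers) and all(
        s and not s.startswith(">") for s in seqs
    )
```

If this returns False, a record-aware splitter such as faSplit is the safer choice.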
