How to split a FASTA file into smaller files?
6.7 years ago
l.souza ▴ 80

Hello!

I have a FASTA file with about 1600 sequences, but the tool I'm going to use only accepts FASTA files with at most 200 sequences.

Is there a way to automate splitting this file on Linux or Windows?

Thanks in advance.

6.7 years ago

If your FASTA records are linear (one line for the header, one line for the sequence), you can use split -l:

$ split -l 400 records.fa splitRecords_

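200 two-line records take up 400 lines, so each output file holds 200 records (the last one holds whatever remains). Output filenames start with the prefix splitRecords_, followed by a suffix (aa, ab, ac, ...).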
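Because split counts lines rather than records, it will silently cut records in half if the input was not strictly linear. A quick check of the chunks can catch that. The sketch below uses a hypothetical helper, check_chunks (not part of any tool), and assumes the original file starts with a header line: if every chunk begins with a header, every cut fell on a record boundary.

```shell
# check_chunks MAX FILE...: fail if any chunk holds more than MAX records or
# does not begin with a header line (a sign that a record was cut in two).
check_chunks() {
  max=$1
  shift
  for f in "$@"; do
    case $(head -n 1 "$f") in
      '>'*) ;;                                   # starts with a header: OK
      *) echo "$f: first line is not a FASTA header" >&2; return 1 ;;
    esac
    n=$(grep -c '^>' "$f")                       # records = header lines
    if [ "$n" -gt "$max" ]; then
      echo "$f: $n records (limit $max)" >&2
      return 1
    fi
  done
  return 0
}
```

Usage, after running split as above:

$ check_chunks 200 splitRecords_*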

If your FASTA is multiline (sequences wrapped over several lines), convert it to linear FASTA first; there are several Biostars threads on this conversion. Once it is linear, you can use split as above.
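One way to do the conversion is a short awk script. The sketch below wraps it in a hypothetical shell function, linearize_fasta; it joins each record's wrapped sequence lines into one line and strips Windows (CRLF) line endings. It assumes every record has a non-empty sequence; a header with no sequence would break the two-lines-per-record assumption that split -l relies on.

```shell
# linearize_fasta [FILE...]: print each record as one header line followed by
# one sequence line. Reads the given files, or standard input if none.
linearize_fasta() {
  awk '
    { sub(/\r$/, "") }             # drop a trailing carriage return (CRLF files)
    /^>/ {
      if (seq != "") print seq     # flush the sequence of the previous record
      print                        # header line unchanged
      seq = ""
      next
    }
    { seq = seq $0 }               # append this wrapped sequence line
    END { if (seq != "") print seq }
  ' "$@"
}
```

Usage, converting first and then splitting as above:

$ linearize_fasta records.fa > records.linear.fa
$ split -l 400 records.linear.fa splitRecords_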

6.7 years ago
GenoMax 146k

Use the faSplit utility from Jim Kent at UCSC. Download it (Linux and macOS builds are available) and make it executable with chmod a+x faSplit.

faSplit - Split an fa file into several files.
usage:
   faSplit how input.fa count outRoot
where how is either 'about' 'byname' 'base' 'gap' 'sequence' or 'size'.  
Files split by sequence will be broken at the nearest fa record boundary. 
Files split by base will be broken at any base.  
Files broken by size will be broken every count bases.

Examples:
   faSplit sequence estAll.fa 100 est
This will break up estAll.fa into 100 files
(numbered est001.fa est002.fa, ... est100.fa
Files will only be broken at fa record boundaries

There are several other modes; run faSplit with no arguments to see the full usage.
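If installing a binary is not an option, plain awk can also split by record count directly, which avoids the linearization step because it only starts a new file at a header line. This is a minimal sketch with a hypothetical function name, split_fasta_by_records, and a hypothetical output naming scheme (PREFIX001.fa, PREFIX002.fa, ...); it is not how faSplit works internally. Any lines before the first header are skipped.

```shell
# split_fasta_by_records N PREFIX FILE: write at most N records per output
# file, named PREFIX001.fa, PREFIX002.fa, ... Works on multiline FASTA too.
split_fasta_by_records() {
  awk -v n="$1" -v prefix="$2" '
    /^>/ {
      if (count % n == 0) {                # time to start a new output file
        if (out != "") close(out)          # avoid running out of file handles
        out = sprintf("%s%03d.fa", prefix, count / n + 1)
      }
      count++
    }
    out != "" { print > out }              # lines before the first header are skipped
  ' "$3"
}
```

For the file in the question, 1600 records at 200 per file would give eight files, chunk_001.fa to chunk_008.fa:

$ split_fasta_by_records 200 chunk_ records.fa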
