4.6 years ago
olechnwin ▴ 60
I'm so confused. I can't figure out how to use faSplit to split my FASTA file into 2 files. From the documentation, it seems I can use this command:
~/opt/faSplit sequence 1SQ_reads.fasta 2 1SQ_reads_
But this generates files 1SQ_reads_0.fa, 1SQ_reads_1.fa, 1SQ_reads_2.fa, and so on...
What did I do wrong? How do I split my FASTA file into several files?
Check that the faSplit you're using is the right faSplit. Run man faSplit to check the version as well as the usage documentation.
The faSplit binary can be found here: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
Thanks for your reply. I tried man faSplit and it came back with 'No manual entry for faSplit'. I thought that's where I downloaded it from. I'll try to re-download.
Type faSplit and press Enter without any arguments; it will print the help text. Copy/pasted from the help:
Yes, I get that. I was wondering whether the ability to print a manual page comes with a newer version.
What is the latest version? The one I have is this one:
This seems to be the latest version on bioconda. I installed that version and I see the help text when I run faSplit without arguments. The faSplit binary from UCSC doesn't work for me on macOS but works fine on GNU/Linux.
Thanks for checking the version. So, with the version on bioconda, faSplit sequence does not split the FASTA file into the desired number of files. Instead, I divided the size of my original file by the number of files I wanted and used faSplit about to split by size, which gives approximately the number of files I wanted.
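For the faSplit about workaround, the size argument is a size per output file rather than a file count, so it has to be worked out from the total file size. A minimal Python sketch of that arithmetic, assuming (as the faSplit help text describes) that the 'about' size is an approximate number of bytes per file; about_size and fasplit_about_command are hypothetical helper names, not part of kentUtils:

```python
import os
import shlex


def about_size(total_bytes: int, n_files: int) -> int:
    """Approximate per-file size so that total_bytes splits into about n_files pieces."""
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    # Integer ceiling division: rounding down would leave a remainder
    # that could end up as an extra, small last file.
    return max(1, -(-total_bytes // n_files))


def fasplit_about_command(fasta_path: str, n_files: int, out_root: str) -> str:
    """Build (but do not run) a 'faSplit about' command line for the given file.

    Assumption: the size argument is bytes per output file, per the faSplit help text.
    """
    size = about_size(os.path.getsize(fasta_path), n_files)
    return shlex.join(["faSplit", "about", fasta_path, str(size), out_root])
```

For example, fasplit_about_command("1SQ_reads.fasta", 2, "1SQ_reads_") would build a command asking for pieces of roughly half the file size each. Because faSplit about only breaks at record boundaries, expect approximately, not exactly, the requested number of files.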
I checked on my computer and it works fine. It does not split them into equally sized files, but it does produce as many files as requested. My commands:
I have absolutely no idea why it doesn't work on mine. My commands:
and many more test_.fa files.
Maybe check with your sysadmin on this? Can you also post the output of
Can you try the FASTA file I used, run the same commands, and see if the output is different? I just want to make sure your FASTA identifiers are not messing with the program (they shouldn't be, but just in case).
@RamRS, I tried to use your fasta file and it worked!
So my FASTA identifiers are messing with the program? FYI, my FASTA file is from PacBio. Should I be concerned about running faSplit on my FASTA files, then?
I'm not sure it's the identifiers. Why don't you try:
This doesn't work either:
Just to make sure the previous run did not affect this one: you did rm -rf test before running the faSplit command, right? How big is your
Yes. I removed the test folder before running faSplit with sed. My 1SQ_reads.fasta file is about 40 GB.
You should not use faSplit sequence then; it seems to work in a way that doesn't really make sense. In your case, if it worked as it should, it would produce two files where one is a few kB and the other almost 40 GB. Maybe try faSplit about and copy over a few lines if it breaks halfway through an entry?
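If faSplit keeps misbehaving on a large PacBio file, splitting at record boundaries is simple enough to do without it. The sketch below is an independent Python alternative, not how faSplit works internally: it streams the input once, starts a new output file in front of a '>' header whenever roughly 1/n of the input bytes has been read, and never cuts a record in half. split_fasta and the <prefix><i>.fa naming scheme are hypothetical.

```python
import os


def split_fasta(path: str, n_parts: int, out_prefix: str) -> list[str]:
    """Split a FASTA file into at most n_parts files of roughly equal byte size.

    Breaks only in front of '>' header lines, so no record is cut in half.
    Reads line by line, so memory use does not grow with file size.
    Output files are named f"{out_prefix}{i}.fa" (hypothetical naming scheme).
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    target = os.path.getsize(path) / n_parts  # ideal bytes per part
    out_paths: list[str] = []
    out = None
    consumed = 0  # bytes read from the input so far
    try:
        with open(path, "rb") as src:
            for line in src:
                # Open part i at the first header seen after i * target bytes.
                if (line.startswith(b">")
                        and len(out_paths) < n_parts
                        and consumed >= len(out_paths) * target):
                    if out is not None:
                        out.close()
                    out_path = f"{out_prefix}{len(out_paths)}.fa"
                    out = open(out_path, "wb")
                    out_paths.append(out_path)
                if out is None:
                    raise ValueError("input must start with a '>' header line")
                out.write(line)
                consumed += len(line)
    finally:
        if out is not None:
            out.close()
    return out_paths
```

Streaming keeps memory flat even for a 40 GB input. The trade-off is that a part can overshoot its share by up to one record, and if there are fewer records than requested parts you get fewer files.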
Hmm... thanks for the hint about checking the files. I ran faSplit about when I realized faSplit sequence does not work. But upon checking the result, it seemed that although faSplit about appears to work properly, it didn't!
Update: I made a mistake. It seems to be working, at least for the files I checked.
The beginning of file 2 is this:
Searching for this line in original file:
Printing the previous line from original file:
It does match! At least for the ones I checked.
But now I'm wary of using faSplit to split FASTA files from PacBio.
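Checking a few boundaries by hand, as above, can miss problems elsewhere in the output. A sketch of a full check in Python, assuming the split files are passed in the order they were written: it compares every (header, sequence) pair in the original against the concatenated parts, ignoring line wrapping, and reports the index of the first record that differs. iter_records and first_mismatch are hypothetical names.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional


def iter_records(paths: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA files in order, ignoring line wrapping."""
    for path in paths:
        header: Optional[str] = None
        chunks: list[str] = []
        with open(path) as fh:
            for line in fh:
                line = line.rstrip("\r\n")
                if line.startswith(">"):
                    if header is not None:
                        yield header, "".join(chunks)
                    header, chunks = line, []
                elif header is not None:
                    # Sequence lines before the first header are dropped, so a
                    # record cut across two files shows up as a mismatch.
                    chunks.append(line.strip())
        if header is not None:
            yield header, "".join(chunks)


def first_mismatch(original: str, parts: list[str]) -> Optional[int]:
    """Return the index of the first record that differs, or None if all match."""
    pairs = zip_longest(iter_records([original]), iter_records(parts))
    for i, (a, b) in enumerate(pairs):
        if a != b:
            return i
    return None
```

Sort the part files numerically before passing them in, since a plain lexical sort puts 1SQ_reads_10.fa before 1SQ_reads_2.fa. Headers are compared exactly, so if the splitter rewrote header lines, this check would flag those records too.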
Did you try the faSplit binary from http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/ ? @RamRS
I don't think I tried that, but on macOS I used the one from bioconda and it works fine.
The faSplit binary does not work for me since it was built on a more recent OS than the one I currently have.
I downloaded the binary from https://github.com/ENCODE-DCC/kentUtils/tree/master/bin/linux.x86_64 and it is working as expected (faSplit base).
Function (split test.fa into 2 files):