I understand that fixed column width data is quite common in bioinformatics. We are currently adding support for fixed column width data files to Easy Data Transform. One of the things we are trying to do is have the software automatically detect where the column boundaries are, so you don't have to set them manually. We need some good sample data for that. Is there somewhere I can download sample fixed column width data files? Or can someone send me some examples? Anything from a few hundred to a few million rows would be good.
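To make the boundary-detection problem concrete, here is a minimal Python sketch of one naive heuristic: treat a character position as a gap if it is a space in every row, and treat each run of non-gap positions as a column. This is only an assumption about how such detection could work, not how Easy Data Transform does it. The function names (`detect_column_boundaries`, `split_row`) and the sample rows are made up for illustration.

```python
def detect_column_boundaries(lines):
    """Guess fixed-width columns as (start, end) slices.

    Assumption: a position belongs to a gap only if it is a space in
    every non-empty row. Real files may need something more tolerant.
    """
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]  # pad short rows with spaces
    # blank[i] is True when position i is a space in every row
    blank = [all(r[i] == " " for r in padded) for i in range(width)]

    columns = []
    start = None
    for i, is_blank in enumerate(blank):
        if not is_blank and start is None:
            start = i                      # a column begins here
        elif is_blank and start is not None:
            columns.append((start, i))     # the column ended before i
            start = None
    if start is not None:
        columns.append((start, width))     # last column runs to end of line
    return columns


def split_row(row, columns):
    """Cut one row at the detected boundaries and trim the padding."""
    return [row[s:e].strip() for s, e in columns]


# Made-up rows: left-aligned text columns, right-aligned numeric columns.
records = [("chr1", 11873, 14409, "geneA"),
           ("chr1", 14361, 29370, "geneB"),
           ("chrX", 100627, 103989, "geneC")]
sample = [f"{c:<4}  {s:>6}  {e:>6}  {n:<8}" for c, s, e, n in records]

columns = detect_column_boundaries(sample)
fields = [split_row(row, columns) for row in sample]
```

The heuristic breaks down when a text field contains a space at the same position in every row, or when adjacent columns have no gap at all, which is one reason varied real-world sample files matter for testing.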
Thanks for the link.
I've used awk/sed/grep/perl in the past.
So Easy Data Transform might be a better choice for some people in some situations.
Please don't add answers unless you're answering the principal question. Use Add Comment or Add Reply instead.
If you're going to call yourself a bioinformaticist, you're probably going to be familiar with one or more of those tools. If I have to open a GUI to transform my data, that's going to be a manual process in my pipeline every time. I have to figure out a given awk/sed/grep/cut/paste munge once and I can run it on my 250 samples no problem. I don't see why these transformations have to be visual; pipes and previewing output are both pretty easy.
As for XML/JSON, they are indeed annoying to work with, but there are numerous dedicated, free CLI parsers that handle them just fine.
I suppose you're right about more Excel-like functions, but there is also Knime, which covers most of that functionality if really needed. For most of the fixed formats I linked, there exist other very popular tools to munge them in common ways, e.g. bedtools, bedops, and bcftools.
Regardless, I wish you luck. There may very well be folks who find a GUI for such transformations useful in certain situations.
Pipes work fine for a linear pipeline. But a visual layout works better for a graph, IMHO. For example, blending multiple input sources and then creating multiple outputs.