Forum: 'No code' data manipulation tools for bioinformatics
3.8 years ago
andy • 0

I recently released a 'no code' data manipulation tool for Windows and Mac. I am from a software engineering/physics background, so I didn't really think about bioinformatics as a possible use for the tool. However, I now have several customers using it to manipulate DNA and protein sequences, so it would be interesting to find out a bit more about this application area. A few questions, if you don't mind:

Do you use 'no-code tools' for data manipulation, such as joining, filtering, sorting, pivots etc? Or do you prefer a programming approach?

If you do use 'no code' tools, which ones and what do you like and dislike about them?

What file formats do you mainly use for storing and exchanging bioinformatics data? (Currently I support CSV, TSV, Excel, JSON and XML.)

software • 1.4k views

Excel mangles gene names like Sept3 and Mar2. Smart bioinformaticians will never put anything into Excel if they can help it, for that reason.


I am all too familiar with how Excel mangles dates and numbers! ;0)


Do you use 'no-code tools' for data manipulation, such as joining, filtering, sorting, pivots etc?

Never. It is not reproducible and typically does not scale well with large amounts of data in the gigabyte range.

Or do you prefer a programming approach?

Yep, something scripted which does not mess with gene names like Excel and company.
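To make the scripted approach concrete: here is a minimal sketch using Python's standard csv module, which treats every field as a plain string, so gene names such as SEPT3 and MARCH1 are never silently coerced into dates the way Excel does. (The data and column names here are made up for illustration.)

```python
import csv
import io

# A small TSV with gene names that Excel would silently convert to dates.
tsv_data = "gene\tcount\nSEPT3\t12\nMARCH1\t7\nTP53\t40\n"

# csv reads every field as a string, so no silent type coercion occurs.
rows = list(csv.DictReader(io.StringIO(tsv_data), delimiter="\t"))

# Filter: keep genes with count >= 10, converting types only where intended.
kept = [r["gene"] for r in rows if int(r["count"]) >= 10]
print(kept)  # ['SEPT3', 'TP53']
```

Because every transformation is a few lines of code, the whole pipeline can be re-run bit-for-bit, which is the reproducibility point made above.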


Excel is a horrible tool for manipulating data. No disagreement there!

How big are your typical datasets in terms of rows x columns (assuming it is tabular data)?


In fact I do not even know, since I never store my single-cell data (sparse matrix formats) as plain text. For other, more standard datasets (raw data in the gigabyte range) it is something like 15,000 rows by fewer than 100 columns; for other genomic applications it can also be 150,000 rows by fewer than 100 columns. I personally would never edit any of it in an editor: a single messed-up tab or whitespace character can cause issues.
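For readers unfamiliar with the sparse formats mentioned above: single-cell count matrices are mostly zeros, so tools typically exchange them in formats that store only the non-zero entries. A sketch using SciPy and the Matrix Market (.mtx) text format (this assumes SciPy is installed; the matrix is toy data, not a real dataset):

```python
import io

import numpy as np
from scipy import sparse
from scipy.io import mmread, mmwrite

# A toy cell-by-gene count matrix: mostly zeros, so dense storage is wasteful.
dense = np.array([[0, 3, 0, 0],
                  [0, 0, 0, 5],
                  [1, 0, 0, 0]])
mat = sparse.csr_matrix(dense)

# Matrix Market stores only the non-zero entries (here 3 of 12 cells).
buf = io.BytesIO()
mmwrite(buf, mat)
buf.seek(0)
roundtrip = mmread(buf).tocsr()

print(mat.nnz)                      # 3
print((roundtrip != mat).nnz == 0)  # True: lossless round trip
```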

3.8 years ago

A list of common formats, along with their specs, can be viewed here. Particular emphasis on BED (or BED-like) formats, which are extremely common, in addition to VCF and GTF/GFF. Those (along with typical CSV/TSV files) are likely the formats people most often need to interact with manually.
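As background on the first of those formats: BED is tab-separated with 0-based, half-open coordinates, with chrom, chromStart and chromEnd as the three required fields. A minimal parsing sketch (the interval data here is invented for illustration):

```python
import csv
import io

# BED: tab-separated, 0-based half-open intervals.
# Required fields: chrom, chromStart, chromEnd; name is a common 4th field.
bed_text = "chr1\t100\t200\tfeatureA\nchr2\t150\t400\tfeatureB\n"

lengths = {}
for chrom, start, end, name in csv.reader(io.StringIO(bed_text), delimiter="\t"):
    # Half-open coordinates mean the length is simply end - start (no +1).
    lengths[name] = int(end) - int(start)

print(lengths)  # {'featureA': 100, 'featureB': 250}
```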

But if your program stores everything in memory, you're going to have a rough time supporting the bioinformatics market, as files easily reach the gigabyte range. As a pet project when I first started doing bioinformatics, I made a GUI program that used a streaming approach. It worked no matter the file size, but had some obvious limitations.
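The streaming approach described above can be sketched as follows: process one line at a time and write matches straight to the output, so memory use stays flat regardless of file size. This is only an illustrative sketch, not the answerer's actual program; the function and file names are hypothetical.

```python
def filter_tsv(in_path, out_path, column, predicate):
    """Stream a TSV file, keeping only rows where predicate(row[column]) is true.

    Only one line is held in memory at a time, so this also works on
    multi-gigabyte files that would never fit in RAM.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        header = src.readline()
        dst.write(header)
        idx = header.rstrip("\n").split("\t").index(column)
        for line in src:
            if predicate(line.rstrip("\n").split("\t")[idx]):
                dst.write(line)
```

The trade-off is the "obvious limitations" mentioned above: operations such as sorting or joining need either multiple passes or an index, since a pure stream sees each row only once.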


Thanks for that link. Very useful. I'm surprised how many of the file formats are fixed-width/space-delimited. That is not something you see a lot of these days. I guess they are more efficient to parse.

Easy Data Transform does currently store everything in memory. That works pretty well for a few million rows, as long as you aren't on 32-bit Windows (most people have 64-bit Windows now).


Did you delete the tool post you had created? Is it a Windows-only tool?


Did you delete the tool post you had created?

Yes, I thought that it might appear spammy to post twice.

Is it a Windows-only tool?

There are Windows and Mac versions. One license covers both (up to 3 computers).
