Forum: 'No code' data manipulation tools for bioinformatics
andy (0) wrote:

I recently released a 'no code' data manipulation tool for Windows and Mac. I come from a software engineering/physics background, so I didn't really think about bioinformatics as a possible use for the tool. However, I now have several customers using it to manipulate DNA and protein sequences, so it would be interesting to find out a bit more about this application area. A few questions, if you don't mind.

Do you use 'no-code tools' for data manipulation, such as joining, filtering, sorting, pivots etc? Or do you prefer a programming approach?

If you do use 'no code' tools, which ones and what do you like and dislike about them?

What file formats do you mainly use for storing and exchanging bioinformatics data? Currently I support CSV, TSV, Excel, JSON and XML.

forum tools software • 102 views
modified 2 days ago by jared.andrews07 (6.1k) • written 2 days ago by andy (0)

Excel mangles gene names like Sept3 and Mar2 by converting them to dates. For that reason, smart bioinformaticists will never put anything into Excel if they can help it.

written 2 days ago by swbarnes2 (7.8k)
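
A minimal sketch of how a script could flag gene symbols that a spreadsheet may turn into dates before anyone opens the file in one. The pattern below (a month abbreviation followed by one or two digits) and the names `looks_like_date` and `flag_risky_symbols` are assumptions for illustration; they only approximate the conversion and are not Excel's actual rules.

```python
import re

# Assumed pattern: month abbreviation plus one or two digits (e.g. Sept3, Mar2).
# This approximates date-like symbols; it is not Excel's real conversion logic.
_DATE_LIKE = re.compile(
    r"^(jan|feb|mar|march|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)-?\d{1,2}$",
    re.IGNORECASE,
)

def looks_like_date(symbol: str) -> bool:
    """Return True if the gene symbol resembles a month-plus-day string."""
    return bool(_DATE_LIKE.match(symbol.strip()))

def flag_risky_symbols(symbols):
    """Return the symbols that match the assumed date-like pattern, in input order."""
    return [s for s in symbols if looks_like_date(s)]

print(flag_risky_symbols(["Sept3", "Mar2", "TP53", "BRCA1"]))
```

Running a check like this over a gene column before export gives a list of symbols to double-check after any round trip through a spreadsheet.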

I am all too familiar with how Excel mangles dates and numbers! ;0)

written 2 days ago by andy (0)

Do you use 'no-code tools' for data manipulation, such as joining, filtering, sorting, pivots etc?

Never. They are not reproducible and typically do not scale well to large amounts of data in the gigabyte range.

Or do you prefer a programming approach?

Yep, something scripted that does not mess with gene names the way Excel and company do.

modified 2 days ago • written 2 days ago by ATpoint (36k)
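
For readers wondering what "something scripted" might look like for the operations in the question (filtering, sorting, joining), here is a minimal sketch using only Python's standard library. The column names (`gene`, `count`, `chrom`), the example rows, and the function names are hypothetical. The point is that every step is written down and can be re-run, and that gene names stay plain strings.

```python
import csv
import io

def read_tsv(text):
    """Parse tab-separated text into a list of dicts; every value stays a string."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def filter_sort_join(expr_rows, annot_rows, min_count):
    """Keep genes with count >= min_count, sort by count (descending), attach chrom."""
    annot = {row["gene"]: row["chrom"] for row in annot_rows}
    kept = [r for r in expr_rows if int(r["count"]) >= min_count]
    kept.sort(key=lambda r: int(r["count"]), reverse=True)
    # Genes missing from the annotation table get "NA" (a left join).
    return [(r["gene"], int(r["count"]), annot.get(r["gene"], "NA")) for r in kept]

# Hypothetical example data, not taken from any real dataset.
expr = read_tsv("gene\tcount\nSept3\t12\nTP53\t250\nMar2\t3\n")
annot = read_tsv("gene\tchrom\nSept3\tchr22\nTP53\tchr17\n")
result = filter_sort_join(expr, annot, min_count=10)
print(result)
```

Because the script is the full record of what was done, re-running it on updated input reproduces the same transformation.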

Excel is a horrible tool for manipulating data. No disagreement there!

How big are your typical datasets in terms of rows x columns (assuming it is tabular data)?

written 2 days ago by andy (0)

In fact, I do not even know, since I never store my single-cell data (sparse matrix formats) as plain text. For other, more standard datasets (where the raw data are gigabytes in size), it is something like 15,000 rows by fewer than 100 columns; for other genomic applications it can also be 150,000 rows by fewer than 100 columns. I personally would never edit any of it in an editor, since a single misplaced tab or whitespace character can cause issues.

written 2 days ago by ATpoint (36k)
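
To illustrate the point about a single misplaced tab, here is a minimal sketch of a check that reports rows whose field count differs from the header's. The function name `find_malformed_rows` and the sample rows are hypothetical, and a real file might need extra handling for comment lines or quoted fields.

```python
def find_malformed_rows(lines, delimiter="\t"):
    """Return (line_number, field_count) for rows whose field count differs from the first line."""
    malformed = []
    expected = None
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        if expected is None:
            # The first line (the header) defines the expected number of fields.
            expected = len(fields)
        elif len(fields) != expected:
            malformed.append((number, len(fields)))
    return malformed

# Hypothetical sample: the third line has a space where a tab should be.
sample = ["gene\tcount\tchrom\n", "TP53\t250\tchr17\n", "Sept3 12\tchr22\n"]
print(find_malformed_rows(sample))
```

The function accepts any iterable of lines, so an open file handle can be passed in and checked one line at a time.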
jared.andrews07 (6.1k) wrote:

A list of common formats can be viewed here along with their specs. Particular emphasis on BED (or BED-like) formats, which are extremely common, in addition to VCF and GTF/GFF. Those (along with typical CSV/TSV files) are likely the most common formats folks may need to interact with manually.

But if your program stores everything in memory, you're gonna have a rough time supporting the bioinformatics market, as files regularly reach the gigabyte range. I made a GUI program that used a streaming approach as a pet project when I first started doing bioinformatics. It worked, no matter the file size, but had some obvious limitations.

written 2 days ago by jared.andrews07 (6.1k)
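
A minimal sketch of the general streaming idea mentioned above, applied to BED-like input. It is not the GUI program described in the post. It assumes tab-delimited records whose first three columns are chromosome, start, and end; the function name `stream_bed_filter` and the sample records are hypothetical.

```python
import io

def stream_bed_filter(handle, chrom, min_length=0):
    """Yield (chrom, start, end) for matching records, reading one line at a time."""
    for line in handle:
        # Skip blank lines and header-style lines that some BED files carry.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        if fields[0] == chrom and end - start >= min_length:
            yield fields[0], start, end

# Hypothetical in-memory stand-in for a file on disk.
bed = io.StringIO("chr1\t100\t200\nchr2\t50\t80\nchr1\t300\t310\n")
hits = list(stream_bed_filter(bed, "chr1", min_length=50))
print(hits)
```

With a real file, passing an open file handle means only the current line is held in memory, so memory use stays flat regardless of file size. The limitation is that operations needing the whole table at once, such as a full sort, do not fit this pattern directly.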

Thanks for that link. Very useful. I'm surprised how many of the file formats are fixed width/space delimited. That is not something you see a lot of these days. I guess they are more efficient to parse.

Easy Data Transform does currently store everything in memory. That works pretty well for a few million rows, as long as you aren't on 32-bit Windows (most people have 64-bit Windows now).

written 2 days ago by andy (0)

Did you delete the tool post you had created? Is it a Windows-only tool?

written 2 days ago by genomax (85k)

Did you delete the tool post you had created?

Yes, I thought that it might appear spammy to post twice.

Is it a Windows-only tool?

There are Windows and Mac versions. One license covers both (up to 3 computers).

written 2 days ago by andy (0)