Forum: 'No code' data manipulation tools for bioinformatics
3.8 years ago
andy • 0

I recently released a 'no code' data manipulation tool for Windows and Mac. I am from a software engineering/physics background, so I didn't really think about bioinformatics as a possible use for the tool. However, I now have several customers using it to manipulate DNA and protein sequences, so it would be interesting to find out a bit more about this application area. A few questions, if you don't mind:

Do you use 'no-code tools' for data manipulation, such as joining, filtering, sorting, pivots etc? Or do you prefer a programming approach?

If you do use 'no code' tools, which ones and what do you like and dislike about them?

What file formats do you mainly use for storing and exchanging bioinformatics data? (Currently I support CSV, TSV, Excel, JSON and XML.)

software • 1.4k views

Excel mangles gene names like Sept3 and Mar2. Smart bioinformaticians will never put anything into Excel if they can help it, for that reason.


I am all too familiar with how Excel mangles dates and numbers! ;0)


Do you use 'no-code tools' for data manipulation, such as joining, filtering, sorting, pivots etc?

Never. It is not reproducible and typically does not scale well with large amounts of data in the gigabyte range.

Or do you prefer a programming approach?

Yep, something scripted which does not mess with gene names like Excel and company.
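To make the scripted approach concrete: here is a minimal sketch using Python's standard csv module, which treats every field as a plain string, so gene names such as SEPT3 and MARCH1 are never silently coerced into dates the way Excel does. (The data and column names here are made up for illustration.)

```python
import csv
import io

# A small TSV with gene names that Excel would silently convert to dates.
tsv_data = "gene\tcount\nSEPT3\t12\nMARCH1\t7\nTP53\t40\n"

# csv reads every field as a string, so no silent type coercion occurs.
rows = list(csv.DictReader(io.StringIO(tsv_data), delimiter="\t"))

# Filter: keep genes with count >= 10, converting types only where intended.
kept = [r["gene"] for r in rows if int(r["count"]) >= 10]
print(kept)  # ['SEPT3', 'TP53']
```

Because every transformation is a few lines of code, the whole pipeline can be re-run bit-for-bit, which is the reproducibility point made above.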


Excel is a horrible tool for manipulating data. No disagreement there!

How big are your typical datasets in terms of rows x columns (assuming it is tabular data)?


In fact I do not even know, since I never store my single-cell data (sparse matrix formats) as plain text. For other, more standard datasets (raw data in the gigabyte range) it is something like 15,000 rows by fewer than 100 columns; for other genomic applications it can also be 150,000 rows by fewer than 100 columns. I personally would never edit any of it in an editor: a single messed-up tab or whitespace character can cause issues.
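For readers unfamiliar with the sparse formats mentioned above: single-cell count matrices are mostly zeros, so tools typically exchange them in formats that store only the non-zero entries. A sketch using SciPy and the Matrix Market (.mtx) text format (this assumes SciPy is installed; the matrix is toy data, not a real dataset):

```python
import io

import numpy as np
from scipy import sparse
from scipy.io import mmread, mmwrite

# A toy cell-by-gene count matrix: mostly zeros, so dense storage is wasteful.
dense = np.array([[0, 3, 0, 0],
                  [0, 0, 0, 5],
                  [1, 0, 0, 0]])
mat = sparse.csr_matrix(dense)

# Matrix Market stores only the non-zero entries (here 3 of 12 cells).
buf = io.BytesIO()
mmwrite(buf, mat)
buf.seek(0)
roundtrip = mmread(buf).tocsr()

print(mat.nnz)                      # 3
print((roundtrip != mat).nnz == 0)  # True: lossless round trip
```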

3.8 years ago

A list of common formats, along with their specs, can be viewed here. Particular emphasis on BED (or BED-like) formats, which are extremely common, in addition to VCF and GTF/GFF. Those (along with typical CSV/TSV files) are likely the formats people most often need to interact with manually.
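As background on the first of those formats: BED is tab-separated with 0-based, half-open coordinates, with chrom, chromStart and chromEnd as the three required fields. A minimal parsing sketch (the interval data here is invented for illustration):

```python
import csv
import io

# BED: tab-separated, 0-based half-open intervals.
# Required fields: chrom, chromStart, chromEnd; name is a common 4th field.
bed_text = "chr1\t100\t200\tfeatureA\nchr2\t150\t400\tfeatureB\n"

lengths = {}
for chrom, start, end, name in csv.reader(io.StringIO(bed_text), delimiter="\t"):
    # Half-open coordinates mean the length is simply end - start (no +1).
    lengths[name] = int(end) - int(start)

print(lengths)  # {'featureA': 100, 'featureB': 250}
```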

But if your program stores everything in memory, you're going to have a rough time supporting the bioinformatics market, as files easily reach the gigabyte range. As a pet project when I first started doing bioinformatics, I made a GUI program that used a streaming approach. It worked no matter the file size, but had some obvious limitations.
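The streaming approach described above can be sketched as follows: process one line at a time and write matches straight to the output, so memory use stays flat regardless of file size. This is only an illustrative sketch, not the answerer's actual program; the function and file names are hypothetical.

```python
def filter_tsv(in_path, out_path, column, predicate):
    """Stream a TSV file, keeping only rows where predicate(row[column]) is true.

    Only one line is held in memory at a time, so this also works on
    multi-gigabyte files that would never fit in RAM.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        header = src.readline()
        dst.write(header)
        idx = header.rstrip("\n").split("\t").index(column)
        for line in src:
            if predicate(line.rstrip("\n").split("\t")[idx]):
                dst.write(line)
```

The trade-off is the "obvious limitations" mentioned above: operations such as sorting or joining need either multiple passes or an index, since a pure stream sees each row only once.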


Thanks for that link. Very useful. I'm surprised how many of the file formats are fixed-width/space-delimited. That is not something you see a lot of these days. I guess they are more efficient to parse.

Easy Data Transform does currently store everything in memory. That works pretty well for a few million rows, as long as you aren't on 32-bit Windows (most people have 64-bit Windows now).


Did you delete the tool post you had created? Is it a Windows-only tool?


Did you delete the tool post you had created?

Yes, I thought that it might appear spammy to post twice.

Is it a Windows-only tool?

There are Windows and Mac versions. One license covers both (up to 3 computers).
