Forum: Poll: Does your filesystem support xattr?
Posted 4.2 years ago by John:

Hello all :)

I use extended attributes (xattrs) extensively on my data to keep track of which reference genome the data was mapped to, how it was mapped, the file's MD5 checksum, bin size, etc., and I find them one of those really useful features that don't get the attention they deserve.
If you are not familiar with extended attributes, they are simply key/value pairs that you attach to your files and that, hopefully, move with the data. For example, on Mac OS X:

xattr -w mapping mm9 ./mybam.bam

stores the key 'mapping' with the value 'mm9' on the file ./mybam.bam.

It can be read back with:

xattr -p mapping ./mybam.bam

BAM files usually have the reference in the header, but for BigWig/BED/etc. data this is very convenient. Another very practical application in my work has been storing the MD5 hash in the metadata, because our filenames/paths are always changing (!!), and to detect accidental filtering or truncation of data after it has been created.
For example, after adding the following two lines to ~/.bashrc on OS X:

writehash() { local file; for file in "$@"; do xattr -w filehash "$(md5 -q "$file")" "$file"; done; }
readhash() { local file; for file in "$@"; do printf '%s : ' "$file"; xattr -p filehash "$file"; done; }

it's easy to attach the MD5 hash to the file(s) once and then recall it instantly, without re-hashing multi-gigabyte files, so that you and your databases don't have to rely on file paths.
I'm sure others can think of much more creative uses for metadata, and I'd very much like to hear them!!
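On Linux the same idea might look roughly like this. It is a minimal sketch, assuming GNU md5sum, the setfattr/getfattr tools from the attr package, and a file system that accepts user.* attributes. The attribute name user.filehash and the extra checkhash function are my own illustrative choices, not an established convention. checkhash re-hashes the file and compares the result with the stored value, which is one way to catch the truncation/filtering case mentioned above.

```shell
# Minimal Linux sketch; assumes GNU md5sum plus setfattr/getfattr (attr package)
# and a file system that accepts user.* extended attributes.
# The attribute name user.filehash is an illustrative choice.

# Compute each file's MD5 once and store it as user.filehash.
writehash() {
  local file
  for file in "$@"; do
    setfattr -n user.filehash -v "$(md5sum < "$file" | cut -d' ' -f1)" "$file"
  done
}

# Print "file : hash" from the stored attribute only (no re-hashing).
readhash() {
  local file
  for file in "$@"; do
    printf '%s : %s\n' "$file" "$(getfattr --only-values -n user.filehash "$file" 2>/dev/null)"
  done
}

# Hypothetical helper: re-hash and compare with the stored value, to catch
# truncation or accidental filtering after the hash was written.
checkhash() {
  local file stored actual
  for file in "$@"; do
    stored=$(getfattr --only-values -n user.filehash "$file" 2>/dev/null)
    actual=$(md5sum < "$file" | cut -d' ' -f1)
    if [ -z "$stored" ]; then
      printf '%s : no stored hash\n' "$file"
    elif [ "$stored" = "$actual" ]; then
      printf '%s : OK\n' "$file"
    else
      printf '%s : MISMATCH\n' "$file"
    fi
  done
}
```

Reading md5sum from stdin (md5sum < "$file") keeps the filename out of md5sum's output, so cut only ever sees "hash  -".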

But before I'm really comfortable releasing code that makes use of metadata, I'm curious to know how many file systems in production bioinformatics use actually support it. The compute servers where I work do not, mainly because the file system is NFS, which has to have extended attributes enabled manually when the file system is set up.

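Thus, I would be very grateful if people could comment with a yes or no, so we can get an idea of how prevalent support is.
Note that xattr is a Mac binary. Check the Wikipedia page on extended attributes for your distro's equivalent; typically something like this should work on Linux (setfattr/getfattr usually come from the attr package, and user-defined attribute names need the "user." prefix):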

touch somefile
setfattr -n "user.demo" -v "test" somefile
getfattr -n "user.demo" somefile
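If you want to answer for several locations at once, the three commands above can be wrapped into a small probe. This is a sketch under assumptions: xattr_supported is a hypothetical helper name, it needs write access to the directory being tested, and it only checks the user.* namespace. It prints yes or no, or unknown when the attr tools are missing or no probe file can be created.

```shell
# Hypothetical helper: does the file system holding DIR accept user.* xattrs?
# Prints yes/no, or "unknown" if the attr tools are missing or DIR is not writable.
xattr_supported() {
  local dir=${1:-.} probe answer=no
  if ! command -v setfattr >/dev/null 2>&1 || ! command -v getfattr >/dev/null 2>&1; then
    echo unknown
    return 2
  fi
  # Work on a throwaway file so no real data is touched.
  probe=$(mktemp "$dir/.xattr_probe.XXXXXX") || { echo unknown; return 2; }
  if setfattr -n user.demo -v test "$probe" 2>/dev/null &&
     [ "$(getfattr --only-values -n user.demo "$probe" 2>/dev/null)" = test ]; then
    answer=yes
  fi
  rm -f "$probe"
  echo "$answer"
  [ "$answer" = yes ]
}

# Example: report a few locations plus their file system type (GNU stat):
#   for d in "$HOME" /path/to/scratch; do
#     printf '%s (%s): %s\n' "$d" "$(stat -f -c %T "$d")" "$(xattr_supported "$d")"
#   done
```

The exit status mirrors the printed answer (0 for yes), so it can also be used directly in an if.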

Thank you!!! :)


Really cool concept. Even if it only worked on a Mac, it would be useful to a lot of people. Large compute nodes run all kinds of file systems (AFS, etc.).

Reply by Istvan Albert, 4.2 years ago

Just a note that one can add metadata to (compressed) BED files with starch --note "foo bar baz..." and retrieve it with unstarch --note, which has the nice feature of being independent of the file system. You can put a lot of data in there, such as a structured (queryable) and human-readable JSON string.

Reply by Alex Reynolds, 4.2 years ago

Yes, the concept is nice. I sometimes use it (on Linux) to tag bank files (SRA files, for instance) with the URL they came from. I once made a little app that computed statistics on the reads (things like min/max length) and tagged the reads file with them; one can then quickly get information about the bank from these tags without having to parse it again.
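A rough sketch of that kind of tagging, not the app described above: it assumes an uncompressed FASTQ with exactly four lines per record (no wrapped sequences) and the Linux attr tools. The function names and the attribute names (user.source_url, user.read_count, user.min_len, user.max_len) are hypothetical.

```shell
# Minimal sketch, assuming an uncompressed FASTQ with exactly 4 lines per record
# and the attr tools on Linux. Attribute and function names are hypothetical.

# Print "count min max" of the read lengths (sequence = line 2 of each record).
fastq_length_stats() {
  awk 'NR % 4 == 2 {
         n++; l = length($0)
         if (n == 1 || l < min) min = l
         if (l > max) max = l
       }
       END { printf "%d %d %d\n", n, min, max }' "$1"
}

# Tag the file with its source URL plus the length statistics, so they can be
# read back later with getfattr instead of re-parsing the file.
tag_fastq() {
  local fq=$1 url=$2 count min max
  set -- $(fastq_length_stats "$fq")
  count=$1 min=$2 max=$3
  setfattr -n user.source_url -v "$url" "$fq" &&
    setfattr -n user.read_count -v "$count" "$fq" &&
    setfattr -n user.min_len -v "$min" "$fq" &&
    setfattr -n user.max_len -v "$max" "$fq"
}

# Example (hypothetical file and URL):
#   tag_fastq reads.fastq "https://example.org/reads.fastq"
#   getfattr -d reads.fastq      # dumps all user.* attributes
```

Keeping the statistics in a separate function means the parsing can be tested on any machine, even where the file system refuses xattrs.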

Actually, even though such tags are not "inside" the file itself, I like to compare them to MP3 tags :)

Reply by edrezen, 4.2 years ago

Wow, I like it! I thought MD5s took a long time to compute, but statistics like pileup frequencies, coverage, total signal, etc. take orders of magnitude longer, and they are frequently reused in normalization steps. A stats-appending tool for common bioinformatic file types would be very useful :)
(but only if people can actually use extended attributes)
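As a sketch of what such a stats-appending tool could do for one statistic, here is a "compute once, reuse later" cache for total signal of a bedGraph. Assumptions: a plain 4-column bedGraph (chrom start end value), GNU stat, and the attr tools; total_signal and the attribute names user.total_signal and user.total_signal_mtime are hypothetical. The file's modification time is stored next to the value so a cached result is ignored once the file has changed (with one-second granularity).

```shell
# Minimal sketch: cache an expensive statistic in xattrs, keyed on the file's mtime.
# Assumes a 4-column bedGraph, GNU stat, and setfattr/getfattr; names are hypothetical.
total_signal() {
  local f=$1 mtime cached cached_mtime value
  mtime=$(stat -c %Y "$f") || return 1
  cached=$(getfattr --only-values -n user.total_signal "$f" 2>/dev/null)
  cached_mtime=$(getfattr --only-values -n user.total_signal_mtime "$f" 2>/dev/null)
  # Reuse the cached value only if the file has not been modified since.
  if [ -n "$cached" ] && [ "$cached_mtime" = "$mtime" ]; then
    echo "$cached"
    return 0
  fi
  # Total signal = sum over intervals of (end - start) * value; header lines skipped.
  value=$(awk '/^(track|browser|#)/ { next }
               NF >= 4 { s += ($3 - $2) * $4 }
               END { printf "%.10g\n", s }' "$f")
  # Caching is best effort: on file systems without xattrs we just return the value.
  # (Setting an xattr changes ctime, not mtime, so it does not invalidate itself.)
  if setfattr -n user.total_signal -v "$value" "$f" 2>/dev/null; then
    setfattr -n user.total_signal_mtime -v "$mtime" "$f" 2>/dev/null || true
  fi
  echo "$value"
}
```

Storing the mtime is a cheap staleness check; storing the MD5 from the earlier functions instead would be stricter but costs a full read of the file.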

Maybe I'm thinking about this wrong; maybe the 'if you build it, they will come' philosophy is better suited here...

Reply by John, 4.2 years ago