Want to keep track of changes in your massive dataset? As far as speed, we used it to replace our propriatary data format. Might one take a peek at that? DiThi on Jan 8, Damn, you’re right These systems are for actual big data sets where you have several terabytes to several petabytes.

Uploader: Gogor
Date Added: 13 October 2011
File Size: 13.13 Mb
Operating Systems: Windows NT/2000/XP/2003/2003/7/8/10 MacOS 10/X
Downloads: 64959
Price: Free* [*Free Regsitration Required]

You can easily see where this can go bad; you will have to take extraordinary care. Its a complex file format with a lot hvf5 in memory structures.

hdf-forum – HDF5 file as a binary stream ?

GC was offline though; it was performed at the time the container was “upgraded” from RO to RW access. Rather you’d had something like a server that retrieves complete HDF5 files via socket and make them available to other clients, like on some NFS or SSHFS where the file written is closed after each operation, and thereby “published” for other clients to read the same file?

When in the field and the system shits the bed, its really easy to open an HDF5 file in HDFview and inspect the file contents. I’m in no way an apologist for HDF5 but his complaints are terribly vague.

You can write to a stream or read from it, and if you include an HTTP header in this stream – which is supported by the most recent version of the streaming VFD – then it easily becomes its standalone webserver that produces HDF5 files on demand in memory, just for streaming data, never a physical file created anywhere. It is right that converting endianness is faster than reading text, but it still requires more code than calling printf and scanf.


I ended up designing a transactional file format a kind of log-structured file system with data versioning and garbage collection, all stored in a single file from scratch to match our requirements.

HDF5 file as a binary stream ?

We can get by just fine with: These are the war-stories that I love to hear. If you want headers, you can define those as text and just use extents to embed the binary data.

We don’t need headers, structured arrays, or any weird esoteric object types. They strwam with a lot less baggage than say HDF5. Your comment is true for your use case, but it’s not really responsive to the technical streqm here. Attributes are accessed through the attrs proxy object, which again implements the dictionary interface: Dylan on Jan 8, This opens an HDF5 file. Want to keep track of changes in your massive dataset?

Moving away from HDF5 | Hacker News

Hacker News new comments show ask jobs submit. Contact ShadesOfGray for current posts and information. A common pattern that my scientific software used was this: An HDF5 file is a container for two kinds of objects: If it’s tabular, SQLite still can’t be beat.


The pattern is to create a hef5 snapshot of directory. I used H5Gopen to detect whether a group already exists, then use H5Gcreate to create a new one dhf5 not.

If the “parallel access” refers to threading, HDF has a thread safe feature that hdf55 need to enable when building the code.

I liked the post well, as an HDF5 user, I found it depressing Funny enough, we started out with a file-based system like the one you described, and moved to an HDF5-based system.

In other words was it open sourced or plans to that effect exists? This account is no longer active. Which one are you using? This version has a known bug which prevents hdd5 from working in some cases with newer versions of HDF5. This example is working on gigabytes rather than terabytes of data, but memory utilization is basically limited to just buffers, so you could definitely scale that to terabytes without a problem.

As for simplicity, people start wanting metadata and headers and this and that, and before you know it you need HDF5 or ROOT again and it’s no longer simple. Some people want stuff in R, some in Python 2. To create this stram, read Appendix: