The File System Storage System (FSS) plays a critical role in Socorro as a crash storage buffer standing between the collectors and HBase. Because of the potential instability of HBase connections and our mandate to never lose a crash, the collectors write to our very reliable FSS. Once crashes are safely ensconced in a local file system, the crash mover apps will spool the crashes into HBase as they can. At Mozilla, that's the only role for the FSS in our production system.
There are other uses for FSS. Prior to the adoption of HBase, it served as our primary storage scheme. Since other organizations are interested in using Socorro and their storage needs may not be as extreme as ours, the FSS may be perfectly adequate for them. In addition, while developing for Socorro, it is useful to have a complete live Socorro installation available. If you've ever tried to install and maintain an HBase installation on a virtual machine on a laptop, you likely would kill for an alternative. To support the community of Socorro users as well as our own developers, we're investing in maintaining FSS as a completely functional primary storage mechanism for Socorro. It will only take a switch in a configuration file to choose between the two storage systems (or future alternate implementations).
The rewrite of the existing FSS code is slated to both begin and finish in Q1 2013. I have several goals in this rewrite:
- the public API should match the Crash Storage API exactly with no need for adapters
- there should be a two class inheritance hierarchy for the two file system layouts (with vs. without date branch structure)
- the existing PolyCrashStorage and/or Fallback Storage classes should be subclassed (or used as a model) for the case where separate standard and deferred crash storage is needed
- the rewrite should be implemented in parallel with a full suite of tests
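As a rough illustration of the second goal, here is a minimal Python sketch of a two class hierarchy, one class per layout. Everything in it is hypothetical: the class names, the method names, and the shape of the directory paths are invented for this example and are not taken from the existing FSS code. It only shows the idea of a base class that locates a crash by its id and a subclass that adds a date branch on top.

```python
import os
from datetime import datetime


class CrashIdLayout:
    """Hypothetical layout without a date branch: the path depends only on the crash id."""

    def __init__(self, root):
        self.root = root

    def crash_dir(self, crash_id):
        # Assumed scheme: fan out on the first two pairs of characters of the id.
        return os.path.join(self.root, crash_id[:2], crash_id[2:4])


class DatedCrashIdLayout(CrashIdLayout):
    """Hypothetical layout with a date branch, inheriting the id-based lookup."""

    def date_dir(self, timestamp):
        # Assumed scheme: one directory per day, one subdirectory per hour.
        return os.path.join(
            self.root, "date", timestamp.strftime("%Y%m%d"), timestamp.strftime("%H")
        )


layout = DatedCrashIdLayout("/tmp/fss")
by_id = layout.crash_dir("abcdef123456")
by_date = layout.date_dir(datetime(2013, 1, 15, 9, 30))
```

With this shape, code that only needs the id-based location can work against the base class and never has to know whether a date branch exists.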
No Need For Adapters
The existing FSS implementation has an API that was fine at the time of its implementation, but is awkward now in view of the more refined Crash Storage API. For example, the class underlying the saving of a raw crash (called JsonDumpStorage) has a method for saving a raw crash and its associated binary crash dump. Rather than just accepting a raw crash and dump, JsonDumpStorage sets up the directory structure and then returns a tuple of open file handles for the raw crash and dump respectively. It expects the client of the module to do the work of actually writing the file contents and then to follow through by closing the open handles.
In my proof of concept implementation of FSS, shoehorned into the Crash Storage API, I had to write adapter code to do the work of writing the two files under the 'save_raw_crash' and 'save_dump' methods of the Crash Storage API. Functionality like this ought to be pushed into the implementation of FSS, minimizing the responsibilities of the Crash Storage API code.
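To make the contrast concrete, here is a minimal Python sketch of the two styles. It is not the real Socorro code: LegacyStyleStorage, new_entry, AdapterStorage and DirectStorage are hypothetical names, the directory scheme is invented, and the save_raw_crash(raw_crash, dump, crash_id) signature is an assumption made for the example. The first class mimics the handle-returning behavior described above, the second is the sort of adapter that behavior forces on the caller, and the third does all of the file work itself.

```python
import json
import os
import tempfile


def _crash_paths(root, crash_id):
    # Assumed scheme for this sketch: one directory per two-character id prefix.
    crash_dir = os.path.join(root, crash_id[:2])
    os.makedirs(crash_dir, exist_ok=True)
    return (os.path.join(crash_dir, crash_id + ".json"),
            os.path.join(crash_dir, crash_id + ".dump"))


class LegacyStyleStorage:
    """Hypothetical stand-in: builds directories, then hands back open handles."""

    def __init__(self, root):
        self.root = root

    def new_entry(self, crash_id):
        json_path, dump_path = _crash_paths(self.root, crash_id)
        # The caller is left responsible for writing to and closing both handles.
        return open(json_path, "w"), open(dump_path, "wb")


class AdapterStorage:
    """The extra layer needed to fit the legacy style behind save_raw_crash."""

    def __init__(self, legacy):
        self.legacy = legacy

    def save_raw_crash(self, raw_crash, dump, crash_id):
        json_handle, dump_handle = self.legacy.new_entry(crash_id)
        try:
            json.dump(raw_crash, json_handle)
            dump_handle.write(dump)
        finally:
            json_handle.close()
            dump_handle.close()


class DirectStorage:
    """The goal: the storage class owns the whole write, so no adapter is needed."""

    def __init__(self, root):
        self.root = root

    def save_raw_crash(self, raw_crash, dump, crash_id):
        json_path, dump_path = _crash_paths(self.root, crash_id)
        with open(json_path, "w") as json_file:
            json.dump(raw_crash, json_file)
        with open(dump_path, "wb") as dump_file:
            dump_file.write(dump)


root = tempfile.mkdtemp()
DirectStorage(root).save_raw_crash({"ProductName": "Firefox"}, b"\x00\x01", "abcd1234")
```

Both AdapterStorage and DirectStorage end up producing the same files; the difference is only where the responsibility for writing and closing them lives.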