Managing multiple datasets

Tutorial notebook: Managing multiple datasets with Firefly
- Editing the entries of startup.json
- Creating a standalone iteration of Firefly

With `startup.json`

When Firefly first opens, it searches for the startup file, startup.json, in firefly/static/data. If startup.json does not exist, Firefly displays a button that lets you select the directory containing the files you want to load. That directory must contain a manifest file, filenames.json.

This interface is also available from within the UI when a dataset is already loaded and displayed: click the Load New Data button.
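Because the selected directory must contain filenames.json, it can help to check which directories qualify before opening Firefly. The following is a minimal sketch, not part of Firefly itself: the helper name loadable_directories is hypothetical, and it assumes each dataset sits in its own sub-directory directly under the data directory.

```python
import os

def loadable_directories(data_root):
    """Return the names of sub-directories of data_root that contain filenames.json.

    Hypothetical helper for illustration; it is not provided by Firefly.
    """
    found = []
    for name in sorted(os.listdir(data_root)):
        manifest = os.path.join(data_root, name, "filenames.json")
        # Only directories holding the manifest file can be loaded.
        if os.path.isfile(manifest):
            found.append(name)
    return found
```

Calling loadable_directories("firefly/static/data") would then list the candidates, assuming that relative path is valid from your working directory.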

Note

Some browsers may show a default warning that you are about to upload many files to the site and should do so only if you trust the site. Please allow Firefly to upload these files. You are not uploading them to the internet, only to your browser.

Warning

For most browsers, you can only select a directory that is a sub-directory of firefly/static/data. Either keep your .json files there or create symbolic links inside the data directory that point elsewhere on your local disk, e.g., from within firefly/static/data,

ln -s /home/mydirectory/snapdir_XXX

firefly.data_reader.Reader.writeToDisk() will automatically create a symbolic link if it detects that the JSONdir you specified is not a sub-directory of firefly/static/data.
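The symbolic link described in this warning can also be created from Python. The sketch below uses the standard library's os.symlink; the helper name link_into_data_dir is hypothetical, and it is not a description of how writeToDisk() creates its link internally.

```python
import os

def link_into_data_dir(source_dir, data_dir):
    """Create data_dir/<basename of source_dir> as a symbolic link to source_dir.

    Hypothetical helper for illustration; it is not provided by Firefly.
    """
    source_dir = os.path.abspath(source_dir)
    link_path = os.path.join(data_dir, os.path.basename(source_dir))
    # Leave an existing file or link of the same name untouched.
    if not os.path.lexists(link_path):
        os.symlink(source_dir, link_path, target_is_directory=True)
    return link_path
```

For example, link_into_data_dir("/home/mydirectory/snapdir_XXX", "firefly/static/data") would mirror the ln -s command above, assuming both paths exist on your system.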

If you have multiple datasets available on your computer and prefer to choose among them from a menu when Firefly starts, you can append entries to the startup.json file to create a list of directories. For instance, a Reader may create a startup.json file that contains the following:

{"0":"data\/snapdir_001"}

You could manually extend it to contain the following:

{"0":"data\/snapdir_001",
"1":"data\/snapdir_002",
"2":"data\/snapdir_003",
"3":"data\/snapdir_004"}

Alternatively, pass the write_startup='append' keyword argument to __init__() when initializing the Reader for your second dataset.

With this startup.json, Firefly shows a button on load that, when clicked, lets you choose which dataset to display. This method is particularly useful when the Firefly webserver you are accessing is not hosted locally and is instead port forwarded to your local browser (which can only see your local file system).
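The manual edit of startup.json shown earlier can also be scripted. The sketch below uses only the standard json module; the helper name append_startup_entry is hypothetical, and it assumes the keys are consecutive integers written as strings, starting at "0", as in the example above. It is an illustration of the file format, not the implementation behind write_startup.

```python
import json
import os

def append_startup_entry(startup_path, data_dir):
    """Add data_dir to the startup.json at startup_path and return the entries.

    Hypothetical helper for illustration; it is not provided by Firefly.
    Assumes keys are consecutive integers stored as strings ("0", "1", ...).
    """
    entries = {}
    if os.path.exists(startup_path):
        with open(startup_path) as handle:
            entries = json.load(handle)
    # Skip directories that are already listed.
    if data_dir not in entries.values():
        entries[str(len(entries))] = data_dir
    with open(startup_path, "w") as handle:
        json.dump(entries, handle)
    return entries
```

Note that json.dump writes "data/snapdir_002" without the backslash seen above. In JSON, "\/" and "/" decode to the same character, so both spellings describe the same path.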

With separate Firefly source directories

Alternatively, one could make many copies of the Firefly source directory, each with its own startup.json.

To facilitate this, we provide the firefly.data_reader.Reader.copyFireflySourceToTarget() method which will create a new directory and copy the necessary source files to run Firefly within it (without the Python frontend API).

You can optionally copy the Flask files needed to run a local Flask server by passing a keyword argument, but this is disabled by default. The feature is instead intended to let users quickly create instances of Firefly that they can host on the internet.

To streamline this process further, we provide an optional init_gh_pages keyword argument that attempts to create a new GitHub repository with GitHub Pages, a free web hosting service offered by GitHub, enabled.

Note

To use the init_gh_pages keyword argument, you must have created a GitHub OAuth token somewhere on your system and passed its location to copyFireflySourceToTarget() using the GHOAUTHTOKENPATH keyword argument (which defaults to $HOME/.github.token). Attempting to use the init_gh_pages flag without doing so will raise an error with instructions for generating a GitHub OAuth token.
