firefly/ntbks/reader_tutorial.ipynb
[1]:
%load_ext autoreload
%autoreload 2
from IPython.display import YouTubeVideo
A recording of this Jupyter notebook in action is available at:
[2]:
YouTubeVideo("lYPGa6DibOk")
[2]: (embedded YouTube player for video lYPGa6DibOk)
[3]:
import sys
import os
import numpy as np
## ignore this line; you do not need it if Firefly is pip-installed into your PYTHONPATH
sys.path.insert(0,"/Users/agurvich/research/repos/firefly/src/")
from firefly.data_reader import Reader,ArrayReader,ParticleGroup
don't have phil's colormaps
Tutorial notebook: Using the Reader class
One of the main purposes of Firefly is to enable users to interactively explore their own data (or to interactively explore someone else’s data). While it is possible to format one’s data manually using a text editor, we have provided a Python API for producing the .json files that are necessary to run an iteration of Firefly.
The central piece of that API is the firefly.data_reader.Reader class, which collects the different parts of the API together to produce the consistently formatted and linked .json files that the Firefly webapp can interpret. The Reader class is documented here, but below we only provide a brief example of how to use the API.
Creating a Reader instance
To begin, we’ll initialize a Reader object. Users are encouraged to familiarize themselves with the different keyword arguments through the documentation linked above.
Perhaps the most important keyword argument is JSONdir, which tells your Reader object where it should collect the different .json files it will produce. The .json files have to be readable from the firefly/static/data directory of the iteration of Firefly that’s trying to open the data. The Reader class will automatically create a shortcut to the directory if you don’t choose a path that lives in firefly/static/data. If you enter a relative path it will assume you mean relative to your ${HOME} directory. If no JSONdir is provided then it will default to ${HOME}/<JSONprefix> (which itself defaults to Data if nothing is passed).
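For example, a hedged sketch of the path handling described above (the directory name my_firefly_data is purely illustrative, and both lines assume the documented defaulting behavior):
## assumption: a relative path is interpreted relative to ${HOME}, as described above
relative_reader = Reader(JSONdir="my_firefly_data")
## which should resolve the same way as passing the absolute path explicitly
explicit_reader = Reader(JSONdir=os.path.join(os.path.expanduser('~'),'my_firefly_data'))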
[4]:
## initialize a Reader object; cwd will be firefly/ntbks
JSONdir = os.path.abspath(os.path.join(os.getcwd(),'..','static','data','tutorial'))
my_reader = Reader(JSONdir=JSONdir)
[5]:
## let's create some sample data, a grid of points in a 3d cube
my_coords = np.linspace(-10,10,20)
xs,ys,zs = np.meshgrid(my_coords,my_coords,my_coords)
xs,ys,zs = xs.flatten(),ys.flatten(),zs.flatten()
coords = np.array([xs,ys,zs]).T
## we'll pick some random field values to demonstrate filtering/colormapping
fields = np.random.random(size=xs.size)
Store the coordinates in a ParticleGroup
Particle data is validated and organized in firefly.data_reader.ParticleGroup objects. In general users should not sub-class the ParticleGroup class, but if you’re an enterprising user with a specific use case: I’m a tutorial, not a cop! For details about how the ParticleGroup class works, check the particle group documentation.
For our purposes, we’ll take advantage of the fact that any keyword arguments passed here go directly into the particleGroup.settings_default dictionary, which controls which elements appear in the particle panes in the UI; see the settings documentation or settings_tutorial.ipynb for an example.
Note: Sometimes data is too large to load directly into Firefly. We encourage users who are trying a new dataset for the first time to use the decimation_factor keyword argument, which reduces the dataset size by the factor specified (the implementation is just a shuffle(coords)[::decimation_factor]).
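To make that concrete, here is a minimal numpy sketch of the equivalent decimation operation (illustrative only, not Firefly’s internal code):
## shuffle the rows, then keep every tenth particle: 8000 -> 800
rng = np.random.default_rng()
decimated_coords = rng.permutation(coords)[::10]
print(decimated_coords.shape)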
[6]:
## create a particle group that contains our test coordinates
my_particleGroup = ParticleGroup(
'partname',
coords,
sizeMult=5, ## increase the particle size to make them a bit easier to see since there are so few of them
color=[0,0,1,1], ## make them blue; colors should be an RGBA list
field_arrays=[fields], ## track the dummy field to demonstrate how to pass field data
field_names=['testfield']) ## name the dummy field
## sometimes data is too large to load directly into Firefly
my_decimated_particleGroup = ParticleGroup(
'decimated',
coords,
sizeMult=5, ## increase the particle size to make them a bit easier to see since there are so few of them
color=[0,0,1,1], ## make them blue; colors should be an RGBA list
field_arrays=[fields], ## track the dummy field to demonstrate how to pass field data
field_names=['testfield'], ## name the dummy field
decimation_factor=10)
Make sure each field_array (1) has a field_filter_flag (0), assuming True.
Make sure each field_array (1) has a field_colormap_flag (0), assuming True.
Make sure each field_array (1) has a field_radius_flag (0), assuming False.
Make sure each field_array (1) has a field_filter_flag (0), assuming True.
Make sure each field_array (1) has a field_colormap_flag (0), assuming True.
Make sure each field_array (1) has a field_radius_flag (0), assuming False.
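Since keyword arguments like sizeMult and color above are routed into the settings_default dictionary, they can be inspected after the fact; a quick sketch, assuming the dictionary keys match the keyword argument names:
## peek at the settings produced by the keyword arguments above
print(my_particleGroup.settings_default['sizeMult'])
print(my_particleGroup.settings_default['color'])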
All that’s left is to connect each ParticleGroup object to the Reader object using the .addParticleGroup method.
[7]:
## instructs my_reader to keep track of my_particleGroup
my_reader.addParticleGroup(my_particleGroup)
my_reader.addParticleGroup(my_decimated_particleGroup)
print(my_reader)
print(my_reader.particleGroups)
Reader with 2 particle groups
[partname - 8000/8000 particles - 1 tracked fields
decimated - 800/8000 particles - 1 tracked fields]
Notice that the decimation factor is represented by the fraction 800/8000 in the second particle group “decimated”.
Outputting to .ffly
At this point we’re ready to output our data to .ffly format so that it can be loaded with Firefly. The Reader object will automatically dump all of the necessary files associated with each of the ParticleGroup objects and Settings objects we’ve attached to it, as described in the reader documentation.
[8]:
## have the reader dump all its data to the .ffly and configuration .json files
my_reader.writeToDisk(loud=True)
Outputting: partname - 8000/8000 particles - 1 tracked fields
Outputting: decimated - 800/8000 particles - 1 tracked fields
[8]:
''
Notice that .writeToDisk returned an empty string; this is because the .ffly and configuration .json files were written to disk. Another option is to instead produce a single .json-formatted string containing all the data that would’ve been written to disk. This is useful for transmitting data through Flask, which is the subject of another tutorial.
[9]:
## have the reader dump all its data to a single big string
big_JSON = my_reader.writeToDisk(loud=True,write_to_disk=False,extension='.json')
print("big_JSON has %d characters"%len(big_JSON))
Outputting: partname - 8000/8000 particles - 1 tracked fields
Outputting: decimated - 800/8000 particles - 1 tracked fields
big_JSON has 451720 characters
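Since the return value is a .json-formatted string, it can be parsed back with the standard library; a minimal sanity check, assuming the whole string is one valid JSON document:
import json
## parse the string to confirm it round-trips as JSON
parsed = json.loads(big_JSON)
print(type(parsed))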
Using an ArrayReader sub-class
The procedure outlined above is a common use case, so we’ve provided a sub-class of firefly.data_reader.Reader, firefly.data_reader.ArrayReader, which wraps ParticleGroup creation and .addParticleGroup so the user can get a Reader containing their data with a single initialization. It will automatically name particle groups and fields unless they are specified directly (see the reader documentation).
[10]:
my_arrayReader = ArrayReader(
coords,
fields=fields,
JSONdir=JSONdir,
write_to_disk=True)
Make sure each field_array (1) has a field_filter_flag (0), assuming True.
Make sure each field_array (1) has a field_colormap_flag (0), assuming True.
Make sure each field_array (1) has a field_radius_flag (0), assuming False.
Outputting: PGroup_0 - 8000/8000 particles - 1 tracked fields
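As the output shows, the particle group was automatically named PGroup_0. Just as with the manually built Reader above, the attached groups can be inspected through the particleGroups attribute:
## confirm the automatically generated particle group name
print(my_arrayReader.particleGroups)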