Modules

This section includes modules that are used in this project.

database.py

The core module for the Database class.

class nsaphx.database.Database(db_path)[source]

Database class

Parameters:

db_path (str) – Path to the database file.

The Database class manages access to the database. It is a wrapper class for the sqlitedict package and keeps an in-memory cache to speed up access to recently used values. Users can change the cache size.

>>> from nsaphx.database import Database
>>> db = Database("test.db")
>>> db.set_value("key1", "value1")
>>> db.get_value("key1")
'value1'
>>> db.delete_value("key1")
>>> db.get_value("key1")
>>> db.update_cache_size(100)
>>> db.close_db()

_init_reserved_keys()[source]

Initializes the reserved keys in the database. There are two reserved keys: RESERVED_KEYS and PROJECTS_LIST.

close_db()[source]

Commits changes to the database, closes it, and clears the cache.

delete_value(key)[source]

Deletes the key and its value from both the in-memory dictionary and the on-disk database. If the key is not found, the call is silently ignored.

Parameters:

key (str) – A hash value (generated by the package)

get_value(key)[source]

Returns the value by looking it up in the following order:

1) Look for the value in the cache and return it if found.
2) Otherwise, look for the value on disk and return it if found.
3) Otherwise, return None.

Parameters:

key (str) – hash value (generated by the package)

Returns:

value – The stored value if the key is found, otherwise None.

Return type:

Any | None
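The lookup order above can be sketched as follows. This is a minimal, self-contained illustration and not the nsaphx implementation: TinyStore is a hypothetical name, a plain dict stands in for the on-disk sqlitedict store, and the least-recently-used eviction is an assumption.

```python
from collections import OrderedDict


class TinyStore:
    """Hypothetical stand-in for a cached key-value store (not nsaphx code)."""

    def __init__(self, cache_size=2):
        self.disk = {}               # stands in for the on-disk database
        self.cache = OrderedDict()   # in-memory cache, oldest entry first
        self.cache_size = cache_size

    def get_value(self, key):
        # 1) Serve from the cache when possible.
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        # 2) Fall back to the disk and remember the value in the cache.
        if key in self.disk:
            value = self.disk[key]
            self.cache[key] = value
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)  # evict the oldest entry
            return value
        # 3) Unknown key.
        return None


store = TinyStore()
store.disk["key1"] = "value1"
first = store.get_value("key1")    # read from disk, then cached
second = store.get_value("key1")   # served from the cache
missing = store.get_value("nope")  # not found anywhere
```

Only the first read touches the disk stand-in; the second is answered from the cache, and the unknown key falls through to None.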

set_value(key, value)[source]

Stores the key and the given value in the database. If the key already exists, its value is overwritten and the key is removed from the in-memory dictionary. The new value is loaded again by get_value when needed.
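A minimal sketch of this write path, assuming plain dicts as stand-ins for the on-disk store and the in-memory dictionary. The names disk and cache and both function bodies are illustrative, not the package's code. Writing drops any cached copy so that the next read reloads the fresh value.

```python
disk = {}    # stands in for the on-disk database (assumption)
cache = {}   # stands in for the in-memory dictionary (assumption)


def set_value(key, value):
    """Write to disk and drop any stale cached copy."""
    disk[key] = value
    cache.pop(key, None)  # no error if the key was never cached


def get_value(key):
    """Read through the cache, loading from disk on a miss."""
    if key not in cache and key in disk:
        cache[key] = disk[key]
    return cache.get(key)


set_value("key1", "old")
before = get_value("key1")           # loads "old" into the cache
set_value("key1", "new")             # overwrites and invalidates the cache
cached_after_write = "key1" in cache
after = get_value("key1")            # reloads the new value from disk
```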

Parameters:
  • key (str) – Hash value (generated by the package)

  • value (Any) – Any Python object

summary()[source]

Returns a summary of the cache, including its length, its limit, and its size in human-readable form.
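As an illustration of what such a summary might contain, the sketch below builds a small report from a dict-based cache. The helper human_readable, the report keys, and the use of sys.getsizeof are assumptions for this example only; the package may measure and format the size differently.

```python
import sys


def human_readable(num_bytes):
    """Format a byte count with binary units (B, KiB, MiB, ...)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


cache = {"key1": "value1", "key2": [1, 2, 3]}
limit = 100  # maximum number of cached items (hypothetical value)

# sys.getsizeof is shallow: it measures the dict itself, not its contents.
report = {
    "length": len(cache),
    "limit": limit,
    "size": human_readable(sys.getsizeof(cache)),
}
```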

update_cache_size(new_size)[source]

Update the cache size. If the new size is smaller than the current size, it will remove the oldest items from the cache.

Parameters:

new_size (int) – A new cache size (this is the number of items, not the size on the disk)
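Shrinking the cache can be sketched with an OrderedDict, whose insertion order doubles as the age of each entry. The function shrink_cache is a hypothetical helper written for this example, not part of the package.

```python
from collections import OrderedDict

cache = OrderedDict()  # oldest entry first
for i in range(5):
    cache[f"key{i}"] = f"value{i}"


def shrink_cache(cache, new_size):
    """Evict the oldest entries until at most new_size items remain."""
    if new_size < 0:
        raise ValueError("new_size must be non-negative")
    while len(cache) > new_size:
        cache.popitem(last=False)  # last=False pops the oldest entry
    return cache


shrink_cache(cache, 2)
remaining = list(cache)  # keys that survived the eviction
```

Here the size counts items, matching the parameter description above; the three oldest keys are dropped and the two newest remain.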

project_controller.py

The core module for the ProjectController class.

class nsaphx.project_controller.ProjectController(db_path)[source]

ProjectController class

The ProjectController class manages the projects. It provides a suite of methods to add, remove, and connect to projects. It also provides a summary method to print the list of projects. Each project is defined by a folder with a project.yaml file.

Parameters:

db_path (str) – Path to the database file.

_update_project_list()[source]

The PROJECTS_LIST is a list of available projects’ hash values.

connect_to_project(folder_path=None)[source]

Connect to an existing project defined by a folder with a project.yaml file inside.

Parameters:

folder_path (str) – A path to the project folder containing the project.yaml file.

create_project(folder_path=None)[source]

Create a new project and add it to the database. The project is defined by a folder with a project.yaml file inside.

Parameters:

folder_path (str) – A path to the project folder containing the project.yaml file.

get_project(pr_name)[source]

Get a project object from the database.

Parameters:

pr_name (str) – Name of the project to be retrieved.

Returns:

project – The project object.

Return type:

Any

remove_project(project_name)[source]

Remove a project from the database, the list of projects, and the in-memory cache. Run pc.summary() to see the list of projects.

Parameters:

project_name (str) – Name of the project to be removed.

summary()[source]

Print the number of available projects with project names.

project.py

The core module for the Project class.

class nsaphx.project.Project(project_params, db_path)[source]

Project class

The Project class creates a project object that collects the project’s details.

Parameters:

project_params (dict) – The parameters of the project. It should contain the following mandatory keys:

  • name

  • project_id

  • data.outcome_path

  • data.exposure_path

  • data.covariate_path

Notes

The project object does not load the data. It only stores the paths to the data. Other than mandatory keys, the project_params can contain other keys.
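A check for the mandatory keys could look like the sketch below. It assumes that a dotted name such as data.outcome_path refers to a nested dictionary entry, as in the project_params example in this section. MANDATORY_KEYS and missing_keys are hypothetical names, not part of the package.

```python
MANDATORY_KEYS = ("name", "project_id", "data.outcome_path",
                  "data.exposure_path", "data.covariate_path")


def missing_keys(project_params):
    """Return the mandatory dotted keys that are absent from the dict."""
    missing = []
    for dotted in MANDATORY_KEYS:
        node = project_params
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]  # descend one level
    return missing


params = {"name": "test_project", "project_id": 1,
          "data": {"outcome_path": "data/outcome.csv",
                   "exposure_path": "data/exposure.csv"}}
result = missing_keys(params)  # covariate_path was left out on purpose
```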

Examples

>>> from nsaphx.project import Project
>>> project_params = {"name": "test_project", "project_id": 1,
...                   "data": {"outcome_path": "data/outcome.csv",
...                            "exposure_path": "data/exposure.csv",
...                            "covariate_path": "data/covariate.csv"}}
>>> project = Project(project_params=project_params, db_path="test.db")

_add_main_data_node()[source]

Add the main data node to the database.

data_node.py

The core module for the MainDataNode and DataNode classes.

class nsaphx.data_node.DataClass[source]

The DataClass is an abstract class for the MainDataNode and DataNode classes. It contains the common attributes and methods for the two classes.

class nsaphx.data_node.DataNode(parent_node_hash, instruction, db_path)[source]

The DataNode is the basic unit in the data pipeline. Users can apply instructions to a DataNode to generate a new DataNode. Each DataNode has a parent node (a DataNode or the MainDataNode) and can have one or more descendant DataNodes.

Parameters:
  • parent_node_hash (str) – The hash value of the parent DataNode.

  • instruction (dict) – The instruction to be applied to the node.

  • db_path (str) – The path to the database file.

input_data

The input data of the node.

Type:

dict

output_data

The output data of the node.

Type:

dict

computed

Whether the node has been computed.

Type:

bool

hash_value

The hash value of the node.

Type:

str

node_id

The id of the node (a shortened hash value).

Type:

str

parent_node_hash

The hash value of the parent DataNode (or MainDataNode).

Type:

str

instruction

The instruction to be applied to the node.

Type:

dict

db_path

The path to the database file.

Type:

str

descendant_hash

A list of hash values of the descendant DataNodes.

Type:

list

hash_by_type

A dictionary containing the hash values of the descendant DataNodes, grouped by DataNode type.

Type:

dict

db

The database object.

Type:

Database

_add_hash()[source]

Add the hash value and node_id to the node.
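One way a deterministic hash and a shortened node id could be derived is sketched below. The choice of SHA-256, the JSON serialization, the 8-character id length, and the instruction keys are all assumptions for illustration; the package's actual hashing scheme may differ.

```python
import hashlib
import json


def make_hash(parent_node_hash, instruction):
    """Derive a deterministic hash from the parent hash and instruction."""
    payload = json.dumps(
        {"parent": parent_node_hash, "instruction": instruction},
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical instruction; the key names are made up for this example.
instruction = {"plugin_name": "example_plugin", "params": {"ratio": 0.5}}
hash_value = make_hash("parent-hash-placeholder", instruction)
node_id = hash_value[:8]  # shortened hash; the length 8 is an assumption
```

Because the hash depends on both the parent hash and the instruction, the same instruction applied to the same parent always maps to the same node.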

_connect_to_database()[source]

Connect to the database.

_update_input_data()[source]

Update the input data of the node. This function will get the output data of the parent node and set it as the input data of the node. The current node will be updated on the database.
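The parent-to-child hand-off can be sketched with a stripped-down node class. Node and update_input_data are hypothetical stand-ins, and the database update described above is omitted.

```python
class Node:
    """Hypothetical minimal node holding data paths, not the data itself."""

    def __init__(self, parent=None):
        self.parent = parent
        self.input_data = {}
        self.output_data = {}

    def update_input_data(self):
        # The child's input is whatever the parent produced.
        if self.parent is not None:
            self.input_data = dict(self.parent.output_data)  # shallow copy


parent = Node()
parent.output_data = {"exposure_path": "data/exposure.csv"}
child = Node(parent=parent)
child.update_input_data()
```

Copying the dictionary keeps later changes to the child's input from leaking back into the parent's output.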

access_input_data(data_name=None)[source]

Access the input data of the node. This function will load the data from the data files and return a dictionary containing the data.

Parameters:

data_name (str, optional) – The name of the data to be accessed. If None, all data will be accessed. (default: None)

access_output_data(data_name=None)[source]

Access the output data of the node. This function will load the data from the data files and return a dictionary containing the data.

Parameters:

data_name (str, optional) – The name of the data to be accessed. If None, all data will be accessed. (default: None)

check_data()[source]

Check the data of the node. This function will check if the data files are accessible and print out the file size.
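A file accessibility check of this kind can be written with the standard library alone. The helper check_files and the report format are assumptions for this sketch, which uses a temporary directory so that it runs anywhere.

```python
import os
import tempfile


def check_files(paths):
    """Map each name to its file size in bytes, or None if unreadable."""
    report = {}
    for name, path in paths.items():
        if os.path.isfile(path) and os.access(path, os.R_OK):
            report[name] = os.path.getsize(path)
        else:
            report[name] = None
    return report


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "exposure.csv")
    with open(existing, "w") as handle:
        handle.write("id,value\n")
    report = check_files({"exposure": existing,
                          "outcome": os.path.join(tmp, "missing.csv")})

for name, size in report.items():
    print(name, "not accessible" if size is None else f"{size} bytes")
```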

compute()[source]

Compute the node. This function will call the plugin function to compute the node and update the node on the database.

reset()[source]

Reset the node. This function will reset the node to the state before it is computed.
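The compute and reset life cycle can be sketched as a small state machine around the computed flag. Node here is a hypothetical class, and a plain callable stands in for the plugin function.

```python
class Node:
    """Hypothetical node that tracks whether it has been computed."""

    def __init__(self, input_data, func):
        self.input_data = input_data
        self.func = func          # stands in for the plugin function
        self.output_data = {}
        self.computed = False

    def compute(self):
        self.output_data = self.func(self.input_data)
        self.computed = True

    def reset(self):
        # Back to the pre-compute state: no output, flag cleared.
        self.output_data = {}
        self.computed = False


node = Node({"x": 2}, lambda data: {"x": data["x"] * 10})
node.compute()
after_compute = (node.computed, node.output_data)
node.reset()
after_reset = (node.computed, node.output_data)
```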

update_node_on_db()[source]

Update the node on the database.

class nsaphx.data_node.MainDataNode(project_params, db_path)[source]

The MainDataNode is the first node in the data pipeline. Users can apply instructions to the MainDataNode to generate a new DataNode. Each project has one MainDataNode object.

Parameters:
  • project_params (dict) – A dictionary containing the project parameters.

  • db_path (str) – The path to the database file.

_connect_to_database()[source]

Connect to the database.

access_data(data_name=None)[source]

Access the data in the data node. If data_name is None, return a dictionary containing all the data. Otherwise, return the data specified by data_name.

Parameters:

data_name (str, optional) – The name of the data to be returned. (default: None)

Returns:

data – A dictionary containing the requested data in the data node.

Return type:

dict

apply_instruction_chain(instruction_list)[source]

Apply a list of instructions to the MainDataNode to generate a sequence of new DataNodes.
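A chain of instructions can be sketched as repeated single applications, assuming each instruction is applied to the node produced by the previous one. Node, apply_instruction, and the instruction dictionaries are hypothetical and exist only for this example.

```python
class Node:
    """Hypothetical node; each instruction yields one new child node."""

    def __init__(self, instruction=None, parent=None):
        self.instruction = instruction
        self.parent = parent

    def apply_instruction(self, instruction):
        return Node(instruction=instruction, parent=self)

    def apply_instruction_chain(self, instruction_list):
        nodes = []
        current = self
        for instruction in instruction_list:
            # Each new node becomes the parent of the next one.
            current = current.apply_instruction(instruction)
            nodes.append(current)
        return nodes


main_node = Node()
chain = main_node.apply_instruction_chain(
    [{"step": "filter"}, {"step": "split"}, {"step": "fit"}]
)
```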

check_data()[source]

Check if the data files are accessible and print out the file size.

plugin_registery.py

The core module for registering plugins.