Modules
This section includes modules that are used in this project.
database.py
The core module for the Database class.
- class nsaphx.database.Database(db_path)[source]
Database class
- Parameters:
db_path (str) – Path to the database file.
The Database class handles access to the database. It is a wrapper class for the sqlitedict package. The Database class has an in-memory cache to speed up access to recently used values. Users can update the cache size.
>>> from nsaphx.database import Database
>>> db = Database("test.db")
>>> db.set_value("key1", "value1")
>>> db.get_value("key1")
'value1'
>>> db.delete_value("key1")
>>> db.get_value("key1")
>>> db.update_cache_size(100)
>>> db.close_db()
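The description above mentions a resizable cache of recently used values but does not say how entries are evicted. The following is a minimal sketch of one common approach, a least-recently-used (LRU) cache built on the standard library. The class name LRUCache and its methods are hypothetical and are not part of nsaphx; the package may use a different eviction policy.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical bounded cache; an illustration, not the nsaphx code."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        self._evict()

    def resize(self, new_size):
        # Changing the size may require dropping entries immediately.
        self.max_size = new_size
        self._evict()

    def _evict(self):
        # Drop the least recently used entries until the cache fits.
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)

    def __len__(self):
        return len(self._items)


cache = LRUCache(max_size=3)
for k in ("a", "b", "c"):
    cache.put(k, k.upper())
cache.get("a")   # "a" becomes the most recently used entry
cache.resize(2)  # shrinking evicts "b", the least recently used entry
```

Shrinking the cache only discards in-memory copies; in a design like the one described here, the on-disk values would remain available.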
- _init_reserved_keys()[source]
Initializes the reserved keys in the database. There are two reserved keys: RESERVED_KEYS and PROJECTS_LIST.
- delete_value(key)[source]
Deletes the key and its value from both the in-memory dictionary and the on-disk database. If the key is not found, the call is silently ignored.
- Parameters:
key (str) – Hash value (generated by the package)
- get_value(key)[source]
Returns the value, searching in the following order:
1) Look for the value in the cache and return it if found.
2) Otherwise, look for the value on disk and return it if found.
3) Otherwise, return None.
- Parameters:
key (str) – hash value (generated by the package)
- Returns:
value – The stored value if the key is found; otherwise None.
- Return type:
Any | None
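The lookup order described for get_value can be sketched as follows. This is an illustration only: the function name lookup is hypothetical, plain dictionaries stand in for the cache and the on-disk store, and copying a disk hit into the cache is an assumption rather than documented behavior.

```python
def lookup(key, cache, disk):
    """Return the value for key, checking the cache before the disk store."""
    if key in cache:
        return cache[key]
    if key in disk:
        value = disk[key]
        cache[key] = value  # assumption: a disk hit is copied into the cache
        return value
    return None


cache = {}
disk = {"abc123": [1, 2, 3]}  # plain dict standing in for the on-disk store

first = lookup("abc123", cache, disk)   # served from disk, then cached
second = lookup("abc123", cache, disk)  # served from the cache
missing = lookup("zzz", cache, disk)    # found in neither place
```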
- set_value(key, value)[source]
Sets the given key and value in the database. If the key already exists, its value is overwritten and the key is removed from the in-memory dictionary. It is loaded again by the next get_value call if needed.
- Parameters:
key (str) – Hash value (generated by the package)
value (Any) – Any Python object
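The write path described for set_value (write to disk, then drop the cached copy so that a stale value is never served) can be sketched like this. The function name store is hypothetical, and plain dictionaries stand in for the cache and the on-disk database.

```python
def store(key, value, cache, disk):
    """Write to the disk store and drop any stale cached copy."""
    disk[key] = value
    cache.pop(key, None)  # no error if the key was never cached


cache = {"k": "old"}
disk = {"k": "old"}
store("k", "new", cache, disk)
```

Invalidating instead of updating the cache keeps the write path simple: the fresh value is reloaded from disk only if something asks for it again.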
project_controller.py
The core module for the ProjectController class.
- class nsaphx.project_controller.ProjectController(db_path)[source]
ProjectController class
The ProjectController class manages the projects. It provides a suite of methods to add, remove, and connect to projects. It also provides a summary method to print the list of projects. Each project is defined by a folder with a project.yaml file.
- Parameters:
db_path (str) – Path to the database file.
- connect_to_project(folder_path=None)[source]
Connect to an existing project defined by a folder with a project.yaml file inside.
- Parameters:
folder_path (str) – A path to the project folder containing the project.yaml file.
- create_project(folder_path=None)[source]
Create a new project and add it to the database. The project is defined by a folder with a project.yaml file inside.
- Parameters:
folder_path (str) – A path to the project folder containing the project.yaml file.
- get_project(pr_name)[source]
Get a project object from the database.
- Parameters:
pr_name (str) – Name of the project to be retrieved.
- Returns:
project – The project object.
- Return type:
Any
project.py
The core module for the Project class.
- class nsaphx.project.Project(project_params, db_path)[source]
Project class
The Project class creates a project object that collects the project’s details.
- Parameters:
project_params (dict) – The parameters of the project. It should contain the following mandatory keys: name, project_id, data.outcome_path, data.exposure_path, and data.covariate_path.
Notes
The project object does not load the data. It only stores the paths to the data. Besides the mandatory keys, project_params can contain additional keys.
Examples
>>> from nsaphx.project import Project
>>> project_params = {"name": "test_project", "project_id": 1,
...                   "data": {"outcome_path": "data/outcome.csv",
...                            "exposure_path": "data/exposure.csv",
...                            "covariate_path": "data/covariate.csv"}}
>>> project = Project(project_params=project_params, db_path="test.db")
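Because the mandatory keys include nested entries such as data.outcome_path, it can help to check a project_params dictionary before constructing a Project. The helper below is a hypothetical sketch and is not part of nsaphx; it treats each dotted name as a path into nested dictionaries.

```python
MANDATORY_KEYS = (
    "name",
    "project_id",
    "data.outcome_path",
    "data.exposure_path",
    "data.covariate_path",
)


def missing_keys(project_params):
    """Return the dotted mandatory keys that are absent from project_params."""
    missing = []
    for dotted in MANDATORY_KEYS:
        node = project_params
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


complete = {
    "name": "test_project",
    "project_id": 1,
    "data": {
        "outcome_path": "data/outcome.csv",
        "exposure_path": "data/exposure.csv",
        "covariate_path": "data/covariate.csv",
    },
}
incomplete = {"name": "test_project", "data": {"outcome_path": "data/outcome.csv"}}

complete_report = missing_keys(complete)
incomplete_report = missing_keys(incomplete)
```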
data_node.py
The core module for the MainDataNode and DataNode classes.
- class nsaphx.data_node.DataClass[source]
The DataClass is an abstract class for the MainDataNode and DataNode classes. It contains the common attributes and methods for the two classes.
- class nsaphx.data_node.DataNode(parent_node_hash, instruction, db_path)[source]
The DataNode is the basic unit in the data pipeline. Users can apply instructions to a DataNode to generate a new DataNode. Each DataNode has a parent DataNode and can have one or more descendant DataNodes.
- Parameters:
parent_node_hash (str) – The hash value of the parent DataNode.
instruction (dict) – The instruction to be applied to the node.
db_path (str) – The path to the database file.
- input_data
The input data of the node.
- Type:
dict
- output_data
The output data of the node.
- Type:
dict
- computed
Whether the node has been computed.
- Type:
bool
- hash_value
The hash value of the node.
- Type:
str
- node_id
The ID of the node (a shortened hash value).
- Type:
str
- parent_node_hash
The hash value of the parent DataNode (or MainDataNode).
- Type:
str
- instruction
The instruction to be applied to the node.
- Type:
dict
- db_path
The path to the database file.
- Type:
str
- descendant_hash
A list of hash values of the descendant DataNodes.
- Type:
list
- hash_by_type
A dictionary containing the hash values of the descendant DataNodes, grouped by the type of the DataNode.
- Type:
dict
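The attributes above identify each node by a hash value and a shortened node id, and link it to its parent by hash. The reference does not say how the hash is computed, so the following is only a sketch under stated assumptions: it hashes the parent hash together with the instruction using SHA-256 and takes the first eight characters as the short id. The function names and the keys inside the instruction dictionary are hypothetical.

```python
import hashlib
import json


def node_hash(parent_node_hash, instruction):
    """Derive a deterministic hash from the parent hash and the instruction."""
    payload = json.dumps(
        {"parent": parent_node_hash, "instruction": instruction},
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def short_id(hash_value, length=8):
    """Shorten a hash for display (assumed length, not the package's)."""
    return hash_value[:length]


root = hashlib.sha256(b"main-data-node").hexdigest()
instruction = {"plugin_name": "filter", "params": {"year": 2010}}

child = node_hash(root, instruction)
same = node_hash(root, {"params": {"year": 2010}, "plugin_name": "filter"})
other = node_hash(root, {"plugin_name": "filter", "params": {"year": 2011}})
```

With a scheme like this, applying the same instruction to the same parent always yields the same node, which is what makes a hash usable as a database key.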
- _update_input_data()[source]
Update the input data of the node. This function gets the output data of the parent node and sets it as the input data of this node. The current node is then updated in the database.
- access_input_data(data_name=None)[source]
Access the input data of the node. This function will load the data from the data files and return a dictionary containing the data.
- Parameters:
data_name (str, optional) – The name of the data to be accessed. If None, all data will be accessed. (default: None)
- access_output_data(data_name=None)[source]
Access the output data of the node. This function will load the data from the data files and return a dictionary containing the data.
- Parameters:
data_name (str, optional) – The name of the data to be accessed. If None, all data will be accessed. (default: None)
- check_data()[source]
Check the data of the node. This function will check if the data files are accessible and print out the file size.
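A check of this kind can be written with the standard library alone. The sketch below is an illustration and not the nsaphx implementation; the function name check_files and the shape of its report are assumptions.

```python
import os
import tempfile


def check_files(paths):
    """Print one line per file and return {name: size in bytes, or None}."""
    report = {}
    for name, path in paths.items():
        if os.path.isfile(path) and os.access(path, os.R_OK):
            report[name] = os.path.getsize(path)
            print(f"{name}: {path} ({report[name]} bytes)")
        else:
            report[name] = None
            print(f"{name}: {path} is not accessible")
    return report


with tempfile.TemporaryDirectory() as tmp:
    outcome = os.path.join(tmp, "outcome.csv")
    # newline="" keeps the byte count identical across platforms.
    with open(outcome, "w", newline="") as f:
        f.write("id,y\n1,0\n")
    result = check_files({
        "outcome": outcome,
        "exposure": os.path.join(tmp, "missing.csv"),
    })
```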
- compute()[source]
Compute the node. This function calls the plugin function to compute the node and updates the node in the database.
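One way to picture the compute step is a registry that maps plugin names to functions, with the node's instruction selecting which function to run. Everything in this sketch is hypothetical: the PLUGINS registry, the register decorator, the "plugin_name" and "params" instruction keys, and the use of a plain dictionary in place of a DataNode. Persisting the node to the database is left out.

```python
PLUGINS = {}


def register(name):
    """Decorator that records a function in the registry under name."""
    def decorator(func):
        PLUGINS[name] = func
        return func
    return decorator


@register("add_constant")
def add_constant(input_data, params):
    # Toy plugin: add a fixed amount to every value.
    return {k: v + params["amount"] for k, v in input_data.items()}


def compute(node):
    """Run the plugin named in the instruction and mark the node computed."""
    instruction = node["instruction"]
    plugin = PLUGINS[instruction["plugin_name"]]
    node["output_data"] = plugin(node["input_data"], instruction.get("params", {}))
    node["computed"] = True
    return node


node = {
    "input_data": {"x": 1, "y": 2},
    "output_data": None,
    "computed": False,
    "instruction": {"plugin_name": "add_constant", "params": {"amount": 10}},
}
compute(node)
```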
- class nsaphx.data_node.MainDataNode(project_params, db_path)[source]
The MainDataNode is the first node in the data pipeline. Users can apply instructions to the MainDataNode to generate a new DataNode. Each project has one MainDataNode object.
- Parameters:
project_params (dict) – A dictionary containing the project parameters.
db_path (str) – The path to the database file.
- access_data(data_name=None)[source]
Access the data in the data node. If data_name is None, return a dictionary containing all the data. Otherwise, return the data specified by data_name.
- Parameters:
data_name (str, optional) – The name of the data to be returned. (default: None)
- Returns:
data – A dictionary containing the requested data in the data node.
- Return type:
dict
plugin_registery.py
The core module for registering plugins.