gcubed

The gcubed package provides a full implementation of the G-Cubed model.

An introduction to the G-Cubed Python implementation is available online.

Support for G-Cubed is available from McKibbin Software Group.

©1993-2023 McKibbin Software Group Pty Ltd. All rights reserved.

CompressedDF = typing.Tuple[str, bytes]
StateValue = typing.Any | typing.Tuple[str, bytes]
StateDict = typing.Dict[str, typing.Any | typing.Tuple[str, bytes]]
DF_COMPRESS_THRESHOLD_BYTES: int = 100000
NP_COMPRESS_THRESHOLD_BYTES: int = 100000
PARQUET_COMPRESSION: str = 'zstd'
DIRECT_NDARRAY_COMPRESSION_ENV_VAR: str = 'GCUBED_SERIALISATION_COMPRESS_DIRECT_NDARRAYS'
MATRIX_DICTIONARY_PACKING_ENV_VAR: str = 'GCUBED_SERIALISATION_PACK_MATRIX_DICTIONARIES'
def serialisation_direct_ndarray_compression_enabled() -> bool:

Return whether direct ndarray attributes should be compressed individually.

The default is disabled because model artifacts are normally written with an outer joblib zlib compression layer. Compressing large direct numeric arrays here first makes those bytes pass through two compression operations. On the 6N/196 benchmark, leaving direct arrays native reduced baseline save time from about 23.8s to 18.2s with effectively unchanged artifact size.
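The opt-in is controlled by the environment variable named in DIRECT_NDARRAY_COMPRESSION_ENV_VAR. How the package parses the value is not documented here, so the following is only a minimal sketch of reading such a flag. The helper name env_flag_enabled and the set of accepted spellings are assumptions for illustration.

```python
import os

# Name taken from the constant documented above.
DIRECT_NDARRAY_COMPRESSION_ENV_VAR = "GCUBED_SERIALISATION_COMPRESS_DIRECT_NDARRAYS"

# Assumption: these spellings count as "enabled"; the package may accept others.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag_enabled(name: str, default: bool = False) -> bool:
    """Hypothetical helper: interpret an environment variable as a boolean."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# Opt in to per-array compression for the current process only.
os.environ[DIRECT_NDARRAY_COMPRESSION_ENV_VAR] = "1"
print(env_flag_enabled(DIRECT_NDARRAY_COMPRESSION_ENV_VAR))
```

Setting the variable before the model artifact is written is likely the safest order, since the flag is presumably read at serialisation time.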

def serialisation_matrix_dictionary_packing_enabled() -> bool:

Return whether matrix dictionaries should be stored as one compressed payload.

def matrix_dictionary_to_npz_compressed_bytes( matrix_dictionary: dict[tuple[str, str], numpy.ndarray | None]) -> bytes | None:

Pack a matrix dictionary into one compressed NPZ payload.

def matrix_dictionary_from_npz_compressed_bytes(data: bytes) -> dict[tuple[str, str], numpy.ndarray | None]:

Unpack a matrix dictionary stored by matrix_dictionary_to_npz_compressed_bytes.
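The on-disk layout used by these two functions is not specified here. As a rough illustration of the idea, the sketch below round-trips a dictionary keyed by (str, str) tuples through a single compressed NPZ payload with numpy.savez_compressed. The function names pack and unpack, the "m<i>" array names, and the JSON key index are assumptions, not the package's actual format.

```python
import io
import json
from typing import Dict, Optional, Tuple

import numpy as np

MatrixDict = Dict[Tuple[str, str], Optional[np.ndarray]]


def pack(matrices: MatrixDict) -> bytes:
    """Hypothetical packer: one compressed NPZ payload for the whole dictionary."""
    arrays = {}
    index = []  # one entry per key: both key parts and whether the value was None
    for i, (key, value) in enumerate(matrices.items()):
        index.append([key[0], key[1], value is None])
        if value is not None:
            arrays[f"m{i}"] = value
    # Store the key index as a JSON string so no pickling is needed on load.
    arrays["index"] = np.array(json.dumps(index))
    buffer = io.BytesIO()
    np.savez_compressed(buffer, **arrays)
    return buffer.getvalue()


def unpack(data: bytes) -> MatrixDict:
    """Inverse of pack: rebuild the dictionary, restoring None entries."""
    with np.load(io.BytesIO(data), allow_pickle=False) as archive:
        index = json.loads(archive["index"].item())
        return {
            (a, b): None if is_none else archive[f"m{i}"]
            for i, (a, b, is_none) in enumerate(index)
        }


original = {("A", "x"): np.eye(3), ("B", "y"): None}
restored = unpack(pack(original))
```

Packing everything into one payload means the compressor runs once over all matrices rather than once per entry, which is the apparent motivation for the option.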

def configure_logging(folder: pathlib.Path):

Overview

Convenience method for setting up the logging system.

Run this at the start of an experiment to set up a standard logging system.

Arguments

  • folder: The folder where logs will be saved.
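What configure_logging sets up internally is not described above. For readers who want a comparable setup without the package, here is a minimal sketch using only the standard library. The function name configure_logging_sketch, the file name run.log, and the message format are assumptions.

```python
import logging
import pathlib
import tempfile


def configure_logging_sketch(folder: pathlib.Path) -> pathlib.Path:
    """Hypothetical stand-in: log to the console and to a file in `folder`."""
    folder.mkdir(parents=True, exist_ok=True)
    log_file = folder / "run.log"  # assumed file name
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        handlers=[logging.StreamHandler(), logging.FileHandler(log_file)],
        force=True,  # replace handlers installed by earlier calls
    )
    return log_file


# Call once at the start of an experiment, then log as usual.
log_file = configure_logging_sketch(pathlib.Path(tempfile.mkdtemp()) / "logs")
logging.getLogger("experiment").info("starting run")
```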
def custom_exception_handler(exc_type, exc_value, exc_traceback):

Handle the exception unless it is a KeyboardInterrupt.
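The signature matches what sys.excepthook expects, so a handler of this shape is typically installed as the process-wide hook for uncaught exceptions. The sketch below shows that common pattern. What the package's handler does with the exception is not stated here, so logging it at CRITICAL level is an assumption, as is the name exception_hook_sketch.

```python
import logging
import sys


def exception_hook_sketch(exc_type, exc_value, exc_traceback):
    """Hypothetical hook: log uncaught exceptions, leave Ctrl-C alone."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Defer to the interpreter's default behaviour for Ctrl-C.
        sys.__excepthook__(exc_type, exc_value, exc_traceback)
        return
    logging.getLogger(__name__).critical(
        "Uncaught exception", exc_info=(exc_type, exc_value, exc_traceback)
    )


# Install the hook so uncaught exceptions reach the log with a full traceback.
sys.excepthook = exception_hook_sketch
```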

def install_project_warning_handler(project_root: pathlib.Path, logger: logging.Logger | None = None) -> None:

Install a warning handler that:

  • Logs full warning messages with package-aware source info.
  • Highlights the user code location inside the given project_root.
  • Adds diagnostics for common NumPy warnings.

Arguments

  • project_root (Path): The root directory of your project.
  • logger (Optional[logging.Logger]): Optional custom logger.
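The three behaviours listed above can be approximated by replacing warnings.showwarning, which is the standard-library extension point for custom warning output. This is a sketch under stated assumptions: the function name install_warning_logger_sketch, the logger name, the log message layout, and the NUMPY_HINTS table are all illustrative, not the package's implementation.

```python
import logging
import pathlib
import warnings
from typing import Optional

# Assumption: illustrative hints only; the package's own diagnostics may differ.
NUMPY_HINTS = {
    "divide by zero": "a denominator is zero; check for zero-valued inputs",
    "invalid value": "a NaN or inf entered the calculation",
    "overflow": "a value exceeded the floating-point range",
}


def install_warning_logger_sketch(
    project_root: pathlib.Path, logger: Optional[logging.Logger] = None
) -> None:
    """Hypothetical installer: route warnings to a logger with source context."""
    log = logger if logger is not None else logging.getLogger("project.warnings")
    root = project_root.resolve()

    def show(message, category, filename, lineno, file=None, line=None):
        source = pathlib.Path(filename).resolve()
        # Flag warnings raised from files located under the project root.
        origin = "project code" if root in source.parents else "external code"
        text = str(message)
        hint = next((h for key, h in NUMPY_HINTS.items() if key in text), None)
        suffix = f" [hint: {hint}]" if hint else ""
        log.warning(
            "%s: %s (%s, %s:%s)%s",
            category.__name__, text, origin, source, lineno, suffix,
        )

    warnings.showwarning = show


install_warning_logger_sketch(pathlib.Path.cwd())
```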

def now() -> str:

The current date and time as a string in the format %Y-%m-%d_%H-%M-%S.

This is useful for result reporting and debugging purposes when naming output files.
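A string in this format sorts chronologically and avoids characters that are awkward in file names, such as colons and spaces. The sketch below reproduces the format with the standard library. The names TIMESTAMP_FORMAT and timestamp, and the results_ prefix, are assumptions for illustration.

```python
from datetime import datetime

# Format string as documented above.
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def timestamp() -> str:
    """Hypothetical equivalent: the current local time in the documented format."""
    return datetime.now().strftime(TIMESTAMP_FORMAT)


# Typical use: stamp an output file so repeated runs do not overwrite each other.
output_name = f"results_{timestamp()}.csv"
```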

def log(x: float) -> float:

The natural logarithm of x.

This is a convenience function that is provided for consistency with the G-Cubed model.

Parameters

x : float
    The value for which the natural logarithm is required.

Returns

float
    The natural logarithm of x.

def file_summary(file_path: pathlib.Path, ancestors=3) -> str:

Overview

Get a summary of a file path.

Arguments

file_path : The file path to summarise.

ancestors : The number of ancestor folders to include in the summary, 3 by default.

Returns

A string summarising the file path.

Exceptions

If the file path is not a Path object, an assertion error is raised.
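The exact format of the returned summary is not given above. One plausible reading is the file name preceded by its nearest ancestor folders. The sketch below implements that reading. The name file_summary_sketch and the "/"-joined output are assumptions.

```python
import pathlib


def file_summary_sketch(file_path: pathlib.Path, ancestors: int = 3) -> str:
    """Hypothetical summary: file name plus up to `ancestors` parent folders."""
    assert isinstance(file_path, pathlib.Path), "file_path must be a pathlib.Path"
    # Drop the root ("/" or a drive) so it never appears in the summary.
    folders = [part for part in file_path.parent.parts if part != file_path.anchor]
    kept = folders[-ancestors:] if ancestors > 0 else []
    return "/".join([*kept, file_path.name])


path = pathlib.Path("/data/models/2R/experiments/run1/results.csv")
print(file_summary_sketch(path))
print(file_summary_sketch(path, ancestors=1))
```

Passing a plain string instead of a Path trips the assertion, mirroring the documented behaviour.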

def df_nbytes(df: pandas.DataFrame) -> int:
def df_to_parquet_bytes(df: pandas.DataFrame) -> bytes | None:

Try to serialise a DataFrame to Parquet bytes with Zstd compression. Returns None if neither pyarrow nor fastparquet is available.
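A minimal sketch of this fallback behaviour, assuming the implementation writes to an in-memory buffer and treats a missing Parquet engine as a signal to return None. The names to_parquet_bytes_sketch and from_parquet_bytes_sketch are hypothetical, and the package's own error handling may be broader.

```python
import io
from typing import Optional

import pandas as pd


def to_parquet_bytes_sketch(df: pd.DataFrame) -> Optional[bytes]:
    """Hypothetical equivalent: Zstd-compressed Parquet bytes, or None."""
    buffer = io.BytesIO()
    try:
        df.to_parquet(buffer, compression="zstd")
    except ImportError:
        # pandas raises ImportError when no Parquet engine is installed.
        return None
    return buffer.getvalue()


def from_parquet_bytes_sketch(data: bytes) -> pd.DataFrame:
    """Read a DataFrame back from bytes produced by the function above."""
    return pd.read_parquet(io.BytesIO(data))


frame = pd.DataFrame({"year": [2020, 2021], "gdp": [1.0, 1.1]})
payload = to_parquet_bytes_sketch(frame)
```

A caller that receives None can fall back to another encoding, which is presumably the role of df_to_pickle_zlib_bytes.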

def df_from_parquet_bytes(data: bytes) -> pandas.DataFrame:
def df_to_pickle_zlib_bytes(df: pandas.DataFrame) -> bytes:
def df_from_pickle_zlib_bytes(data: bytes) -> pandas.DataFrame: