gcubed.data.database

This module contains the Database class. A Database can be rebased to different years, and it provides access to the values of variables for historical years.

class Database(gcubed.base.Base):

Overview

Inherits the convenience methods that gcubed.base.Base provides for all classes.

All G-Cubed classes inherit from this base class.

Database(sym_data: gcubed.sym_data.SymData)

Overview

The Database class can be used directly, but it is also subclassed to support specific data usage scenarios.

It encapsulates all of the information about the database of values for all variables across a range of years.

Arguments

sym_data: The data about the model, created by the SYM processor. This also provides access to the model configuration.

The SYM processor output

The model configuration

variables: pandas.core.frame.DataFrame

Metadata about the variables, contained in a dataframe with columns for each type of metadata and with the rows indexed by variable.

The metadata includes the following columns:

  • order - the order of the variable in the database
  • name - the full name of the variable
  • description - the general description of the variable
  • units - the units used in the database
  • region - the region that the variable describes

data: pandas.core.frame.DataFrame

The data itself, contained in a dataframe with columns indexed by four-digit (YYYY) year strings and rows indexed by variable names.

variables_count: int

The number of variables in the database.

years_count: int

The number of years in the database.

years_column_names: pandas.core.indexes.base.Index

The year column names for the data.

base_year: int

The base year for the data, in YYYY format. All indices in the database use this base year. Databases (but not database subclasses) can be rebased to different years.

first_available_year: int

The first year of data in the database.

last_available_year: int

The last year of data in the database.

def export_to_csv(self, filename: str):

Export the database to a CSV file, making sure that the file extension is '.csv'.
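A minimal sketch of the extension check, assuming the method appends '.csv' when the filename does not already end with it (case-insensitively) and then writes the data dataframe with pandas `DataFrame.to_csv`. The helper names `ensure_csv_extension` and `export_to_csv_sketch` are hypothetical, and the real method may also replace other extensions or write the variable metadata.

```python
import pandas as pd


def ensure_csv_extension(filename: str) -> str:
    """Hypothetical helper: return a filename guaranteed to end in '.csv'."""
    # Assumption: the check is case-insensitive and '.csv' is appended
    # rather than replacing any other extension.
    if filename.lower().endswith(".csv"):
        return filename
    return filename + ".csv"


def export_to_csv_sketch(data: pd.DataFrame, filename: str) -> str:
    """Write the year-by-variable dataframe to CSV and return the path used."""
    path = ensure_csv_extension(filename)
    data.to_csv(path)
    return path
```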

def rebase(self, new_base_year: int):

Rebase a database so that its indices have a new base year. This can be used to convert the database used for calibration into a database whose base year equals the start year for projections (e.g. 2011 to 2018).

Note that this method draws on the approach in the G-Cubed utilities/rebasedata.ox script.

Arguments

new_base_year (int): The new base year for the database, in YYYY format.

Exceptions

An exception is raised if the database does not contain data for the new base year.

An exception is raised if the model has lagged index variables and the database does not contain data for the year after the new base year.
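The core of rebasing an index can be sketched as dividing each index series by its value in the new base year and multiplying by its value in the old base year, so the series keeps its base level (for example 1 or 100) but now attains it in the new base year. This is a sketch under stated assumptions, not the rebasedata.ox approach itself: the caller is assumed to supply the list of index variable names (`index_variables` is hypothetical), and the special handling of lagged index variables is not covered.

```python
import pandas as pd


def rebase_indices(
    data: pd.DataFrame, index_variables: list, old_base_year: int, new_base_year: int
) -> pd.DataFrame:
    """Return a copy of data with the named index rows rebased to new_base_year."""
    old_col, new_col = str(old_base_year), str(new_base_year)
    if new_col not in data.columns:
        raise ValueError(f"The database has no data for the new base year {new_base_year}")
    rebased = data.copy()
    rows = rebased.loc[index_variables]
    # Per-variable factor: old base-year level divided by the new base-year value.
    factor = rows[old_col] / rows[new_col]
    # Scale every year of each index series by its factor; other rows are untouched.
    rebased.loc[index_variables] = rows.mul(factor, axis=0)
    return rebased
```

For example, an index equal to 1.0 in 2011 and 1.25 in 2018 becomes 0.8 in 2011 and 1.0 in 2018 after rebasing to 2018.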

def rhs_vector_value(self, vector_name: str, year: int, use_neutral_real_interest_rate=False) -> numpy.ndarray:

Overview

Retrieves data from the database for all of the variables in a specific RHS vector in the model. The data is retrieved for the specified year.

Note that some state variables have their data retrieved for the following year.

Note also that interest rate values can be overridden by the globally defined neutral real interest rate that is set in the model configuration file.

The implementation steps are:

  1. get the rows for the variables of the given type in varmap.
  2. get the names of the variables in those rows from varmap.
  3. use those names to select the data from the calibration year database.
  4. insert that data into the vector at the positions given by the indices in the varmap data.
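The four steps above can be sketched with pandas and numpy as follows. The `var_type` column name comes from the varmap description in this module; the `name` and `vector_index` column names are hypothetical, the indices are assumed to be zero-based, and the vector length is assumed to equal the number of matching varmap rows. The sketch ignores the following-year rule for state variables and the neutral real interest rate override.

```python
import numpy as np
import pandas as pd


def rhs_vector_value_sketch(
    data: pd.DataFrame, varmap: pd.DataFrame, vector_name: str, year: int
):
    """Build a column vector for vector_name from one year of database data."""
    # Step 1: the varmap rows for variables of the given vector type.
    rows = varmap[varmap["var_type"] == vector_name]
    if rows.empty:
        return None  # zero-length vector
    # Step 2: the names of the variables in those rows.
    names = rows["name"]
    # Step 3: those variables' values for the requested year (columns are YYYY strings).
    values = data.loc[names, str(year)].to_numpy(dtype=float)
    # Step 4: place each value at its varmap index in a column vector.
    vector = np.zeros((len(rows), 1))
    vector[rows["vector_index"].to_numpy(), 0] = values
    return vector
```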

Arguments

vector_name: The name of the vector to get the values for. This must be a RHS vector listed in the model's RHS vector names by the SymData class.

year: The YYYY format year to get data for when populating the RHS vectors. For example, 2011 means that the model equations are linearised around the values of the model variables in 2011 (or in adjacent years for leads and lags).

use_neutral_real_interest_rate: True if interest rates are to be overridden with the model configuration neutral real interest rate and False otherwise.

Returns

A column vector with the requested values for the RHS vector or None if the vector has zero length.

def get_data_and_varmap_indices(self, vector_name: str, year: int) -> tuple:

Arguments

vector_name: The three-character name of the vector that is to be populated with data.

year: The four-digit integer specifying the year in the database that will be used to source the data that will be inserted into the named vector.

This method uses the varmap file created by the SYM processor, finding the rows in the varmap whose var_type column value matches the given vector_name, e.g. 'x1r'. The matching rows contain the variable names and their indices within the vector named by the vector_name argument.

The variable names are used to determine the rows of the database where the data will be sourced.

The year determines the column in the database where the data will be sourced.

Returns

A tuple containing a numpy vector of the data extracted from the database and a vector of indices indicating where, in the specified vector, that data should be inserted.

def get_data_and_varmap_indices_for_matching_variables(self, variable_prefix: str, vector_name: str, year: int):

Gets matching data for the given variable prefix for a given vector.

Arguments

variable_prefix: The prefix for the variable name.

vector_name: The name of the vector to be populated.

Returns

A tuple containing the indices in the vector to be populated (as a list of integers) and the values to use to do the populating as a numpy column vector.

def get_data(self, name_regular_expression: str, years: list) -> pandas.core.frame.DataFrame:

Gets data for the set of variables with variable names that match the given regular expression.

Arguments

name_regular_expression: The variable selection criteria. It can be any regular expression that works with the Python regex package.

years: the list of years for which the data is to be retrieved. Note that this can be a list of integer values or a list of strings.

Returns

A copy of the data for the specified years for all variables whose names match the given regular expression, where the names are matched against the row index (labels) of the database.
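One way to express this selection in pandas is `DataFrame.filter(regex=..., axis=0)`, which keeps the rows whose labels match the expression using `re.search` semantics, so patterns should be anchored with `^` or `$` where needed. This is a sketch assuming the year columns are YYYY strings; the function name `get_data_sketch` is hypothetical and the real method may match names differently.

```python
import pandas as pd


def get_data_sketch(
    data: pd.DataFrame, name_regular_expression: str, years: list
) -> pd.DataFrame:
    """Return a copy of the rows whose labels match the regex, for the given years."""
    # Years may be ints or strings; the year columns are YYYY strings.
    columns = [str(year) for year in years]
    matching = data.filter(regex=name_regular_expression, axis=0)
    return matching[columns].copy()
```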

def get_non_negative_data(self, variable_name_prefix: str, years: list) -> pandas.core.frame.DataFrame:

Overview

Experimental functionality supporting log models.

Gets data for the set of variables with names that match the given variable name prefix.

Arguments

variable_name_prefix: The variable name prefix.

years: the list of years for which the data is to be retrieved. Note that this can be a list of integer values or a list of strings.

Returns

If the prefix matches one or more variables in the database, a copy of the data for the specified years for all matching variables is returned.

If the prefix does not match any variables in the database, the prefix is changed to ln followed by the supplied prefix, in an attempt to find the natural log of the specified variable. If this succeeds, the returned data is calculated by taking the exponential of each matching log variable and multiplying the result by YRATR for the corresponding region.

Exceptions

If the prefix and the ln + prefix both fail to match any variables in the database, then an exception is raised.
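The fallback logic described above can be sketched as follows. The naming conventions are assumptions made for illustration: variable names are taken to end in a parenthesised region code such as `(USA)`, YRATR values are assumed to live in rows named `YRATR(<region>)`, and prefix matching is done with `str.startswith`. The returned rows keep their ln names, which may differ from the real method.

```python
import re

import numpy as np
import pandas as pd


def get_non_negative_data_sketch(
    data: pd.DataFrame, variable_name_prefix: str, years: list
) -> pd.DataFrame:
    columns = [str(year) for year in years]
    matches = data[data.index.str.startswith(variable_name_prefix)]
    if not matches.empty:
        return matches[columns].copy()
    # Fall back to the natural log of the variable, e.g. "lnINV" for "INV".
    log_prefix = "ln" + variable_name_prefix
    logged = data[data.index.str.startswith(log_prefix)]
    if logged.empty:
        raise ValueError(f"No variables match '{variable_name_prefix}' or '{log_prefix}'")
    result = np.exp(logged[columns])
    for name in result.index:
        # Assumption: the region code is the text in the trailing parentheses.
        region = re.search(r"\((\w+)\)$", name).group(1)
        # Scale the exponentiated values by that region's YRATR values.
        result.loc[name] = result.loc[name] * data.loc[f"YRATR({region})", columns]
    return result
```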

def update_data(self, new_data: pandas.core.frame.DataFrame):

Replace the existing data property with a new dataframe. All of the data is replaced.

This is useful if you need to do projections and then treat those projections as actual data in a subsequent step in your analysis pipeline.

Arguments

new_data (pd.DataFrame): The new dataframe to use.

def has_data(self, year: int) -> bool:

Used to check if there is a column of data in the database for the specified year.

Arguments

year: The four-digit integer value of the year (YYYY).

Returns

True if the database has data for the specified year and False otherwise.

has_data_for_all_projection_years: bool

This property is used to determine whether the database has been populated with projections. If it has, those projections, now stored as data, provide values out to the end year of the projections.

Returns

True if the database has data for the last projection year and False otherwise.

def set_up_yratr(self):

Overview

Sets up the YRATR scaling factor for variables that are not logged and that have units that end in a gdp suffix (except for those with units of usgdp).

These values can then be used to scale variables when converting between database values and the values used in the model equations.
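The selection rule above can be sketched as a boolean mask over the variables metadata. The `units` column is part of the metadata described in this module; treating logged variables as those whose names start with `ln` and comparing units case-insensitively are assumptions made for illustration, and `needs_yratr_scaling` is a hypothetical name.

```python
import pandas as pd


def needs_yratr_scaling(variables: pd.DataFrame) -> pd.Series:
    """True for variables that are not logged and have units ending in 'gdp' (but not 'usgdp')."""
    units = variables["units"].str.lower()
    # Assumption: logged variables are identified by an "ln" name prefix.
    is_logged = variables.index.to_series().str.startswith("ln")
    has_gdp_units = units.str.endswith("gdp") & (units != "usgdp")
    return has_gdp_units & ~is_logged
```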

def yratr_scaling_factor(self, year: int) -> pandas.core.frame.DataFrame:

Overview

Retrieves a column vector of YRATR scaling factors for all variables in the database. The factor is 1 for variables that do not need to be scaled by YRATR for use in the model; otherwise it is the YRATR value, in the chosen year, for the region associated with the variable.

Arguments

year: The year for which the yratr scaling factors for all variables are being retrieved.

Returns

A dataframe with a single column, for the base year of the database, containing the YRATR values for all variables that need to be scaled by YRATR when converting between database values and the values used in the model.
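A sketch of how such a single-column factor dataframe could be assembled: every variable starts with a factor of 1, and each variable flagged for scaling takes the YRATR value of its region in the chosen year. The `region` column is part of the variables metadata described above; the `YRATR(<region>)` row naming and the precomputed boolean mask `needs_scaling` are assumptions, and the column is labelled with the requested year here, which may differ from the real method.

```python
import pandas as pd


def yratr_scaling_factor_sketch(
    data: pd.DataFrame, variables: pd.DataFrame, needs_scaling: pd.Series, year: int
) -> pd.DataFrame:
    column = str(year)
    # Default factor of 1 for every variable.
    factors = pd.Series(1.0, index=variables.index, name=column)
    for name in needs_scaling[needs_scaling].index:
        region = variables.at[name, "region"]
        # Assumption: YRATR values are stored in rows named "YRATR(<region>)".
        factors[name] = data.at[f"YRATR({region})", column]
    return factors.to_frame()
```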

def yratr_scaling_factor_for_variable(self, variable_name: str, year: any) -> pandas.core.frame.DataFrame:

Overview

Get the YRATR scaling factor for the specified variable in the specified year.

Arguments

variable_name: The full name of the variable.

year: The year for which the YRATR scaling factor is being retrieved. It can be a string or an integer. It is converted to a string when used.

Returns

A floating-point scaling factor that, when multiplied by the database value, converts it from a percentage of local GDP to a percentage of US GDP.

def value(self, variable_name: str, year: any) -> float:

Overview

Retrieves the value of a variable in the database for a specific year.

Arguments

  • variable_name: The full name of the variable.

  • year: The year for which the value is to be retrieved. It can be a string or an integer. It is converted to a string when used.

Returns

The value of the variable for the specified year if the year is between the first and the last available years in the database, inclusive.

def set_value(self, variable_name: str, year: any, value: float):

Overview

Set the value of a variable in the database for a specific year.

Arguments

  • variable_name: The full name of the variable.

  • year: The year for which the value is to be set. It can be a string or an integer.

  • value: The value to set for the variable in the specified year.
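A minimal sketch of value and set_value on the underlying dataframe, showing the conversion of integer or string years to the YYYY string column labels. Raising ValueError for an out-of-range year and KeyError for a missing year column are assumptions (the documentation above does not say what happens in those cases), as are the sorted year columns; the function names are hypothetical.

```python
import pandas as pd


def value_sketch(data: pd.DataFrame, variable_name: str, year) -> float:
    column = str(year)
    # Assumption: year columns are sorted, so the first and last give the range.
    first, last = int(data.columns[0]), int(data.columns[-1])
    if not first <= int(column) <= last:
        raise ValueError(f"{year} is outside the range {first}-{last}")
    return float(data.at[variable_name, column])


def set_value_sketch(data: pd.DataFrame, variable_name: str, year, value: float) -> None:
    column = str(year)
    # Assumption: refuse to create a new year column implicitly.
    if column not in data.columns:
        raise KeyError(f"The database has no column for {year}")
    data.at[variable_name, column] = value
```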

def save(self, filename: str):

Overview

Save the database to a CSV file.

Arguments

  • filename: The filename to save the database to.

Exceptions

  • AssertionError: If the filename is not an absolute path
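A minimal sketch of the documented precondition, assuming the method asserts on `os.path.isabs` before writing the data dataframe with `DataFrame.to_csv`; the function name `save_sketch` is hypothetical. Note that `assert` statements are skipped when Python runs with `-O`, so the check only applies in normal execution.

```python
import os

import pandas as pd


def save_sketch(data: pd.DataFrame, filename: str) -> None:
    # Mirror the documented behaviour: relative paths raise AssertionError.
    assert os.path.isabs(filename), f"{filename} is not an absolute path"
    data.to_csv(filename)
```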