gcubed.data.database
This module contains the Database class. A database can be rebased to different years, and it provides access to the values of the variables for historical years.
Overview
Provides convenience methods for all classes.
All G-Cubed classes inherit from this base class.
Overview
The Database class is used directly, but it is also subclassed to support specific data usage scenarios.
It encapsulates all of the information about the database of values for all variables across a range of years.
Arguments
sym_data
: The data about the model, created by the SYM processor. This also provides access to the model configuration.
Metadata about the variables, contained in a dataframe with columns for each type of metadata and with the rows indexed by variable.
The metadata includes the following columns:
- order - the order of the variable in the database
- name - the full name of the variable
- description - the general description of the variable
- units - the units used in the database
- region - the region that the variable describes.
The data itself, contained in a dataframe with columns indexed by 4 digit (YYYY) year strings and with the rows indexed by variable names.
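A small, hypothetical example may make the two dataframes easier to picture. The variable names, units and values below are invented; only the metadata column names and the use of YYYY year strings as column labels come from the description above.

```python
import pandas as pd

# Hypothetical variable names and values, for illustration only.
variables = ["GDPR(USA)", "GDPR(JPN)"]

# Metadata: one row per variable, one column per type of metadata.
metadata = pd.DataFrame(
    {
        "order": [0, 1],
        "name": variables,
        "description": ["Real GDP", "Real GDP"],
        "units": ["index", "index"],
        "region": ["USA", "JPN"],
    },
    index=variables,
)

# Data: one row per variable, one column per 4 digit (YYYY) year string.
data = pd.DataFrame(
    {"2017": [0.98, 0.99], "2018": [1.00, 1.00], "2019": [1.02, 1.01]},
    index=variables,
)

# Year labels are strings, so an integer year is converted before lookup.
year = 2019
usa_gdp_2019 = data.loc["GDPR(USA)", str(year)]
```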
The base year for the data, in YYYY format. All indexes in the database are based on the specified year. Databases (but not database subclasses) can be rebased to different years.
Export the database to a CSV file, making sure that the file extension is '.csv'.
Rebase a database so that its indices have a new base year. This can be used to convert the database used for calibration into a database whose base year equals the start year for projections (e.g. from 2011 to 2018).
Note that this method draws on the approach in the G-Cubed utilities/rebasedata.ox script.
Arguments
new_base_year (int): a YYYY formatted new base year for the database.
Exceptions
An exception is raised if the database does not contain data for the new base year.
An exception is raised if the model has lagged index variables and the database does not contain data for the year after the new base year.
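The rebasing arithmetic is not spelled out here, so the following is only a sketch under stated assumptions: the caller supplies the names of the index variables (a hypothetical `index_variables` argument), and an index is taken to equal 1.0 in its base year. It does not reproduce the lagged-index check or the exact rules of rebasedata.ox.

```python
import pandas as pd


def rebase_indices(data: pd.DataFrame, index_variables: list, new_base_year: int) -> pd.DataFrame:
    """Hypothetical sketch: rescale index rows so they equal 1.0 in new_base_year.

    Assumes the caller knows which rows are index variables and that an index
    is normalised to 1.0 in its base year.
    """
    base = str(new_base_year)  # year columns are YYYY strings
    if base not in data.columns:
        raise Exception(f"The database has no data for the new base year {base}.")
    rebased = data.copy()
    # Divide each index row, across all years, by its value in the new base year.
    rebased.loc[index_variables] = rebased.loc[index_variables].div(
        rebased.loc[index_variables, base], axis=0
    )
    return rebased
```

Non-index rows are copied unchanged, and the original dataframe is not modified.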
Overview
Retrieves data from the database for all of the variables in a specific RHS vector in the model. The data is retrieved for the specified year.
Note that some state variables have their data retrieved for the following year.
Note also that interest rate values can be overridden by the globally defined neutral real interest rate that is set in the model configuration file.
The implementation steps are:
- get the rows for the variables of the given type in varmap.
- get the names of the variables in those rows from varmap.
- use those names to select the data from the calibration year database.
- set the values for those variables in the appropriate places in the vector to that data for that year using the indices specified in the varmap data.
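The implementation steps above can be sketched with pandas and numpy. This is not the gcubed implementation: `rhs_vector` is a hypothetical helper, the `name` and `index` varmap columns are assumed (only `var_type` is mentioned in this documentation), indices are assumed to be 0-based and to cover the whole vector, and the state-variable and neutral real interest rate adjustments are left out.

```python
import numpy as np
import pandas as pd


def rhs_vector(varmap: pd.DataFrame, data: pd.DataFrame, vector_name: str, year):
    """Hypothetical sketch of populating one RHS vector for one year."""
    # 1. Get the rows of varmap for the variables of the given type.
    rows = varmap[varmap["var_type"] == vector_name]
    if rows.empty:
        return None  # zero-length vector
    # 2. Get the names of the variables in those rows.
    names = rows["name"]
    # 3. Use those names to select the data for the given year.
    values = data.loc[names, str(year)].to_numpy()
    # 4. Place each value in a column vector at its varmap index.
    vector = np.zeros((len(rows), 1))
    vector[rows["index"].to_numpy(), 0] = values
    return vector
```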
Arguments
vector_name
: The name of the vector to get the values for. This must be one of the model's RHS vector names, as listed by the SymData class.
year
: The YYYY format year to get data for when populating the RHS vectors. For example, 2011 means that the model equations are linearised around the values of the model variables in 2011 (or in adjacent years for leads and lags).
use_neutral_real_interest_rate
: True if interest rates are to be overridden with the neutral real interest rate from the model configuration, and False otherwise.
Returns
A column vector with the requested values for the RHS vector, or None if the vector has zero length.
Arguments
vector_name
: The three character name of the vector that is to be populated with data.
year
: The 4 digit integer specifying the year in the database that will be used to source the data that will be inserted into the named vector.
This method uses the varmap file created by the SYM processor, finding the rows in the varmap whose value in the var_type column matches the given vector_name, e.g. 'x1r'. The matching rows contain the variable names and their indices within the vector named by vector_name.
The variable names are used to determine the rows of the database where the data will be sourced.
The year determines the column in the database where the data will be sourced.
Returns
A tuple is returned. That tuple contains a numpy vector of the data that has been extracted from the database and a vector of indices indicating where, in the specified vector, that data should be inserted.
Gets matching data for the given variable prefix for a given vector.
Arguments
variable_prefix
: The prefix for the variable name.
vector_name
: The name of the vector to be populated.
Returns
A tuple containing the indices in the vector to be populated (as a list of integers) and the values to use to do the populating as a numpy column vector.
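A prefix-based lookup can be sketched in the same way. The documented method takes only the prefix and the vector name; the `year` argument, the `prefix_data` name and the varmap `name` and `index` columns here are assumptions made so the example is self-contained.

```python
import numpy as np
import pandas as pd


def prefix_data(varmap: pd.DataFrame, data: pd.DataFrame, variable_prefix: str, vector_name: str, year):
    """Hypothetical sketch returning (indices, values) for one variable prefix."""
    # Rows of varmap for the named vector whose variable names start with the prefix.
    mask = (varmap["var_type"] == vector_name) & varmap["name"].str.startswith(variable_prefix)
    rows = varmap[mask]
    indices = [int(i) for i in rows["index"]]
    # Values for the given year, reshaped into an (n, 1) numpy column vector.
    values = np.asarray(data.loc[rows["name"], str(year)], dtype=float).reshape(-1, 1)
    return indices, values
```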
Gets data for the set of variables with variable names that match the given regular expression.
Arguments
name_regular_expression
: The variable selection criteria. It can be any regular expression that works with the Python regex package.
years
: The list of years for which the data is to be retrieved. Note that this can be a list of integer values or a list of strings.
Returns
A copy of the data for the specified years for all variables whose names match the given regular expression, where the names are matched against the row index (labels) in the database.
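A minimal sketch of regular-expression selection, assuming the data layout described earlier. It uses pandas `DataFrame.filter(regex=..., axis=0)`, which keeps row labels for which `re.search` finds a match; whether the gcubed method anchors matches at the start of the name is not stated here, so callers may want `^` in their patterns. The `get_data` name is hypothetical.

```python
import pandas as pd


def get_data(data: pd.DataFrame, name_regular_expression: str, years) -> pd.DataFrame:
    """Hypothetical sketch: rows whose labels match the regex, for the given years."""
    columns = [str(year) for year in years]  # years may be ints or strings
    # filter(regex=..., axis=0) tests each row label with re.search.
    return data.filter(regex=name_regular_expression, axis=0)[columns].copy()
```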
Overview
Experimental functionality supporting log models.
Gets data for the set of variables with variable names that start with the given prefix.
Arguments
variable_name_prefix
: The variable name prefix.
years
: The list of years for which the data is to be retrieved. Note that this can be a list of integer values or a list of strings.
Returns
If the prefix matches one or more variables in the database, then a copy of the data for the specified years for all matching variables is returned.
If the prefix fails to match any variables in the database, then the prefix is adjusted to start with ln followed by the supplied prefix. This is an attempt to find the natural log of the specified variable.
If this succeeds, then the returned data is calculated by taking the exponential of the matching log variable and then multiplying the result by YRATR for the corresponding regions.
Exceptions
If the prefix and the ln + prefix both fail to match any variables in the database, then an exception is raised.
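The fallback to a log variable can be sketched as follows. The `yratr` argument is a hypothetical dataframe of YRATR factors indexed by the unlogged variable names with YYYY year columns; how gcubed actually looks up YRATR for each region is not shown here, and the `ValueError` type is an assumption.

```python
import numpy as np
import pandas as pd


def get_data_with_log_fallback(data: pd.DataFrame, yratr: pd.DataFrame, variable_name_prefix: str, years):
    """Hypothetical sketch: prefix lookup with a fallback to ln<prefix> variables."""
    columns = [str(year) for year in years]  # years may be ints or strings
    matches = data.loc[data.index.str.startswith(variable_name_prefix)]
    if not matches.empty:
        return matches[columns].copy()
    log_matches = data.loc[data.index.str.startswith("ln" + variable_name_prefix)]
    if log_matches.empty:
        raise ValueError(f"No variables match {variable_name_prefix} or ln{variable_name_prefix}")
    # Undo the log, then drop the "ln" from the labels to line up with YRATR.
    levels = np.exp(log_matches[columns])
    levels.index = levels.index.str.slice(start=2)
    return levels.mul(yratr.loc[levels.index, columns])
```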
Replace the existing data property with a new dataframe. All of the data is replaced.
This is useful if you need to do projections and then treat those projections as actual data in a subsequent step in your analysis pipeline.
Arguments
new_data
: (pd.DataFrame) The new dataframe to use.
Used to check if there is a column of data in the database for the specified year.
Arguments
year
: The 4 digit integer value of the year (YYYY).
Returns
True if the database has data for the specified year and False otherwise.
This property is used when determining whether the database has been populated with projections, in which case those projections provide values, now stored as data, out to the end year of the projections.
Returns
True if the database has data for the last projection year and False otherwise.
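Both checks reduce to looking for a YYYY string among the data's column labels. In this hypothetical sketch, `last_projection_year` is passed in explicitly, whereas the documented property presumably obtains it from the model configuration.

```python
import pandas as pd


def has_data_for_year(data: pd.DataFrame, year) -> bool:
    # Year columns are YYYY strings, so integer years are converted first.
    return str(year) in data.columns


def has_projection_data(data: pd.DataFrame, last_projection_year) -> bool:
    # Hypothetical: last_projection_year would come from the model configuration.
    return has_data_for_year(data, last_projection_year)
```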
Overview
Sets up the YRATR scaling factor for variables that are not logged and that have units that end in a gdp suffix (except for those with units of usgdp).
These values can then be used to do variable scaling as we convert between database values and values used in the model equations.
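The selection rule can be sketched as a boolean mask over the metadata. Two assumptions are made that the text does not state: logged variables are recognised by an `ln` name prefix (the convention used by the log-model fallback described earlier), and unit labels are compared case-insensitively. The `needs_yratr_scaling` name is hypothetical.

```python
import pandas as pd


def needs_yratr_scaling(metadata: pd.DataFrame) -> pd.Series:
    """Hypothetical sketch: flag variables that must be scaled by YRATR."""
    # Assumption: unit labels are compared case-insensitively.
    units = metadata["units"].str.lower()
    # Assumption: logged variables are those whose names start with "ln".
    logged = pd.Series(metadata.index.str.startswith("ln"), index=metadata.index)
    # Not logged, units ending in "gdp", and not measured in "usgdp".
    return ~logged & units.str.endswith("gdp") & (units != "usgdp")
```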
Overview
Retrieves a column vector of YRATR values for all variables in the database. The value is 1 for variables that do not need to be scaled by YRATR for use in the model; otherwise it is the YRATR value, in the chosen year, for the region associated with the variable.
Arguments
year
: The year for which the YRATR scaling factors for all variables are being retrieved.
Return the dataframe, with a single column for the base year of the database, that contains the YRATR values for all variables that need to be scaled by YRATR when converting between database values and the values that are used in the model.
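Given a mask of variables that need scaling, the column of factors can be built by mapping each variable's region to its YRATR value and using 1.0 elsewhere. The `scaled` mask, the region-indexed `yratr_by_region` dataframe and the `yratr_column` name are hypothetical inputs for this sketch.

```python
import pandas as pd


def yratr_column(metadata: pd.DataFrame, scaled: pd.Series, yratr_by_region: pd.DataFrame, year) -> pd.DataFrame:
    """Hypothetical sketch: one column of YRATR factors, 1.0 where no scaling is needed."""
    column = str(year)
    # Look up each variable's region in the YRATR values for the chosen year.
    factors = metadata["region"].map(yratr_by_region[column])
    # Keep the factor only for variables that need scaling; use 1.0 otherwise.
    factors = factors.where(scaled, 1.0)
    return factors.to_frame(name=column)
```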
Overview
Get the YRATR scaling factor for the specified variable in the specified year.
Arguments
variable_name
: The full name of the variable.
year
: The year for which the YRATR scaling factor is being retrieved. It can be a string or an integer. It is converted to a string when used.
Returns
A floating-point value that is the scaling factor that, when multiplied by the database value, converts it from a percentage of local GDP to a percentage of US GDP.
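The conversion described above amounts to a single multiplication. The second equation is an inference rather than something this documentation states: for the product to turn a share of local GDP into a share of US GDP, YRATR must behave like the ratio of region $r$'s GDP to US GDP ($x$ and $Y$ are notation introduced here).

```latex
x^{\mathrm{US}}_{r,t} = \mathrm{YRATR}_{r,t} \times x^{\mathrm{local}}_{r,t},
\qquad
\mathrm{YRATR}_{r,t} = \frac{Y_{r,t}}{Y_{\mathrm{US},t}}
```

For example, a value of 10 (percent of local GDP) with a YRATR of 0.25 corresponds to 2.5 percent of US GDP.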
Overview
Retrieves the value of a variable in the database for a specific year.
Arguments
variable_name
: The full name of the variable.
year
: The year for which the value is to be retrieved. It can be a string or an integer. It is converted to a string when used.
Returns
The value of the variable for the specified year if the year is between the first and the last available years in the database, inclusive.
Overview
Set the value of a variable in the database for a specific year.
Arguments
variable_name
: The full name of the variable.
year
: The year for which the value is to be set. It can be a string or an integer.
value
: The value to set for the variable in the specified year.
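Reading and writing single values can be sketched with `DataFrame.loc`. The helper names are hypothetical, and raising `ValueError` for out-of-range years is an assumption, since the behaviour outside the available years is not documented here.

```python
import pandas as pd


def get_value(data: pd.DataFrame, variable_name: str, year) -> float:
    """Hypothetical sketch of reading one value; year may be an int or a str."""
    column = str(year)
    available = [int(c) for c in data.columns]
    # Assumption: years outside the first..last available years raise an error.
    if not min(available) <= int(column) <= max(available):
        raise ValueError(f"{column} is outside {min(available)}-{max(available)}")
    return data.loc[variable_name, column]


def set_value(data: pd.DataFrame, variable_name: str, year, value: float) -> None:
    """Hypothetical sketch of writing one value in place."""
    data.loc[variable_name, str(year)] = value
```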
Overview
Save the database to a CSV file.
Arguments
filename
: The filename to save the database to.
Exceptions
AssertionError
: If the filename is not an absolute path
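A minimal sketch of the save step, assuming the data is a pandas dataframe. The absolute-path check mirrors the documented AssertionError, and `DataFrame.to_csv` writes the variable names as the first column. The `save_csv` name is hypothetical.

```python
import os

import pandas as pd


def save_csv(data: pd.DataFrame, filename: str) -> None:
    """Hypothetical sketch: write the database to a CSV file at an absolute path."""
    # Raise explicitly so the check also holds when Python runs with -O.
    if not os.path.isabs(filename):
        raise AssertionError(f"{filename} is not an absolute path")
    # Row labels (variable names) become the first column of the file.
    data.to_csv(filename)
```

Reading the file back with `pd.read_csv(filename, index_col=0)` restores the variable names as the row index, with the year labels as strings.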