gcubed.data.database
This module contains the Database class. It can be rebased to different years and it provides access to variable values for historical years.
Overview
Provides convenience methods for all classes.
All G-Cubed classes inherit from this base class.
Overview
The database class is used directly but it is also subclassed to support specific data usage scenarios.
It encapsulates all of the information about the database of values for all variables across a range of years.
Arguments
sym_data
: The data about the model, created by the SYM processor. This also
provides access to the model configuration.
The data itself, contained in a dataframe with columns indexed by 4 digit (YYYY) year strings and with the rows indexed by variable names.
The (YYYY) format base year for the data. All indexes in the database are based in the specified year. Databases (but not database subclasses) can be rebased to different years.
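The layout described above can be pictured with a small pandas dataframe. This is only an illustrative sketch: the variable names and values are hypothetical, and it does not construct a real Database object.

```python
import pandas as pd

# Hypothetical variable names and values, for illustration only.
data = pd.DataFrame(
    {"2017": [100.0, 0.02], "2018": [103.0, 0.025]},
    index=["GDPR(USA)", "INTR(USA)"],
)

# Columns are YYYY strings, so a year lookup uses a string key.
value_2018 = data.loc["GDPR(USA)", "2018"]
```

Because the column labels are strings, an integer year such as 2018 needs to be converted with `str(2018)` before it is used as a column key.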
Overview
Store the data in the database, dropping the columns that are not years.
Arguments
data
: The data to store in the database.
database_variable_names
: The names of the variables in the database. When provided, this is used to do validation of
the database variables against the SYM model variables.
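One way to drop the non-year columns is sketched below. The helper name `keep_year_columns` and the `description` column are assumptions made for this example; the actual implementation may identify year columns differently.

```python
import re

import pandas as pd


def keep_year_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame holding only columns labelled with 4 digit years."""
    year_columns = [c for c in frame.columns if re.fullmatch(r"\d{4}", str(c))]
    return frame.loc[:, year_columns].copy()


# Hypothetical raw data with one non-year column.
raw = pd.DataFrame(
    {"description": ["real GDP"], "2017": [100.0], "2018": [103.0]},
    index=["GDPR(USA)"],
)
cleaned = keep_year_columns(raw)
```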
Export the database to a CSV file, making sure that the file extension is '.csv'.
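A minimal sketch of the extension check, assuming that a missing '.csv' suffix is appended rather than substituted for an existing suffix. The helper name `ensure_csv_extension` is hypothetical.

```python
from pathlib import Path


def ensure_csv_extension(filepath: str) -> Path:
    """Return filepath as a Path whose name ends in '.csv'."""
    path = Path(filepath)
    if path.suffix.lower() == ".csv":
        return path
    # Append rather than replace, so 'results.v2' becomes 'results.v2.csv'.
    return path.with_name(path.name + ".csv")
```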
Rebase a database so indices have a new base year. This can be used to convert the database used for calibration to a database with the base year equal to the start year for projections (e.g. 2011 to 2018).
Note that this method draws on the approach in the G-Cubed utilities/rebasedata.ox script.
Arguments
new_base_year (int): a YYYY formatted new base year for the database.
Exceptions
Exception is thrown if the database does not contain data for the new base year.
Exception is thrown if the database does not contain data for the year after the new base year if the model has lagged index variables.
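The core idea of rebasing an index can be sketched as follows, assuming that an index takes the value 1 in its base year. This is not the G-Cubed implementation, which may treat variables differently depending on their units and lags; the function name and the data are hypothetical.

```python
import pandas as pd


def rebase_index_rows(frame: pd.DataFrame, new_base_year: int) -> pd.DataFrame:
    """Rescale every row so that it equals 1 in the new base year."""
    year = str(new_base_year)
    if year not in frame.columns:
        raise ValueError(f"No data for the new base year {year}.")
    # Divide each row by its own value in the new base year.
    return frame.div(frame[year], axis=0)


# Hypothetical price index with base year 2011.
indices = pd.DataFrame(
    {"2011": [1.0], "2018": [1.25], "2019": [1.5]},
    index=["PRCT(USA)"],
)
rebased = rebase_index_rows(indices, 2018)
```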
Overview
Retrieves data from the database for all of the variables in a specific RHS vector in the model. The data is retrieved for the specified year.
Note that some state variables have their data retrieved for the following year.
Note also that interest rate values can be overridden by the globally defined neutral real interest rate that is set in the model configuration file.
The implementation steps are:
- get the rows for the variables of the given type in varmap.
- get the names of the variables in those rows from varmap.
- use those names to select the data from the calibration year database.
- set the values for those variables in the appropriate places in the vector to that data for that year using the indices specified in the varmap data.
Arguments
vector_name
: The name of the vector to get the values for. This must be a RHS vector listed in
the model's RHS vector names by the SymData class.
year
: The YYYY format year to get data for when populating the RHS vectors.
e.g. 2011 implies linearise model equations around the values
of the model variables in 2011 (or in adjacent years for leads/lags).
use_neutral_real_interest_rate
: True if interest rates are to be overridden with the
model configuration neutral real interest rate and False otherwise.
Returns
A column vector with the requested values for the RHS vector or None if the vector has zero length.
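The final step, placing values at given positions in a column vector, might look like the sketch below. The function name `fill_vector` and its arguments are assumptions for illustration; in the real method the indices and values come from the varmap and the database.

```python
import numpy as np


def fill_vector(length, indices, values):
    """Build a (length, 1) column vector with values placed at indices."""
    if length == 0:
        # Mirrors the documented behaviour for zero-length vectors.
        return None
    vector = np.zeros((length, 1))
    vector[indices, 0] = values
    return vector


column = fill_vector(3, [2, 0], [5.0, 7.0])
```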
Arguments
vector_name
: The three character name of the vector that is to be populated
with data.
year
: the 4 digit integer specifying the year in the database that will be used
to source the data that will be inserted into the named vector.
This method uses the varmap file created by the SYM processor, finding those rows in the varmap that have a value in the var_type column that matches the given vector_name, e.g. 'x1r'. The matching rows contain the variable names and their indices within the vector that has been named as an input to the function.
The variable names are used to determine the rows of the database where the data will be sourced.
The year determines the column in the database where the data will be sourced.
Returns
A tuple is returned. That tuple contains a numpy vector of the data that has been extracted from the database and a vector of indices indicating where, in the specified vector, that data should be inserted.
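The lookup can be sketched with a toy varmap. Only the `var_type` column name and the 'x1r' vector name come from the description above; the `name` and `var_index` column names, the 'other' vector type and all of the data are assumptions made for this example.

```python
import pandas as pd

# Hypothetical database: rows are variables, columns are YYYY strings.
database = pd.DataFrame(
    {"2017": [1.0, 0.02, 100.0], "2018": [1.1, 0.03, 103.0]},
    index=["CAPL(USA)", "INTR(USA)", "GDPR(USA)"],
)

# Hypothetical varmap: one row per variable, with its vector and position.
varmap = pd.DataFrame(
    {
        "name": ["CAPL(USA)", "INTR(USA)", "GDPR(USA)"],
        "var_type": ["x1r", "x1r", "other"],
        "var_index": [1, 0, 0],
    }
)


def values_for_vector(database, varmap, vector_name, year):
    """Return (values, indices) for the variables belonging to vector_name."""
    rows = varmap[varmap["var_type"] == vector_name]
    # Variable names select database rows; the year selects the column.
    values = database.loc[rows["name"].tolist(), str(year)].to_numpy()
    indices = rows["var_index"].to_numpy()
    return values, indices


values, indices = values_for_vector(database, varmap, "x1r", 2018)
```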
Gets matching data for the given variable prefix for a given vector.
Arguments
variable_prefix
: The prefix for the variable name
vector_name
: the name of the vector to be populated.
Returns
A tuple containing the indices in the vector to be populated (as a list of integers) and the values to use to do the populating as a numpy column vector.
Gets data for the set of variables with variable names that match the given regular expression.
Arguments
name_regular_expression
: The variable selection criteria. It can be any
regular expression that works with the Python regex
package.
years
: the list of years for which the data is to be retrieved. Note that
this can be a list of integer values or a list of strings.
Returns
A copy of the data for the specified years for all variables with names matching the given regular expression where the names are matched against the row index (labels) in the database.
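A sketch of regular expression selection against the row labels, using the standard library `re` module. Whether the real method matches from the start of the name or searches anywhere in it is not stated, so this example uses `search` and anchors the pattern explicitly; the function name and data are hypothetical.

```python
import re

import pandas as pd


def select_by_regex(frame, name_regular_expression, years):
    """Return a copy of the rows whose labels match, for the given years."""
    pattern = re.compile(name_regular_expression)
    matching = [name for name in frame.index if pattern.search(name)]
    # Accept integer or string years by normalising to strings.
    columns = [str(year) for year in years]
    return frame.loc[matching, columns].copy()


database = pd.DataFrame(
    {"2017": [100.0, 50.0, 0.02], "2018": [103.0, 51.0, 0.03]},
    index=["GDPR(USA)", "GDPR(AUS)", "INTR(USA)"],
)
subset = select_by_regex(database, r"^GDPR\(", [2018])
```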
Gets a copy of the data for the set of variables with variable names that have the given variable name prefix (the part of the name up to but not including the part in brackets).
Arguments
prefix
: The variable name prefix.
years
: The list of years for which the data is to be retrieved. Note that this can be a list of integer values or a list of strings. This argument defaults to None, in which case all years of data are returned.
Returns
A copy of the data for the specified years for all variables with names that have the given prefix.
Exceptions
An exception is raised if the prefix is None, is not a string or has a length of zero.
An exception is raised if no variables have the given prefix.
An exception is raised if the years are not valid database years.
An exception is raised if the number of variables in the database does not match the number of variables in the SYM model.
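Prefix selection and its validation might be sketched as below. The function name, the exception type and the sample data are assumptions; the check against the SYM model variable count is left out because it depends on model data not shown here.

```python
import pandas as pd


def select_by_prefix(frame, prefix, years=None):
    """Return a copy of the rows whose name before the bracket equals prefix."""
    if not isinstance(prefix, str) or len(prefix) == 0:
        raise ValueError("The prefix must be a non-empty string.")
    matching = [name for name in frame.index if name.split("(", 1)[0] == prefix]
    if not matching:
        raise ValueError(f"No variables have the prefix {prefix}.")
    # None means all years; otherwise accept integer or string years.
    columns = list(frame.columns) if years is None else [str(y) for y in years]
    missing = [c for c in columns if c not in frame.columns]
    if missing:
        raise ValueError(f"Years not in the database: {missing}")
    return frame.loc[matching, columns].copy()


database = pd.DataFrame(
    {"2017": [100.0, 50.0, 0.02], "2018": [103.0, 51.0, 0.03]},
    index=["GDPR(USA)", "GDPR(AUS)", "INTR(USA)"],
)
gdp_all_years = select_by_prefix(database, "GDPR")
gdp_2018 = select_by_prefix(database, "GDPR", years=[2018])
```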
Replace the existing data property with a new dataframe. All of the data is replaced.
This is useful if you need to do projections and then treat those projections as actual data in a subsequent step in your analysis pipeline.
Arguments
new_data
(pd.DataFrame): The new dataframe to use.
Used to check if there is a column of data in the database for the specified year.
Arguments
year
: The 4 digit integer value of the year (YYYY).
Returns
True if the database has data for the specified year and False otherwise.
This property is used when determining whether the database has been populated with projections, in which case those projections provide values, now stored as data, out to the end year of the projections.
Returns
True if the database has data for the last projection year and False otherwise.
Overview
Sets up the scaling factor for variables that have units that end in a gdp suffix but not a usgdp suffix.
These scaling factors can then be used to do variable scaling as we convert between database values and values used to evaluate model equations.
Overview
Retrieves a series of gdp ratio scaling factors for all variables in the database. The values are:
- 1 for variables that do not need to be scaled before use in the model
- the ratio of local nominal GDP to USA nominal GDP (both measured in billions of USD) in the chosen year otherwise.
Arguments
year
: The database year for which the scaling factors are being retrieved.
Returns
The series of values to use for scaling so that variables are a fraction of US GDP rather than local GDP, where that is appropriate. This series is suitable for broadcasting across the database.
Exceptions
An exception is raised if the GDP ratio scaling factors have not been set up in the database.
An exception is raised if the year is not in the database when retrieving the GDP ratio scaling factor.
An exception is raised if the year is not an integer in YYYY format.
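Broadcasting a series of scaling factors across the database can be sketched with pandas. The variable names and the 0.07 GDP ratio are hypothetical; the sketch only shows how a series indexed by variable name multiplies every year column row by row.

```python
import pandas as pd

# Hypothetical database values, with deficits as a share of local GDP.
database = pd.DataFrame(
    {"2017": [2.0, 4.0, 100.0], "2018": [3.0, 5.0, 103.0]},
    index=["DEFI(AUS)", "DEFI(USA)", "GDPR(AUS)"],
)

# One factor per variable: 1 where no scaling is needed, otherwise the
# assumed ratio of local nominal GDP to USA nominal GDP.
factors = pd.Series([0.07, 1.0, 1.0], index=database.index)

# axis=0 aligns the series with the row labels and applies it to all years.
scaled = database.mul(factors, axis=0)
```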
Overview
Get the GDP ratio scaling factor for the specified variable in the specified year.
Arguments
variable_name
: The full name of the variable.
year
: The year for which the GDP ratio scaling factor is being retrieved. It can be a string or an integer.
It is converted to a string when used.
Returns
A floating-point value that is the scaling factor that, when multiplied by the database value, converts it from a percentage of local GDP to a percentage of US GDP.
Overview
Retrieves the value of a variable in the database for a specific year.
Arguments
variable_name
: The full name of the variable.
year
: The year for which the value is to be retrieved. It can be a string or an integer. It is converted to a string when used.
Returns
The value of the variable for the specified year if the year is between the first and the last available years in the database, inclusive.
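A sketch of a single value lookup that accepts the year as a string or an integer. The function name, the exception raised for out-of-range years and the data are assumptions; the documentation does not say what happens outside the available range.

```python
import pandas as pd


def get_value(frame, variable_name, year):
    """Return the value of variable_name in year (string or integer)."""
    available = [int(column) for column in frame.columns]
    if not min(available) <= int(year) <= max(available):
        raise ValueError(f"Year {year} is outside the database range.")
    # Column labels are strings, so convert the year before the lookup.
    return float(frame.loc[variable_name, str(year)])


database = pd.DataFrame(
    {"2017": [100.0], "2018": [103.0]},
    index=["GDPR(USA)"],
)
from_int = get_value(database, "GDPR(USA)", 2018)
from_str = get_value(database, "GDPR(USA)", "2018")
```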
Overview
Set the value of a variable in the database for a specific year.
Arguments
variable_name
: The full name of the variable.
year
: The year for which the value is to be set. It can be a string or an integer.
value
: The value to set for the variable in the specified year.
Overview
Save the database to a CSV file.
Arguments
filepath
: The path to the file where the database is to be saved.
Exceptions
AssertionError
: If the filename is not an absolute path.
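The absolute path requirement can be illustrated with a short sketch. The function name `save_database` and the data are hypothetical, and the example writes to a temporary directory so that the path is absolute.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_database(frame, filepath):
    """Write frame to filepath as CSV, insisting on an absolute path."""
    path = Path(filepath)
    assert path.is_absolute(), f"Expected an absolute path, got: {path}"
    frame.to_csv(path)
    return path


database = pd.DataFrame({"2018": [103.0]}, index=["GDPR(USA)"])
target = Path(tempfile.mkdtemp()).resolve() / "database.csv"
saved = save_database(database, target)
```

Passing a relative path such as `"database.csv"` to this sketch triggers the AssertionError.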