Model data files

This documentation pertains to model builds up to build 178. It describes the data files, all of which are stored in the model’s data folder. All of the data files are in the CSV format.

The main files are:

  • database.csv - the database file
  • iotables.csv - the input/output tables for all regions
  • setparameters.csv - user-defined parameters
  • modpop.csv - population growth rates
  • prodmat.csv - labor-augmenting productivity growth rates
  • aeeinew.csv - autonomous energy efficiency improvements

Additional files are likely to be present in the data folder. These files are used to configure the baseline projections and to calibrate various parameters, depending on the specific G-Cubed model version. Commonly, they include baseline_design.csv, modprod.csv, and product.csv. Details of those files are provided in the documentation of the baseline projections.

The database file

The database file, database.csv, contains the model database. It has a row for each variable in the model and a value for each of the years covered by the database.

The first row of the data file contains the column headings. All other rows of the datafile contain data, with one row for each variable in the model. The variables have a strict ordering that must correspond to the ordering of the variables produced by the SYM processor from the SYM model definition. This ordering of variables can be found in the model_<VERSION>_<BUILD>_vars.csv file produced by the SYM processor.

Following the variable names are the descriptions, units of measurement, and G-Cubed region code for each variable.
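The ordering requirement can be checked before a model run. The following is a minimal Python sketch, assuming the variable names have already been read into two lists, one from database.csv and one from the vars file produced by the SYM processor. The function name first_order_mismatch and the sample variable names are hypothetical and are not part of the model.

```python
def first_order_mismatch(database_names, sym_names):
    """Return the index of the first position where the two orderings differ,
    or None when the two lists are identical."""
    for index, (db_name, sym_name) in enumerate(zip(database_names, sym_names)):
        if db_name != sym_name:
            return index
    if len(database_names) != len(sym_names):
        # One list is a strict prefix of the other.
        return min(len(database_names), len(sym_names))
    return None


# Hypothetical variable names, for illustration only.
sym_order = ["VAR_A(USA)", "VAR_A(JPN)", "VAR_B(USA)"]
db_order = ["VAR_A(USA)", "VAR_B(USA)", "VAR_A(JPN)"]
mismatch = first_order_mismatch(db_order, sym_order)
```

A result of None indicates that the two orderings agree; any other value is the zero-based position of the first row to inspect.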

The input/output tables file

The single iotables.csv file contains the Input/Output (IO) tables for all regions. Each region has an IO table. The IO tables are stacked vertically in the IO tables file.

The first column of the IO table, with the <REGION_CODE> in the first cell, must be in the first column of the CSV file.

Each Input/Output table has the following structure:

<REGION_CODE>  a01  ...  a0N  C  I  G  X  M
g01
:
g0N
L
K
TAX

The region code in the top-left corner is mandatory. It must be exactly the same as the region code used for the region in the SYM set of regions in the model definition. It is used to locate the Input/Output table for the region when the table is loaded.

The first row and the first column of the table are labels for the rows and columns respectively.

Columns of the table describe ‘uses’ by a particular sector or for:

  • Consumption - C
  • Investment - I
  • Government spending - G
  • Exports - X
  • Imports - M

The sector labels must be the sector identifiers in the set of sectors in the SYM model definition. Typically these are a01 for sector 1, a02 for sector 2 etc.

Rows of the table describe ‘inputs’, either by type of sectoral good (or service) produced or as one of:

  • Labour - L
  • Capital - K
  • Tax - TAX

The goods labels must be the good identifiers in the set of goods in the SYM model definition. Typically these are g01 for sector 1, g02 for sector 2 etc.
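The stacked layout described above can be split into one table per region by looking for the region codes in the first column. The following is a minimal Python sketch and not the model's own loader. The function name split_io_tables, the sample numbers, and the treatment of empty cells as None are assumptions made for illustration.

```python
import csv
import io


def split_io_tables(csv_text, region_codes):
    """Split vertically stacked IO tables into a dict keyed by region code."""
    tables = {}
    current = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank rows between tables
        label = row[0].strip()
        if label in region_codes:
            # Header row: region code in the first cell, then the column labels.
            columns = [cell.strip() for cell in row[1:] if cell.strip()]
            current = {"columns": columns, "rows": {}}
            tables[label] = current
        elif current is not None:
            cells = row[1:1 + len(current["columns"])]
            # Assumption: an empty cell is kept as None rather than as zero.
            current["rows"][label] = [float(c) if c.strip() else None for c in cells]
    return tables


# Made-up numbers for two regions and two sectors.
sample = (
    "USA,a01,a02,C,I,G,X,M\n"
    "g01,10,5,20,3,4,2,1\n"
    "g02,6,8,15,2,5,1,3\n"
    "L,12,9,,,,,\n"
    "K,7,11,,,,,\n"
    "TAX,1,2,,,,,\n"
    "JPN,a01,a02,C,I,G,X,M\n"
    "g01,4,2,9,1,2,3,1\n"
    "g02,3,5,7,1,2,1,2\n"
    "L,6,4,,,,,\n"
    "K,3,5,,,,,\n"
    "TAX,1,1,,,,,\n"
)
io_tables = split_io_tables(sample, ["USA", "JPN"])
```

Here io_tables["USA"]["rows"]["g01"] holds the uses of good g01 in the USA, in the order given by io_tables["USA"]["columns"].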

The user-defined parameters file

The setparameters.csv file contains user-defined parameters. This is the subset of the model’s parameters that users are encouraged to consider adjusting in order to change the model’s behaviour. Note that the other parameters in the model are calibrated using information from the model’s database and IO tables.

The first column in the file contains the name of the parameter.

There is then one additional column in the file for each region in the model. The column label for each region’s column is the region identifier used in the SYM model definition.

The region columns must be in the same order as the regions are defined in the regions set in the SYM model definition.

A parameter value is required in each column, for each parameter listed in the file.

Some parameters are defined for different sectors as well as for different regions. These parameters have names that include the sector identifier, e.g. sigma_df(a01) for sigma_df for sector a01. For those parameters, the sectors must be in the same order as they are defined in the sectors set in the SYM model definition.

Optionally, the last row of the file can contain the word end in the parameter name column, with no values in any of the other columns. This optional last row is ignored.
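The rules above (region columns in SYM order, a value in every column, an optional trailing end row) can be checked when the file is read. The following is a minimal Python sketch under stated assumptions: the function name load_parameters, the header text in the first cell, and the sample values are hypothetical.

```python
import csv
import io


def load_parameters(csv_text, regions):
    """Read user-defined parameters into {name: {region: value}}."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header = [cell.strip() for cell in rows[0]]
    if header[1:] != list(regions):
        raise ValueError("region columns do not match the SYM region order")
    params = {}
    for row in rows[1:]:
        name = row[0].strip()
        if name == "end":
            break  # the optional last row is ignored
        values = [cell.strip() for cell in row[1:len(regions) + 1]]
        if len(values) != len(regions) or "" in values:
            raise ValueError(f"missing value for parameter {name}")
        params[name] = dict(zip(regions, (float(v) for v in values)))
    return params


# Made-up values for two regions; "parameter" in the first cell is an assumption.
sample = (
    "parameter,USA,JPN\n"
    "sigma_df(a01),0.5,0.6\n"
    "sigma_df(a02),0.7,0.8\n"
    "end,,\n"
)
params = load_parameters(sample, ["USA", "JPN"])
```

Passing the regions in a different order from the file's columns raises a ValueError, which mirrors the ordering requirement.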

The population growth rates file

The population growth rates file, modpop.csv, records annual population growth rate data for all regions, expressed as percentages, so a value of 1 is a 1% growth rate. It has a row for each region in the model.

The header row contains a year label in YYYY format for each year in the population projections, out to the last projection year as specified in the model configuration.

The row labels must be the region identifiers used in the SYM model definition and they must be in the same order as the regions are defined in the SYM model.
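As an illustration of the layout, the following minimal Python sketch reads a file in this shape, checks the region order, and converts the percentages to fractions (so 1 becomes 0.01). The function name load_population_growth, the text in the first header cell, and the sample rates are assumptions and not part of the model.

```python
import csv
import io


def load_population_growth(csv_text, regions):
    """Return {region: {year: growth rate as a fraction}}."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    years = [int(cell) for cell in rows[0][1:]]  # YYYY column labels
    labels = [row[0].strip() for row in rows[1:]]
    if labels != list(regions):
        raise ValueError("region rows do not match the SYM region order")
    growth = {}
    for row in rows[1:]:
        # Values are percentages: a value of 1 is a 1% growth rate.
        rates = [float(cell) / 100.0 for cell in row[1:1 + len(years)]]
        growth[row[0].strip()] = dict(zip(years, rates))
    return growth


# Made-up rates for two regions and two years.
sample = (
    "region,2020,2021\n"
    "USA,1,0.5\n"
    "JPN,-0.2,-0.3\n"
)
growth = load_population_growth(sample, ["USA", "JPN"])
```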

The productivity growth rates file

The productivity growth rates file, prodmat.csv, specifies the information needed to generate projections of labor-augmenting productivity growth rates.

When doing baseline projections with the model, the population and labour-augmenting productivity growth rate projections are combined into exogenous effective labour productivity growth rate projections.

The first row of the productivity growth rates file contains column labels. The first column label is region. The second column label is sector. The remaining column labels are the years from a year at or before the first projection year through to the last projection year.

The file contains:

  • For each sector of the US, the productivity growth rate in each year of the projection. A value of 1 implies a 1% simple annual growth rate.
  • For each sector of each non-US region, the starting-period fraction of US productivity for the same sector (a value of 1 implies the same productivity). This is only required in the initial period.
  • For each sector of each non-US region, the catch-up rate in each year of the projection, as that region’s sector catches up to the productivity of the same sector in the US. A value of 0.02 implies that the gap in productivity from the previous year declines by 2% to determine the new gap to US productivity. Note that this is not the same as the productivity growth rate information.

Each of these three elements is contained in the file, one after the other. The start of each element is identified by a text heading in column 1 of the row immediately before its data begins.

The labels for the three sections of this file are:

  1. productivity growth
  2. sector ratio to the USA leader
  3. catchup rate

These labels are case sensitive. They must be in the first column, and they are relied upon when loading the productivity data.
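To illustrate how the three labels partition the file, the following minimal Python sketch groups already-parsed CSV rows by the section heading that precedes them, using exact, case-sensitive matching on the first column. The function name split_prodmat_sections and the sample rows are hypothetical; this is not the model's own loading code.

```python
SECTION_LABELS = (
    "productivity growth",
    "sector ratio to the USA leader",
    "catchup rate",
)


def split_prodmat_sections(rows):
    """Group rows under the section label that precedes them."""
    sections = {label: [] for label in SECTION_LABELS}
    current = None
    for row in rows:
        first = row[0] if row else ""
        if first in SECTION_LABELS:  # exact, case-sensitive comparison
            current = first
        elif current is not None:
            sections[current].append(row)
        # Rows before the first label (the column headings) are skipped.
    missing = [label for label, body in sections.items() if not body]
    if missing:
        raise ValueError(f"missing or empty sections: {missing}")
    return sections


# Made-up rows for the USA plus one other region and two sectors.
rows = [
    ["region", "sector", "2020", "2021"],
    ["productivity growth", "", "", ""],
    ["USA", "", "", ""],
    ["", "a01", "1.5", "1.4"],
    ["", "a02", "1.0", "1.0"],
    ["sector ratio to the USA leader", "", "", ""],
    ["JPN", "", "", ""],
    ["", "a01", "0.8", ""],
    ["", "a02", "0.6", ""],
    ["catchup rate", "", "", ""],
    ["JPN", "", "", ""],
    ["", "a01", "0.02", "0.02"],
    ["", "a02", "0.02", "0.02"],
]
sections = split_prodmat_sections(rows)
```

In this sketch a heading typed with different capitalisation is not recognised, so its section comes back empty and a ValueError is raised.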

Productivity growth data for USA sectors

The first row of this section of data just contains the G-Cubed region identifier for the United States in the first column.

There is then one row for each sector in the SYM model definition (the sectors set), for example a01 and a02 if there are two sectors. They must be in the same order as the sector set membership declaration in the SYM model definition.

For each sector row, the first column is blank and the second column is the sector identifier, e.g. a01. The row then contains a percentage growth rate for the sector in the column corresponding to each year column in the file.

Sector ratios to the United States

This section of the file has one table per non-United States region.

For each region, the table begins with a row that contains the G-Cubed region identifier in the first column and nothing else in any of the other columns.

There is then one row for each of the members of the sectors set in the SYM model definition, for example a01 and a02 if there are two sectors. They must be in the same order as the sector membership declaration in the SYM model definition.

For each sector’s row, the first column is blank and the second column is the sector identifier, e.g. a01. The third column is the productivity ratio to the same sector in the United States. Thus, a value of 0.5 would mean productivity for that region is half the productivity of the same sector for the United States. No other columns have values.
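The region-row and sector-row pattern described above can be read as follows. This is a minimal Python sketch that assumes the rows of this section have already been separated from the rest of the file; the function name parse_sector_ratios and the sample values are hypothetical.

```python
def parse_sector_ratios(section_rows, sectors):
    """Return {region: {sector: ratio to the same US sector}}."""
    ratios = {}
    region = None
    for row in section_rows:
        # Pad short rows so the first three columns can always be indexed.
        cells = [cell.strip() for cell in row] + ["", "", ""]
        if cells[0]:
            # Region row: identifier in the first column, nothing else.
            region = cells[0]
            ratios[region] = {}
        elif cells[1]:
            if region is None:
                raise ValueError("sector row found before any region row")
            # Sector row: blank, sector identifier, ratio in the third column.
            ratios[region][cells[1]] = float(cells[2])
    for name, by_sector in ratios.items():
        if list(by_sector) != list(sectors):
            raise ValueError(f"sectors for {name} do not match the SYM order")
    return ratios


# Made-up ratios for two regions and two sectors.
section_rows = [
    ["JPN", "", ""],
    ["", "a01", "0.8"],
    ["", "a02", "0.6"],
    ["CHN", "", ""],
    ["", "a01", "0.5"],
    ["", "a02", "0.4"],
]
ratios = parse_sector_ratios(section_rows, ["a01", "a02"])
```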

Catchup rates to the United States

This section of the file has one table per non-United States region, in the same order as the regions are declared in the regions set of the SYM model definition.

For each region, the table begins with a row that contains the region identifier in the first column and nothing else in any of the other columns.

There is then one row for each of the members of the sectors set in the SYM model definition, for example a01 and a02 if there are two sectors. They must be in the same order as the sector membership declaration in the SYM model definition.

For each sector’s row, the first column is blank and the second column is the sector identifier, e.g. a01. The row then contains a decimal catchup rate for the sector in the column corresponding to each year column in the file. Thus a value of 0.02 in a particular year means that 2% of the remaining gap in productivity is closed in that year.
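The arithmetic of the catchup rate can be illustrated with a small worked example. The following Python sketch assumes that the gap is measured as one minus the ratio to US productivity; the documentation above does not state how the model measures the gap, so treat this as an illustration of the 2% rule only. The function name project_ratio is hypothetical.

```python
def project_ratio(initial_ratio, catchup_rates):
    """Apply yearly catchup rates to a starting productivity ratio.

    Assumption: gap = 1 - ratio, and each year's rate closes that
    fraction of the gap remaining from the previous year.
    """
    ratio_path = []
    gap = 1.0 - initial_ratio
    for rate in catchup_rates:
        gap *= 1.0 - rate  # a rate of 0.02 closes 2% of the remaining gap
        ratio_path.append(1.0 - gap)
    return ratio_path


# Starting at half of US productivity with a 0.02 catchup rate in two years:
# the gap goes 0.5 -> 0.49 -> 0.4802, so the ratio goes 0.51 -> 0.5198.
path = project_ratio(0.5, [0.02, 0.02])
```

Because each rate applies to the remaining gap, the absolute improvement shrinks over time even when the catchup rate is constant.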

The autonomous energy efficiency improvements file

The aeeinew.csv file records data on Autonomous Energy Efficiency Improvement (AEEI): exogenous improvements in the way energy contributes to production in each sector and to consumption.

See McKibbin and Wilcoxen (2013), ‘A global approach to energy and environment: the G-Cubed model’, for details of AEEI.

The first row of the file contains column labels. The first column label is blank. The remaining column labels are the years from a year at or before the first projection year through to the last projection year.

The first column contains row labels. All row labels are prefixed by aeei. The prefix is followed by an integer indicating the sector and then by the region identifier.

The rows must order the sectors in the same way that they are ordered when declared in the SYM model definition. They must also order the regions in the same way that they are ordered when declared in the SYM model definition.

The values in the remaining cells of each row are the percentage exogenous improvement in energy efficiency for that sector in that region for that year.

The file also contains a set of rows for consumption energy efficiency. These are the last rows in the file. There is one such row for each region. The row identifiers for these rows also start with aeei, followed by lowercase c for consumption, and then the region identifier.
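The two kinds of row label can be told apart with a small pattern match. The following minimal Python sketch assumes that the parts of a label are simply concatenated with no separator (for example aeei1USA and aeeicUSA), which the description above does not confirm. The function name parse_aeei_label and the sample labels are hypothetical.

```python
import re

# "aeei", then either a sector integer or a lowercase c, then the region identifier.
AEEI_LABEL = re.compile(r"aeei(\d+|c)(.+)")


def parse_aeei_label(label, regions):
    """Return (kind, sector number or None, region) for an AEEI row label."""
    match = AEEI_LABEL.fullmatch(label)
    if match is None or match.group(2) not in regions:
        raise ValueError(f"unrecognised AEEI row label: {label}")
    kind, region = match.groups()
    if kind == "c":
        return ("consumption", None, region)
    return ("sector", int(kind), region)


# Hypothetical labels, assuming no separator between the parts.
regions = ["USA", "JPN"]
sector_row = parse_aeei_label("aeei1USA", regions)
consumption_row = parse_aeei_label("aeeicJPN", regions)
```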