Model data files

The data folder contains the data files for the model. All of the data files are in the CSV format.

The main files are:

  • database.csv - the model database
  • iotables.csv - the input/output tables for all regions
  • user_parameters.csv - user-defined parameters
  • labor_force_growth_rates.csv - labor-force growth rates out to the last projection year
  • technology_advancement_rates.csv - the rate of advancement of the technological frontier in each sector in each year out to the last projection year
  • technology_gaps.csv - the gap behind the technological frontier for each sector in each region
  • technology_catchup_rates.csv - the rate at which the gap from the technological frontier is closed for each sector in each region in each year
  • autonomous_energy_efficiency_improvements.csv - autonomous energy efficiency improvements for each sector and for consumption in each region

Additional files are likely to be present in the data folder. These files are used to configure the baseline projections and to calibrate various parameters, depending on the specific G-Cubed model version. Commonly, these additional files include baseline_design.csv, labor_augmenting_technical_change.csv, and baseline_exogenous_projections.csv. Details of those files are provided in the baseline projections explanation.

The database file

The database file, database.csv, contains the model database. It has a row for each variable in the model and a value for each of the years covered by the database.

The first row of the data file contains the column headings. All other rows of the data file contain data, with one row for each variable in the model. The variables have a strict ordering that must correspond to the ordering of the variables produced by the SYM processor from the SYM model definition. This ordering of variables can be found in the model_<VERSION>_<BUILD>_varmap.csv file produced by the SYM processor.

Following the variable names are descriptions, units of measurement, and the G-Cubed region code for each variable.

The input/output tables file

The single iotables.csv file contains the Input/Output (IO) tables for all regions. Each region has an IO table. The IO tables are stacked vertically in the IO tables file.

The first column of the IO table, with the <REGION_CODE> in the first cell, must be in the first column of the CSV file.

Each Input/Output table has the following structure:

<REGION_CODE> a01 a0N C I G X M
g01                
:                
g0N                
L                
K                
TAX                

The region code in the top-left corner is mandatory. It must be exactly the same as the region code used for the region in the SYM set of regions in the model definition. It is used to locate the Input/Output table for the region when the table is loaded.

The first row and the first column of the table are labels for the rows and columns respectively.

Columns of the table describe ‘uses’ by a particular sector or for:

  • Consumption - C
  • Investment - I
  • Government spending - G
  • Exports - X
  • Imports - M

The sector labels must be the sector identifiers in the set of sectors in the SYM model definition. Typically these are a01 for sector 1, a02 for sector 2 etc.

Rows of the table describe ‘inputs’ by type of sectoral good (service) produced or:

  • Labour - L
  • Capital - K
  • Tax - TAX

The goods labels must be the good identifiers in the set of goods in the SYM model definition. Typically these are g01 for sector 1, g02 for sector 2, etc.
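The stacked layout described above can be read by treating any row whose first cell is a known region code as the start of a new table. The following is a minimal sketch of that idea, not the loader used by G-Cubed itself. The function name split_io_tables, the comma-separated sample and its numbers are all hypothetical, and the sketch assumes every data cell holds a number.

```python
import csv
import io

def split_io_tables(csv_text, region_codes):
    """Split a stacked IO tables file into one table per region.

    A row whose first cell is a known region code starts a new table;
    the rest of that row holds the column labels for that table.
    """
    tables = {}
    current = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not any(cell.strip() for cell in row):
            continue  # skip blank lines between stacked tables
        label = row[0].strip()
        if label in region_codes:
            current = {"columns": [c.strip() for c in row[1:]], "rows": {}}
            tables[label] = current
        elif current is None:
            raise ValueError(f"row {label!r} appears before any region header")
        else:
            current["rows"][label] = [float(v) for v in row[1:]]
    missing = [r for r in region_codes if r not in tables]
    if missing:
        raise ValueError(f"no IO table found for regions: {missing}")
    return tables

# Hypothetical one-sector, two-region file with made-up numbers.
sample = """USA,a01,C,I,G,X,M
g01,1,2,3,4,5,6
L,7,0,0,0,0,0
K,8,0,0,0,0,0
TAX,9,0,0,0,0,0
ROW,a01,C,I,G,X,M
g01,6,5,4,3,2,1
L,1,0,0,0,0,0
K,2,0,0,0,0,0
TAX,3,0,0,0,0,0
"""

tables = split_io_tables(sample, ["USA", "ROW"])
print(tables["USA"]["columns"])
print(tables["ROW"]["rows"]["g01"])
```

Keying the result by region code mirrors the way the region code in the corner cell is used to locate each table.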

The user-defined parameters file

The user_parameters.csv file contains user-defined parameters. This is the subset of the parameters in the model that users are encouraged to consider adjusting. Note that other parameters in the model are calibrated using information from the model’s database and IO tables.

The first column in the file contains the name of the parameter.

There is then one additional column in the file for each region in the model. The column label for each region’s column is the region identifier used in the SYM model definition.

The region columns must be in the same order as the regions are defined in the regions set in the SYM model definition.

A parameter value is required in each column, for each parameter listed in the file.

Some parameters are defined for different sectors as well as for different regions. These parameters have names that include the sector identifier, e.g. sigma_df(a01) for sigma_df for sector a01. For those parameters, the sectors must be in the same order as they are defined in the sectors set in the SYM model definition.

Optionally, the last row of the file can contain the word end in the parameter name column, with no values in any of the other columns. This optional last row is ignored.
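The rules above (region columns in SYM order, a value in every region column, an optional end row) can be checked before running the model. The sketch below is one possible check under stated assumptions: the function name read_user_parameters, the header label "parameter" for the first column, and the sample values are hypothetical, and all values are assumed to be numeric.

```python
import csv
import io

def read_user_parameters(csv_text, region_codes):
    """Read parameters into {name: {region: value}} and check the layout."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [c.strip() for c in next(reader)]
    if header[1:] != list(region_codes):
        raise ValueError(
            f"region columns {header[1:]} do not match {list(region_codes)}"
        )
    params = {}
    for row in reader:
        if not row:
            continue
        name = row[0].strip()
        values = [c.strip() for c in row[1:]]
        if name == "end" and not any(values):
            continue  # the optional 'end' row carries no values and is ignored
        if len(values) != len(region_codes) or not all(values):
            raise ValueError(f"parameter {name!r} needs a value for every region")
        params[name] = dict(zip(region_codes, map(float, values)))
    return params

# Hypothetical file for a two-region, two-sector model with made-up values.
sample = """parameter,USA,ROW
sigma_df(a01),0.5,0.6
sigma_df(a02),0.7,0.8
end,,
"""

params = read_user_parameters(sample, ["USA", "ROW"])
print(params["sigma_df(a01)"])
```

The sketch does not check the sector ordering of sector-specific parameters; that would need the sectors set from the SYM model definition.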

The labor-force growth rates file

The labor-force growth rates file, labor_force_growth_rates.csv, records annual labor-force growth rates for all regions. It has a row for each region in the model. Growth rates are expressed as percentages, so a value of 1 is a 1% growth rate.

The CSV file format for the 2R model is shown below:

  2018 2150
USA 1   0
ROW 2   0

The first row contains the projection year column labels in columns 2 onward in YYYY format. The following rows contain the labor force growth rates for each region. Each row of data has the SYM region code in the first column and the percentage growth rate for each year in the column corresponding to that year.

In the example above, the first projection year is 2018 and the last projection year is 2150. The USA labor force grows at 1% in 2018 and 0% in 2150. The ROW region labor force grows at 2% in 2018 and 0% in 2150.

The row labels should be the region identifiers from the SYM model definition. They must be in the same order as the regions are defined in the regions set in the SYM model definition.
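As an illustration of the layout, the 2R example above can be read as follows. This is a minimal sketch rather than the model's own reader: the function name read_growth_rates is hypothetical, the file is assumed to be comma-separated with an empty first header cell, and the conversion from percentages to fractions is only a convenience for downstream arithmetic.

```python
import csv
import io

def read_growth_rates(csv_text, region_codes):
    """Return {region: {year: rate}} with rates converted from % to fractions."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    years = [int(y) for y in header[1:]]  # YYYY labels from column 2 onward
    rates = {}
    for row in reader:
        if not row:
            continue
        region = row[0].strip()
        rates[region] = {y: float(v) / 100.0 for y, v in zip(years, row[1:])}
    if list(rates) != list(region_codes):
        raise ValueError(
            f"row order {list(rates)} does not match {list(region_codes)}"
        )
    return rates

# The 2R example from the text, written as comma-separated values.
sample = """,2018,2150
USA,1,0
ROW,2,0
"""

rates = read_growth_rates(sample, ["USA", "ROW"])
print(rates["USA"][2018])  # 1% expressed as a fraction
```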

The labor productivity growth rate projections file

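The labor productivity growth rate projections are specified in three separate CSV files: technology_advancement_rates.csv, technology_gaps.csv and technology_catchup_rates.csv.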

The technology advancement rate projections file

technology_advancement_rates.csv is a CSV file that contains the rate of advancement of the technological frontier in each sector in each year out to the last projection year. Values are expressed as percentages, so a value of 2.0 means that the technology frontier advances by 2% in the associated year. The data is stored with sectors as rows and projection years as columns. The row labels are the SYM sector codes. The column labels are the projection years in YYYY format out to the last projection year.

For example, for the 2R model:

sector 2018 2150
a01 1.4 1.4
a02 1.4 1.4

The technology gaps file

technology_gaps.csv is a CSV file that documents the gap behind the technological frontier for each sector in each region. Each value is expressed as a percentage of frontier efficiency. Thus, a value of 50 means that, in the first projection year, the sector in that region is 50% as efficient as is possible. The data is stored with sectors as rows and regions as columns. The row labels are the SYM sector codes. The column labels are the SYM region codes.

For example, for the 2R model:

sector USA ROW
a01 90 100
a02 100 90

Note that the maximum value is 100 and the minimum value must be positive.

It is not mandatory, but it is typical that at least one region is on the technology frontier, with a sector value of 100.
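The value constraints above are easy to check mechanically. The sketch below reads the 2R example, rejects values outside the range (0, 100], and reports which regions sit on the frontier in each sector. The function names read_technology_gaps and frontier_regions are hypothetical, and the file is assumed to be comma-separated.

```python
import csv
import io

def read_technology_gaps(csv_text):
    """Return {sector: {region: value}}, checking 0 < value <= 100."""
    reader = csv.reader(io.StringIO(csv_text))
    regions = [c.strip() for c in next(reader)[1:]]
    gaps = {}
    for row in reader:
        if not row:
            continue
        sector = row[0].strip()
        values = [float(v) for v in row[1:]]
        for region, value in zip(regions, values):
            if not 0 < value <= 100:
                raise ValueError(
                    f"gap for {sector} in {region} is {value}; "
                    "expected a positive value no greater than 100"
                )
        gaps[sector] = dict(zip(regions, values))
    return gaps

def frontier_regions(gaps):
    """List, for each sector, the regions with a value of exactly 100."""
    return {
        sector: [region for region, value in by_region.items() if value == 100]
        for sector, by_region in gaps.items()
    }

# The 2R example from the text, written as comma-separated values.
sample = """sector,USA,ROW
a01,90,100
a02,100,90
"""

gaps = read_technology_gaps(sample)
print(frontier_regions(gaps))
```

An empty list for a sector is allowed, since having a region on the frontier is typical but not mandatory.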

The catchup rate projections file

technology_catchup_rates.csv is a CSV file that documents the rate at which the gap from the technological frontier is closed for each sector in each region in each year. Values are expressed as percentages, so a value of 2.0 means that the technology gap will close by 2% in the associated year. The data is stored with the region in the first column and the sector in the second column. The remaining columns are the projection years in YYYY format out to the last projection year.

For example, for the 2R model:

region sector 2018 2150
USA a01 2 2
USA a02 2 2
ROW a01 2 2
ROW a02 2 2

Note that the maximum value is 100 and the minimum value must be greater than -100.
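Because this file has two label columns, it is convenient to key each row by its (region, sector) pair. The following sketch reads the 2R example that way and applies the range rule above. The function name read_catchup_rates is hypothetical and the file is assumed to be comma-separated.

```python
import csv
import io

def read_catchup_rates(csv_text):
    """Return {(region, sector): {year: rate}}, checking -100 < rate <= 100."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    years = [int(y) for y in header[2:]]  # year labels start in column 3
    rates = {}
    for row in reader:
        if not row:
            continue
        key = (row[0].strip(), row[1].strip())
        values = [float(v) for v in row[2:]]
        bad = [v for v in values if not -100 < v <= 100]
        if bad:
            raise ValueError(f"catchup rates out of range for {key}: {bad}")
        rates[key] = dict(zip(years, values))
    return rates

# The 2R example from the text, written as comma-separated values.
sample = """region,sector,2018,2150
USA,a01,2,2
USA,a02,2,2
ROW,a01,2,2
ROW,a02,2,2
"""

catchup = read_catchup_rates(sample)
print(catchup[("ROW", "a02")])
```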

The autonomous energy efficiency improvements file

The autonomous_energy_efficiency_improvements.csv file records data on Autonomous Energy Efficiency Improvement (AEEI): exogenous improvements in the way energy contributes to production in each sector and to consumption.

See McKibbin and Wilcoxen (2013) A global approach to energy and environment: the G-Cubed model for details of AEEI.

The AEEI CSV file records Autonomous Energy Efficiency Improvements for all regions and all sectors within each region, through the projection years. It also captures these improvements for consumption.

An example layout for this CSV file is shown below for the 2 region/2 sector model:

  2017 2018 2019 … 2150
AEEI(a01,USA) 0 0 0 0 0
AEEI(a02,USA) 0 0 0 0 0
AEEI(a01,ROW) 0 0 0 0 0
AEEI(a02,ROW) 0 0 0 0 0
AEEIC(USA) 0 0 0 0 0
AEEIC(ROW) 0 0 0 0 0

The first row contains the ordered years as column labels.

Note that the first year can be before the first projection year.

The last year must be the last projection year.

The first column contains row labels. All row labels are made up from two components, in the following order:

  1. The prefix AEEI for production by sectors and AEEIC for consumption.
  2. In brackets, the set combinations that are affected by the autonomous energy efficiency improvements. For sectors, the set combinations are a sector code followed by a region code. For consumption, the set combination is just a region code.

For example, the row labelled AEEI(a01,USA) is the AEEI projections for sector 1 for the United States.

The consumption rows must be the last rows in the file.

The sector rows must be in the SYM-defined sector order.

The rows must also be in the SYM-defined region order as you work down the file.

The data values are the percentage exogenous improvement in energy efficiency for that sector, or for consumption, in the given region for a given year. Thus, a value of 1 is a 1% improvement in energy efficiency in that year.
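The row-label and ordering rules above can be turned into a list of expected labels and compared against the first column of the file. The sketch below assumes one reading of those rules: sector rows are grouped by region in SYM region order, sectors appear in SYM sector order within each region, and the consumption rows follow in region order. The function names are hypothetical. Note that labels such as AEEI(a01,USA) contain a comma, so they are assumed to be quoted in the CSV file.

```python
import csv
import io

def expected_aeei_labels(sectors, regions):
    """Build the expected row labels: sector rows first, consumption rows last."""
    labels = [f"AEEI({s},{r})" for r in regions for s in sectors]
    labels += [f"AEEIC({r})" for r in regions]
    return labels

def check_aeei_labels(csv_text, sectors, regions):
    """Compare the first column of the file with the expected labels."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row of years
    found = [row[0].strip() for row in reader if row]
    expected = expected_aeei_labels(sectors, regions)
    if found != expected:
        raise ValueError(f"expected row labels {expected}, found {found}")
    return found

# Hypothetical shortened file for the 2 region/2 sector model.
sample = """,2017,2018,2150
"AEEI(a01,USA)",0,0,0
"AEEI(a02,USA)",0,0,0
"AEEI(a01,ROW)",0,0,0
"AEEI(a02,ROW)",0,0,0
"AEEIC(USA)",0,0,0
"AEEIC(ROW)",0,0,0
"""

labels = check_aeei_labels(sample, ["a01", "a02"], ["USA", "ROW"])
print(labels)
```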