Model data files

The data subdirectory contains the data files for the model. All of the data files are in CSV format.

The files are comma-delimited, and no value in them may contain a comma.
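
As a minimal sketch of how this rule could be checked before running the model, the following Python function reports rows whose field count differs from that of the header row, which is the usual symptom of a stray comma inside a value. The function name and the sample data are hypothetical, and the check assumes that every row in the file is meant to have the same number of fields.

```python
def find_ragged_rows(csv_text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    lines = csv_text.splitlines()
    expected = len(lines[0].split(","))  # the header row sets the expected width
    problems = []
    for line_number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # skip blank lines
        field_count = len(line.split(","))
        if field_count != expected:
            problems.append((line_number, field_count))
    return problems


# Hypothetical data: the value 1,000 on line 3 contains a comma.
sample = "name,UU,NN\nalpha,1.0,2.0\nbeta,1,000,2.0\n"
print(find_ragged_rows(sample))  # → [(3, 4)]
```

Splitting on every comma, rather than using a quote-aware CSV parser, is deliberate here: it also flags commas hidden inside quoted values.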

The database

The .csv contains the model database.

See the teaching model example.

It must be populated with data for all variables in the model for one or more years leading up to and including the first projection year. Data can also be included for years after the first projection year. Data after the first projection year will be ignored when running the model.

Data in this file is considered to be the observed historical data for all of the variables in the model.

The first row of the data file contains the column headings. All other rows of the data file contain data, with one row for each variable in the model. The variables have a strict ordering that must correspond to the ordering of the variables produced by the SYM processor from the SYM model definition.

You can find the variables in the model, in this strict order, in the varmap file.

The variable names are followed by descriptions, units of measurement, and the country/region code.
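
The ordering requirement can be checked mechanically. The sketch below assumes that the variable names from the database file and from the varmap file have already been read into two Python lists; how they are read is not shown, and the function name and variable names are hypothetical.

```python
def first_order_mismatch(database_names, expected_names):
    """Return the index of the first position where the two orderings differ, or None."""
    for position, (found, expected) in enumerate(zip(database_names, expected_names)):
        if found != expected:
            return position
    if len(database_names) != len(expected_names):
        # One list is a strict prefix of the other.
        return min(len(database_names), len(expected_names))
    return None


# Hypothetical variable names, for illustration only.
expected = ["var_a", "var_b", "var_c"]
print(first_order_mismatch(["var_a", "var_c", "var_b"], expected))  # → 1
print(first_order_mismatch(["var_a", "var_b", "var_c"], expected))  # → None
```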

More detailed definitions of the variables are in

The input/output tables for all regions

The .csv contains the Input/Output tables for all regions.

Review the teaching model Input/Output tables file to get insight into how the data is organised.

Each region has an Input/Output table.

The Input/Output tables are stacked vertically in the Input/Output tables file.

The first column of the IO table, with the <REGION_CODE> in the first cell, must be in the first column of the CSV file.

Each Input/Output table has the following structure:

<REGION_CODE> a01 a0N C I G X M
g01                
:                
g0N                
L                
K                
TAX                

The region code in the left corner is mandatory. It must be exactly the same as the region code used for the region in the SYM set of regions in the model definition. It is used to locate the Input/Output table for the region when the table is loaded.

The first row and the first column of the table are labels for the rows and columns respectively.

Columns of the table describe ‘uses’ by a particular sector or for:

  • Consumption - C
  • Investment - I
  • Government spending - G
  • Exports - X
  • Imports - M

The sector labels must be the sector identifiers in the set of sectors in the SYM model definition. Typically these are a01 for sector 1, a02 for sector 2, and so on.

Rows of the table describe ‘inputs’ by type of sectoral good (service) produced or:

  • Labour - L
  • Capital - K
  • Tax - TAX

The goods labels must be the good identifiers in the set of goods in the SYM model definition. Typically these are g01 for the good produced by sector 1, g02 for the good produced by sector 2, and so on.
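
To illustrate how a region's table can be located by its region code in the stacked file, here is a minimal Python sketch. It is not the model's own loader. It assumes that each table ends with its TAX row, as in the structure shown above, and that every data cell holds a number. The function name and the sample values are hypothetical.

```python
import csv
import io


def load_io_table(csv_text, region_code):
    """Return (column_labels, {row_label: [values]}) for one region's table."""
    rows = csv.reader(io.StringIO(csv_text))
    # Scan for the header row whose first cell is the region code.
    for row in rows:
        if row and row[0].strip() == region_code:
            column_labels = [cell.strip() for cell in row[1:]]
            break
    else:
        raise KeyError(f"no Input/Output table found for region {region_code!r}")
    # Read the rows that follow, up to and including the TAX row.
    table = {}
    for row in rows:
        if not row or not row[0].strip():
            continue  # skip blank rows
        label = row[0].strip()
        table[label] = [float(cell) for cell in row[1:]]
        if label == "TAX":
            break
    return column_labels, table


# Hypothetical two-region, one-sector file with made-up values.
sample = (
    "UU,a01,C,I,G,X,M\n"
    "g01,1,2,3,4,5,6\n"
    "L,7,0,0,0,0,0\n"
    "K,8,0,0,0,0,0\n"
    "TAX,9,0,0,0,0,0\n"
    "NN,a01,C,I,G,X,M\n"
    "g01,10,20,30,40,50,60\n"
    "L,70,0,0,0,0,0\n"
    "K,80,0,0,0,0,0\n"
    "TAX,90,0,0,0,0,0\n"
)
labels, table = load_io_table(sample, "NN")
print(labels)        # → ['a01', 'C', 'I', 'G', 'X', 'M']
print(table["g01"])  # → [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
```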

The values in the Input/Output tables are generated by Ox scripts that are beyond the scope of this documentation.

User-defined parameters

This file contains user-defined parameters.

Review the teaching version of the parameters file to get insight into how the data is organised.

For more information on what the model parameters are, review the related model definitions section.

The first column in the file contains the name of the parameter.

There is then one additional column in the file for each region in the model. The column label for each region’s column is the region identifier used in the SYM model definition, e.g. UU for the United States.

The region columns must be in the same order as the regions are defined in the regions set in the SYM model definition, starting with the column for the United States, UU.

A parameter value is required in each region’s column, for each named parameter.

Some parameters are defined for different sectors as well as for different regions. These parameters have names that include the sector identifier, e.g. sigma_df(a01) for sigma_df for sector a01. For those parameters, the sectors must be in the same order as they are defined in the sectors set in the SYM model definition.

Optionally, the last row of the file can contain the word end in the parameter name column, with no values in any of the other columns. This optional last row is ignored.
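
The layout described above can be read with a few lines of Python. This is a sketch under stated assumptions rather than the model's own loader: the function name, the label in the first header cell, and the parameter values are hypothetical, and the sketch does not check that the region columns follow the order of the regions set.

```python
import csv
import io


def load_parameters(csv_text):
    """Return (regions, {parameter_name: {region: value}})."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    regions = [cell.strip() for cell in header[1:]]  # region identifiers, in file order
    parameters = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank rows
        name = row[0].strip()
        if name == "end":
            break  # the optional end row is the last row, so stop reading
        values = [cell.strip() for cell in row[1:]]
        if len(values) != len(regions) or "" in values:
            raise ValueError(f"parameter {name!r} needs one value per region")
        parameters[name] = dict(zip(regions, map(float, values)))
    return regions, parameters


# Hypothetical file contents; the values are made up.
sample = (
    "parameter,UU,NN\n"
    "sigma_df(a01),0.5,0.6\n"
    "sigma_df(a02),0.7,0.8\n"
    "end,,\n"
)
regions, parameters = load_parameters(sample)
print(regions)                            # → ['UU', 'NN']
print(parameters["sigma_df(a01)"]["NN"])  # → 0.6
```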

The population growth rates file

The population growth rates file records annual population growth rates for all regions, expressed as percentages, so a value of 1 means a 1% growth rate.

Review the teaching version of the population file to get insight into how the data is organised.

This data is generated from the GTAP database and users usually do not need to change the values.

The row labels should be the region identifiers from the SYM model definition. Labels taken from the Ox versions of this file might need to be modified, because the Ox versions prefix the region identifiers with pop.
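
A minimal sketch of that relabelling in Python is shown below. The function name is hypothetical, and the sketch assumes that the set of region identifiers from the SYM model definition is already known. The prefix is removed only when the remainder is a known region identifier, so labels that are already correct are left unchanged.

```python
def strip_pop_prefix(label, region_ids):
    """Turn an Ox-style label such as 'popUU' into the plain region identifier 'UU'."""
    prefix = "pop"
    if label.startswith(prefix) and label[len(prefix):] in region_ids:
        return label[len(prefix):]
    return label


region_ids = {"UU", "NN"}
print(strip_pop_prefix("popUU", region_ids))  # → UU
print(strip_pop_prefix("NN", region_ids))     # → NN
```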

The productivity growth rates file

The productivity growth rates file specifies productivity growth rates for all years in the projection. When doing baseline projections, the population and productivity growth rate projections are combined into exogenous effective labour productivity (AL) growth rate projections.

Review the teaching version of the productivity file to get insight into how the data is organised.

The first row of the file contains column labels. The first column label is region. The second column label is sector. The remaining column labels are the years from a year at or before the first projection year through to the last projection year.

The file contains:

  • Productivity growth in each year of the projection for each sector of the US. A value of 1 implies a 1% simple annual growth rate.

  • For each non-US region, for each sector, specify the starting period fraction of US productivity for the same sector (a value of 1 implies the same productivity). This is only required in the initial period.

  • For each non-US region, for each sector, in each year of the projection, specify the catch-up rate as that non-US region’s sector catches up to the productivity of the same sector in the US. A value of 0.02 implies that the gap in productivity from the previous year declines by 2% to determine the new gap to US productivity. Note that this is not the same as the productivity growth rate information.

Each of these three elements is contained in the file, one after the other. The start of each element is identified by a text heading in column 1 of the row immediately before it.

The labels for the three sections of this file are:

  1. productivity growth
  2. sector ratio to the USA leader
  3. catchup rate

These labels are case sensitive and must be in the first column. They are relied upon when loading the productivity data.
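
To show how the three headings can be used to split the file, here is a minimal Python sketch. It is an illustration, not the model's own loader: the function name and the sample values are hypothetical, and the headings are matched exactly, including case, against the first column.

```python
import csv
import io

SECTION_LABELS = (
    "productivity growth",
    "sector ratio to the USA leader",
    "catchup rate",
)


def split_sections(csv_text):
    """Group the rows of the productivity file under their section headings."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(csv_text)):
        first = row[0] if row else ""
        if first in SECTION_LABELS:  # exact, case-sensitive match
            current = sections.setdefault(first, [])
        elif current is not None and any(cell.strip() for cell in row):
            current.append(row)
    missing = [label for label in SECTION_LABELS if label not in sections]
    if missing:
        raise ValueError(f"missing section headings: {missing}")
    return sections


# Hypothetical file contents with one US sector and one non-US region.
sample = (
    "region,sector,2018,2019\n"
    "productivity growth,,,\n"
    "UU,,,\n"
    ",a01,1.0,1.0\n"
    "sector ratio to the USA leader,,,\n"
    "NN,,,\n"
    ",a01,0.5,\n"
    "catchup rate,,,\n"
    "NN,,,\n"
    ",a01,0.02,0.02\n"
)
sections = split_sections(sample)
print(sections["catchup rate"])  # → [['NN', '', '', ''], ['', 'a01', '0.02', '0.02']]
```

Rows before the first heading, such as the row of column labels, are not assigned to any section in this sketch.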

Productivity growth data for USA sectors

The first row of this section of data just contains the region identifier for the United States in the first column, UU.

There is then one row for each of the members of the sectors set in the SYM model definition, a01 through to a02, if there are 2 such sectors. They must be in the same order as the sector membership declaration in the SYM model definition.

For each sector’s row, the first column is blank and the second column is the sector identifier, e.g. a01. The row then contains a percentage growth rate for the sector in the column corresponding to each year column in the file.

Sector ratios to the United States

This section of the file has one table per non-United States region.

For each region, the table begins with a row that contains the region identifier, e.g. NN, in the first column and nothing else in any of the other columns.

There is then one row for each of the members of the sectors set in the SYM model definition, a01 through to a02, if there are 2 such sectors. They must be in the same order as the sector membership declaration in the SYM model definition.

For each sector’s row, the first column is blank and the second column is the sector identifier, e.g. a01. The third column is the productivity ratio to the same sector in the United States. Thus, a value of 0.5 would mean productivity for that region is half the productivity of the same sector for the United States. No other columns have values.

Catchup rates to the United States

This section of the file has one table per non-United States region.

For each region, the table begins with a row that contains the region identifier, e.g. NN, in the first column and nothing else in any of the other columns.

There is then one row for each of the members of the sectors set in the SYM model definition, a01 through to a02, if there are 2 such sectors. They must be in the same order as the sector membership declaration in the SYM model definition.

For each sector’s row, the first column is blank and the second column is the sector identifier, e.g. a01. The row then contains a decimal catchup rate for the sector in the column corresponding to each year column in the file. Thus a value of 0.02 in a particular year means that 2% of the remaining gap in productivity is closed in that year.
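
The catch-up arithmetic can be illustrated with a short Python sketch. It assumes that the gap is measured as one minus the region's productivity ratio to the United States, and that each year's catchup rate shrinks the previous year's gap proportionally. The model's exact definition of the gap may differ, and the function name is hypothetical.

```python
def project_ratio(initial_ratio, catchup_rates):
    """Project the productivity ratio to the US, one value per year.

    Assumption: gap = 1 - ratio, and each year new_gap = old_gap * (1 - rate).
    """
    ratios = [initial_ratio]
    gap = 1.0 - initial_ratio
    for rate in catchup_rates:
        gap *= 1.0 - rate  # the stated fraction of the remaining gap is closed
        ratios.append(1.0 - gap)
    return ratios


# Starting at half of US productivity, with a 2% catchup rate in each of two years.
print([round(r, 4) for r in project_ratio(0.5, [0.02, 0.02])])  # → [0.5, 0.51, 0.5198]
```

Under these assumptions a constant catchup rate closes a progressively smaller absolute amount each year, because it applies to the remaining gap rather than to the original one.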

Autonomous energy efficiency improvements file

Review the teaching version of the AEEI file to get insight into how the data is organised.

The aeeinew.csv file records data on Autonomous Energy Efficiency Improvement (AEEI): exogenous improvements in the way energy contributes to production in each sector and to consumption.

See McKibbin and Wilcoxen (2013), "A global approach to energy and environment: the G-Cubed model", for details of AEEI.

The first row of the file contains column labels. The first column label is blank. The remaining column labels are the years from a year at or before the first projection year through to the last projection year.

The first column contains row labels. All row labels are prefixed by aeei. The prefix is followed by an integer indicating the sector and then by the region identifier.

For example, the row labelled aeei12UU contains the AEEI projections for sector 12 for the United States.

The rows must order the sectors in the same way that they are ordered when declared in the SYM model definition. They must also order the regions in the same way that they are ordered when declared in the SYM model definition.

The remaining cells in each row give the percentage exogenous improvement in energy efficiency for that sector in that region for that year.

The file also contains a set of rows for consumption energy efficiency. These are the last rows in the file. There is one such row for each region. The row identifiers for these rows also start with aeei, followed by lowercase c for consumption, and then the region identifier, e.g. UU.
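
The row-label convention can be expressed as a regular expression. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes that region identifiers start with a letter, as UU and NN do.

```python
import re

# aeei, then either a sector number or a lowercase c, then the region identifier.
AEEI_LABEL = re.compile(r"^aeei(?:(?P<sector>\d+)|(?P<consumption>c))(?P<region>[A-Za-z]\w*)$")


def parse_aeei_label(label):
    """Return (kind, sector_number, region) for an AEEI row label."""
    match = AEEI_LABEL.match(label)
    if match is None:
        raise ValueError(f"not an AEEI row label: {label!r}")
    if match.group("consumption"):
        return ("consumption", None, match.group("region"))
    return ("sector", int(match.group("sector")), match.group("region"))


print(parse_aeei_label("aeei12UU"))  # → ('sector', 12, 'UU')
print(parse_aeei_label("aeeicUU"))   # → ('consumption', None, 'UU')
```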