Historic Mortality Datasets
The Historic Mortality Data Files database was originally created as a basic tool for researchers studying mortality in England and Wales. These two datasets reflect different versions of the database. The first dataset covers 1901-1992 and reflects the Historic Mortality Data Files database as it was before its redesign in 1997. The second dataset covers 1901-1995 and is the version re-issued by the Office for National Statistics (ONS) in 1997 as Twentieth Century Mortality Files. The two datasets overlap substantially; their similarities and differences are detailed below.
The 1901-1992 and 1901-1995 datasets contain the following types of tables:
The Historic Deaths tables record the number of deaths in England and Wales in each year broken down by age group, sex and the underlying cause of death. From 1911 onwards, the cause of death is coded according to the contemporary version of the International Classification of Diseases (ICD). For the period 1901-1910, causes of death follow a classification scheme which was used in England and Wales before the ICD was adopted. Each dataset thus contains an Historic Deaths table for 1901-1910, and a table for each period in which a different revision of the ICD was in force. Down to 1992, the data relates to deaths which were registered in the year in question; from 1993 onwards, the figures represent deaths which occurred during the year.
Each dataset also contains a single Population table which contains estimates of the population of England and Wales (the 'population at risk of dying') by year, by sex, and by age groups. The age groups correspond to the age groups used in the Historic Deaths tables.
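Because the Population table uses the same age groups as the Historic Deaths tables, the two can be combined to calculate death rates. The following is a minimal sketch only: the record layout (keys `year`, `sex`, `age_group`, `cause`, `count`), the sex codes and the function name are hypothetical and are not the datasets' actual field names.

```python
from collections import defaultdict


def death_rates_per_100k(deaths, population):
    """Return all-cause death rates per 100,000 keyed by (year, sex, age_group).

    deaths: iterable of dicts with hypothetical keys year, sex, age_group,
            cause and count (one row per cause, as in a Historic Deaths table).
    population: dict mapping (year, sex, age_group) -> population at risk.
    """
    totals = defaultdict(int)
    for row in deaths:
        # Sum over causes to get all-cause deaths for each year/sex/age cell.
        key = (row["year"], row["sex"], row["age_group"])
        totals[key] += row["count"]

    rates = {}
    for key, n_deaths in totals.items():
        pop = population.get(key)
        if pop:  # skip cells with no (or zero) population estimate
            rates[key] = 100_000 * n_deaths / pop
    return rates
```

Summing over causes before dividing gives all-cause rates; filtering `deaths` to a single cause first would give cause-specific rates for the same cells.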
Finally, the second dataset includes ICD Dictionary tables which explain the codes used for causes of death in the Historic Deaths tables. There is one ICD Dictionary table for each Historic Deaths table. These tables are not present in the 1901-1992 dataset, in which codes for causes of death were not explained except in the accompanying documentation.
ICD Codes: The ICD originated in a draft nomenclature of causes of death which was presented to a session of the International Statistical Institute by Dr Jacques Bertillon in 1893. The first revision of the ICD was adopted at an international conference in 1900. New versions have been issued at roughly 10-year intervals, and the standard is now maintained by the World Health Organisation (WHO). In England and Wales the ICD was first adopted in 1911, in the form of an amended version of the second revision. Nine revisions of the ICD are accounted for in these datasets, and the adoption of each roughly coincides with the start of a decade of the 20th century.
During the period 1901-1910, causes of death in England and Wales were classified by the General Register Office using a list of causes which was a variant of the first revision of the ICD but did not employ ICD codes. When the Historic Mortality Data Files database was developed by the Office of Population Censuses and Surveys (OPCS), codes were assigned to the causes in this unnumbered list. These codes are the basis for the codes for causes of death in the Historic Deaths table for 1901-1910 in both datasets. The other Historic Deaths tables in the datasets cover the periods of the second through to the ninth revisions of the ICD.
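Since each Historic Deaths table corresponds to one classification period, a researcher typically needs to know which table a given year falls in. The sketch below assumes period start years (1911, 1921, 1931, 1940, 1950, 1958, 1968, 1979) that are NOT stated in this description; they should be checked against the accompanying documentation before use, and the labels are hypothetical.

```python
import bisect

# ASSUMED first year of each classification period in England and Wales.
# These boundaries are not given in this description; verify them against
# the dataset documentation before relying on them.
PERIOD_STARTS = [1901, 1911, 1921, 1931, 1940, 1950, 1958, 1968, 1979]
PERIOD_LABELS = [
    "1901-1910 list", "ICD2", "ICD3", "ICD4", "ICD5",
    "ICD6", "ICD7", "ICD8", "ICD9",
]


def classification_for_year(year):
    """Return the (hypothetical) label of the classification assumed in force in `year`."""
    if year < PERIOD_STARTS[0]:
        raise ValueError("the datasets start in 1901")
    # bisect_right finds the first start year greater than `year`;
    # the period in force is the one just before it.
    i = bisect.bisect_right(PERIOD_STARTS, year) - 1
    return PERIOD_LABELS[i]
```

A sorted list of start years with `bisect` keeps the lookup correct at period boundaries without a chain of `if` statements.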
In the 1901-1992 dataset, ICD codes are represented by 'computer codes', which can differ substantially from the ICD codes. This is particularly true of ICD revisions 2-5, whose alphanumeric ICD codes were converted into purely numeric codes in the dataset. The dataset itself did not explain either the computer codes or the ICD codes. However, the accompanying documentation allowed computer codes to be matched to ICD codes, and explained the meaning of the ICD codes for ICD revisions 2-5 and of the codes used for the period 1901-1910. By contrast, the 1901-1995 dataset uses the actual ICD codes in the Historic Deaths tables from 1911 onwards, with the codes explained in the ICD Dictionary tables. The most significant departure from the published ICD codes is that, where the ICD employed a 3-digit code, a '0' was appended in the dataset so that all codes have 4 digits.
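When matching published 3-digit ICD codes against the 1901-1995 dataset, the same padding has to be applied. This is a minimal sketch, assuming codes are handled as strings without a decimal point; the function name is hypothetical.

```python
def pad_icd_code(code):
    """Pad a 3-character ICD code to 4 characters by appending '0'.

    Assumes codes are strings without a decimal point. Codes that already
    have 4 characters are returned unchanged.
    """
    code = code.strip()
    if len(code) == 3:
        return code + "0"
    if len(code) == 4:
        return code
    raise ValueError(f"unexpected ICD code length: {code!r}")
```

The padding only works in one direction: in the dataset, a trailing '0' may be padding on a 3-digit code or a genuine fourth digit, so recovering the original code requires the ICD Dictionary tables rather than simply stripping the last character.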
The causes of death recorded in the Historic Deaths tables are underlying causes of death, and the data is ultimately derived from the system for registering deaths for civil purposes. The 9th revision of the ICD defined 'underlying cause of death' as (1) the disease or injury that initiated the train of events leading to death, or (2) the circumstances of the accident or violence (e.g. suicide) that produced the fatal injury. Where death was not due to natural causes, ICD revisions 6-9 allowed two codes to be assigned to each death: one for the external cause of injury and the other for the nature of the injury. To avoid double counting, the Historic Deaths tables for these revisions include only the counts for external causes of injury, in both datasets.
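The double-counting problem can be illustrated with a minimal sketch. The record layout (one pair of codes per death) and the placeholder code values are hypothetical; the point is only that tallying by the external cause alone keeps one count per death.

```python
from collections import Counter


def count_injury_deaths(deaths):
    """Tally injury deaths by external cause only.

    deaths: list of (external_cause_code, nature_of_injury_code) pairs, one
    per death (a hypothetical layout). Counting both codes of each pair
    would put every death into the totals twice.
    """
    return Counter(external for external, _nature in deaths)
```

For example, three deaths recorded as `[("EXT-A", "NAT-X"), ("EXT-A", "NAT-Y"), ("EXT-B", "NAT-X")]` give two deaths under `EXT-A` and one under `EXT-B`, so the total still equals the number of deaths.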
Age Groups: In both datasets, data in the Historic Deaths tables and the Population table is divided into standard age groups. These age groups vary according to the period covered by the data. In most cases, five-year age groups from age 5 up to age 85+ are used. However, there are variations from this for some of the periods corresponding to the earlier ICD revisions.
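The standard grouping can be expressed as a simple mapping from age to group. This sketch covers only the usual five-year groups from 5 to 85+; the group labels are hypothetical, ages under 5 are lumped into a single placeholder group because their subdivision is not described here, and the non-standard groups used for some earlier ICD periods are not handled.

```python
def five_year_age_group(age):
    """Map an age in completed years to a five-year group: 5-9, ..., 80-84, 85+.

    Ages under 5 are returned as "under 5" (a placeholder, not the tables'
    actual subdivision). Non-standard groupings used in some earlier
    periods are not handled.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 5:
        return "under 5"
    if age >= 85:
        return "85+"
    lower = (age // 5) * 5  # round down to the start of the five-year band
    return f"{lower}-{lower + 4}"
```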
From 1986 onwards, data in the Historic Deaths tables for deaths under the age of 1 excludes deaths in the first 28 days of life. This resulted from the introduction of a new form of death certificate for stillbirths and neonatal deaths in that year, which abandoned the concept of an underlying cause of death. Instead, physicians were required to supply details of maternal and foetal contributions to mortality.
In both datasets, down to 1992, the years assigned to data in the Historic Deaths tables represent the year when the death was registered. In 1993 OPCS began publishing mortality statistics by the year in which the death occurred, rather than by the year in which the death was registered. This affects data in the Historic Deaths tables in the 1901-1995 dataset, where the year represents the year of registration up to 1992, and the year of occurrence of death for 1993-1995.
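Two of the period-specific caveats in this description depend only on the year: the change from year of registration to year of occurrence in 1993, and the exclusion of deaths in the first 28 days of life from the under-1 figures from 1986. The following hypothetical helper is a sketch that records both, so that analyses spanning these years can flag them.

```python
def year_caveats(year):
    """Return year-dependent caveats for Historic Deaths data (hypothetical helper)."""
    return {
        # Up to 1992 the year is the year of registration; from 1993 it is
        # the year in which the death occurred (1901-1995 dataset only).
        "year_basis": "registration" if year <= 1992 else "occurrence",
        # From 1986, deaths under age 1 exclude deaths in the first 28 days.
        "infant_deaths_exclude_neonatal": year >= 1986,
    }
```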
The datasets in this series are available to download. Links to individual datasets can be found at piece level.
Hardware: In 1985 the data was held by the Office of Population Censuses and Surveys (OPCS) on an IBM mainframe and an ICL 2900 series mainframe. During the 1960s, the original processing of mortality data and other vital statistics data was done on an IBM 1401 mainframe, which came into use in 1963 and was still in use in 1969.
Operating system: VME. The software formats in which the 1997 version was supplied to purchasers would normally have presupposed a DOS or Windows operating system.
Application software: OPCS held the 1901-1992 dataset as ASCII text files.
Logical structure and schema: Both datasets contain Historic Deaths tables for each ICD revision (and the period 1901-1910), as well as a single Population table. In addition, the 1901-1995 dataset includes ICD Dictionary tables corresponding to each Historic Deaths table, which explain the codes used in the Historic Deaths tables for causes of death.
Data sources: The data in the datasets is taken from a mixture of published and unpublished sources. The information in all of these sources was originally gathered for purposes other than the compilation of the database. In both the 1901-1992 and the 1901-1995 datasets, data in the Population tables was taken from mid-year population estimates which were periodically issued by OPCS and its predecessors through several series of publications. These included the OPCS Monitor Series, the Registrar General's Quarterly Return for England and Wales, the Registrar General's Statistical Review, the Registrar General's Decennial Supplements and the 73rd Report of the Registrar General (1910). The population estimates also reflect periodic revisions made in light of data from the decennial Census.
Deaths data in the Historic Deaths tables for the period 1901-1958 was derived from published sources. Information on the numbers and causes of deaths in these years was manually transcribed from tables published annually in the Registrar General's Statistical Review. From 1959 onwards mortality data was available in electronic formats. For the period 1959-1967, these took the form of archived computer tapes of data on individual deaths, which had been used in the production of annual reference volumes. For deaths data after 1967, the compilers of the database were able to use computer summaries of mortality data which had already been created for routine tabulation purposes.
Original validation: The data in the Historic Deaths tables was systematically checked against the sources used to compile the data. For data from the period 1901-1958, this involved summing the data on computer by age group for each cause of death, and by cause of death for each age group, and checking the results against the published sources. Where the process of transcribing the data detected printing errors in the published sources, the data in the database was adjusted to achieve consistency rather than to agree with the incorrect published figures. It was also noted that 265 deaths in a colliery disaster in 1934 had not been registered until 1938-39. These deaths were allocated to 1934 in the database. Data for 1959-1967 was checked against published figures, to detect any errors arising from the corruption and loss of data on the archived computer tapes. For data from 1968 onwards, checking involved making sure that every ICD cause of death had been carried across.
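The cross-check described for 1901-1958 amounts to computing marginal totals in both directions and comparing them with the published totals. The following is a minimal sketch of that kind of check; the table layout (a dict keyed by cause and age group for one year and sex) and the function names are hypothetical.

```python
from collections import defaultdict


def marginal_totals(cells):
    """Sum a deaths table by cause (over age groups) and by age group (over causes).

    cells: dict mapping (cause, age_group) -> deaths for one year and sex
    (a hypothetical layout).
    """
    by_cause = defaultdict(int)
    by_age = defaultdict(int)
    for (cause, age_group), n in cells.items():
        by_cause[cause] += n
        by_age[age_group] += n
    return dict(by_cause), dict(by_age)


def mismatches(computed, published):
    """Return the sorted keys whose computed total differs from the published one."""
    keys = set(computed) | set(published)
    return sorted(k for k in keys if computed.get(k, 0) != published.get(k, 0))
```

A mismatch flags either a transcription error or, as noted above, a printing error in the published source, which then has to be resolved by inspection.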
Data coding: The creation of the 1901-1992 dataset involved converting ICD codes for causes of death into computer codes, which in some cases differ substantially from the ICD codes. This conversion does not affect the 1901-1995 dataset, which reproduces the actual ICD codes with relatively minor modifications.
Constraints arising from published sources: These constraints affect data in the Historic Deaths tables for 1901-1958, in both datasets. The age groups into which the data is divided were determined by those used in the Registrar General's Statistical Review in this period, and these in turn are the basis of the age groups used in the Population tables for 1901-1958. Similarly, data on causes of death is limited to the causes reported in the Registrar General's Statistical Review. As a result, where no deaths from a particular cause were recorded in a particular year, that cause does not appear in the datasets for that year.
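Because a cause with no recorded deaths in a year is absent rather than stored as zero, building a time series for one cause means treating missing entries as zero. This is a minimal sketch; the nested-dict layout and function name are hypothetical.

```python
def cause_series(table, cause, years):
    """Deaths from `cause` for each year in `years`, treating an absent entry as 0.

    table: dict mapping year -> {cause_code: deaths} (a hypothetical layout).
    """
    return [table.get(year, {}).get(cause, 0) for year in years]
```

Filling with zero is only safe within a single classification period: across periods, a missing code may mean the cause was classified differently, not that no deaths occurred.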
General constraints affecting mortality data: Regardless of whether published or unpublished sources were the immediate source of data in the Historic Deaths tables, the data was ultimately derived from the system for registering deaths in England and Wales. Data in the Historic Deaths tables was affected by changes in the methods of certifying deaths and of identifying the underlying cause of death.
Medical certificates and coroner's certificates were transmitted periodically by superintendent registrars to the Registrar General. The format of medical certificates changed over time. The most radical change occurred in 1927, when a two-part medical certificate was introduced: in the first part the doctor recorded the disease or condition leading directly to death and the causes antecedent to it, while the second part was reserved for "other significant conditions contributing to the death, but not related to the disease or condition causing it". From 1940 onwards the entry in the first part of the certificate was taken to be the underlying cause of death.
Further changes to the methods of identifying the underlying cause of death occurred in 1984. OPCS adopted a broader interpretation than before of a WHO coding rule stating that, when the cause of death in the first part of the death certificate was a direct sequel to a condition mentioned in the second part, the latter condition should be preferred as the underlying cause of death. This produced an artificial decrease in the numbers of deaths from certain causes (e.g. bronchopneumonia) and corresponding increases in others. The change was reversed in 1993, when an overhaul of OPCS's computer systems introduced an automated system for coding cause of death that followed the internationally agreed interpretation of the WHO's rules for selecting the underlying cause.
Two other factors affect mortality data from specific periods:
Held by: The National Archives, Kew
Copies held at: The 1901-1992 dataset is also held in the UK Data Archive (http://www.data-archive.ac.uk/), where it is known by the title 'Historic Mortality and Population Data, 1901-1992' (study number 2902).
Former reference in The National Archives: CRDA/20
Legal status: Public Record(s)
Creator: General Register Office, 1836-1970; Office of Population Censuses and Surveys, 1970-1996; Office for National Statistics, 1996-
Physical description: 3 datasets and documentation
Restriction on use: The Historic Mortality Data Files datasets are subject to Crown Copyright; copies may be made for private study and research purposes only.
Immediate source of acquisition: Transferred in 2010 from the United Kingdom National Digital Archive of Datasets
Custodial history: Originally transferred from the Office for National Statistics (ONS) from 1998. The United Kingdom National Digital Archive of Datasets (NDAD) then held the datasets until 2010, when they were transferred to The National Archives (TNA).
Accruals: Further accruals are not anticipated.
Publication note: 'Twentieth Century Mortality Trends in England and Wales' by researchers at the Office for National Statistics, based on analysis of the 1901-2000 version of the database
Unpublished finding aids: Extent of documentation: 6 documents; dates of creation of documentation: c. 1979-1997
Administrative / biographical background:
The Historic Mortality Data Files database is believed to have originated in the Medical Statistics Division of the Office of Population Censuses and Surveys (OPCS) as a basic tool for researchers studying mortality in England and Wales. The two versions of the database which have been transferred record the numbers of deaths registered in England and Wales from 1901 onwards by year, sex, age group and underlying cause of death. They also provide estimates of the population at risk of dying by year, sex and comparable age groups. The data was designed to allow for the calculation of national mortality rates, the analysis of trends in mortality and differences in mortality by age and sex, epidemiological research, and local studies of mortality which required national rates for comparative purposes.
The database was prompted by the World Health Organisation (WHO), which in the mid-1970s created a database of death rates from 1950 onwards for a number of countries by year, sex and a limited range of causes. This led other institutions to build similar but more detailed databases for the computer analysis of their own national mortality data. In 1979 OPCS decided to construct a mortality database for England and Wales using readily available sources in published form and on computer. The database was updated annually by OPCS and its successor, the Office for National Statistics (ONS), to include additional years of data. It was distributed commercially to outside purchasers from at least 1985 onwards.
In 1997 the database was redesigned by ONS, and a new version was issued to the public on CD-ROM under the title 'Twentieth Century Mortality Files'. This included data down to 1995, and had a number of features which distinguished it from the earlier version of the database. After 1997, annual update CDs were issued by ONS covering data from 1996 until 1999. In 2003 ONS issued a revision of Twentieth Century Mortality Files covering data from 1901 to 2000, which incorporated revised population estimates based on the 2001 Census.