3 Data sample

3.1 Overview

The reliable and valid measurement of data is essential to the quality of the results of a statistical analysis, as described by Tabachnick and Fidell (2009). The quantitative approach of the current investigation is based on empirical data describing a total sample of 253 operated facilities. The data were collected in Germany between 2008 and 2014 in cooperation with 25 project partners. The project partners are mainly public sector institutions such as municipalities, universities, ecclesiastical administrations, social housing administrations, and social associations. Individual project partners provided data on up to 80 facilities, with buildings constructed between 1370 and 2010. Figure 3.1 illustrates the locations of the analysed facilities. The data collection was primarily restricted to the south-west of Germany, with a large number of facilities located in the city of Stuttgart.

Figure 3.1. Locations of the analysed facilities

As described by Hox and Boeije (2005), data collection can generally be differentiated into two categories: primary data are collected specifically for the purpose of a research project or study, whereas secondary data were originally collected for a different purpose, for example for administrative reasons. The statistical analysis of the current investigation is based on a data collection process conducted on both a primary and a secondary level. The secondary data were obtained from the project partners and include cost data, quantities, and schematic floor plans of the facilities. The cost data were submitted in digital form, for example as spreadsheets, and usually contained all cash flows related to a facility for an accounting period of between one and five years. Based on the descriptions of the cash flows, the cost data were classified according to the cost structure of the standard DIN 18960:2008-02 as described in Section 2.2. All cost data are adjusted to first quarter 2016 prices and include the current German VAT. Besides the cost data, the project partners usually provided spreadsheets of various quantities. On the basis of the schematic floor plans of the facilities, the quantity data were classified according to the structure of the standards DIN 277-1:2016-01 and DIN 277-3:2005-04. In this way, all secondary data received were processed to provide a consistent database for the current investigation.

In a primary data collection, further information was collected on site at the facilities using a structured and standardised questionnaire. Interviews with the responsible facility managers or owners were conducted during the site visits. The data collected on site include detailed information on the characteristics, conditions, standards, utilisations, locations, and management strategies of the facilities. Information on the building construction, building services, and site area was collected in accordance with the structure of the standard DIN 276-1:2008-12. Because the data were collected at a detailed level, information can be aggregated to meet the specific requirements of the cost group to be analysed. As a basis for the empirical investigation of operating costs, the response variables and candidate predictor variables are presented in detail, including descriptive statistics, in the following sections. Furthermore, the representativeness of the data sample is discussed critically, and conclusions are drawn about the applicability and restrictions of the results of the current investigation.

3.2 Presentation of the sample

3.2.1 Response variables

As presented in Section 2.2, the annual operating costs are defined as the response variables of the current investigation. All cost data were obtained as secondary data from the project partners and are classified in accordance with the cost structure of the standard DIN 18960:2008-02. The cost structure of the standard contains first, second, and third level cost groups, as illustrated in Table 2.1 of Section 2.2.1; the available cost data of the current investigation are classified into 15 different cost groups. On the most detailed third level of the cost structure, cost data can only be assigned to the sub-groups of the cost groups CG 310 (utility costs) and CG 350 (operation, inspection and maintenance costs). Owing to the limited extent of the cost descriptions and the structure of the provided cash flows of the facilities, all other cost data can only be assigned to the second level cost groups of the standard. The cost data of the first level cost group CG 300 (operating costs) are aggregated from the respective second level cost groups. The cost data used in the investigation include value added tax at the current German rate and are adjusted to first quarter 2016 prices based on the figures of the German Federal Statistical Office (DESTATIS, 2017a,b) for consumer prices of construction works and the maintenance of buildings.
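The adjustment of cost data to a common price level can be illustrated by a minimal sketch. It assumes a simple ratio of price indices, which is a common convention but is not stated in detail in the text; the function name and the index values are hypothetical and are not DESTATIS figures.

```python
def adjust_to_base_period(cost, index_at_observation, index_at_base):
    """Restate a cost observed at one price level at the base-period price level.

    Assumption: the adjustment is a plain ratio of two price index values.
    """
    if index_at_observation <= 0 or index_at_base <= 0:
        raise ValueError("price indices must be positive")
    return cost * index_at_base / index_at_observation


# Hypothetical figures for illustration only (not DESTATIS values):
# a cost of 12,000 EUR recorded when the index stood at 96.0,
# restated for a base period in which the index stands at 102.0.
adjusted = adjust_to_base_period(12_000.0, 96.0, 102.0)
print(adjusted)  # 12750.0
```

The same ratio could in principle be applied in the opposite direction to restate results for a later period, provided a suitable index series is available.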

Initially, the cost data sample of the current investigation contains 253 observations in total. In the course of a pre-analysis of the data, multiple filters are applied to the data sample. As a result, the size of the data sample employed for the analysis varies between 65 and 244 observations for the respective cost groups. Individual observations of the total sample are excluded from the data basis of the respective cost groups owing to the limited availability of data. Analysing cost groups with incomplete or missing cost data values may impair the reliability of the results of the developed statistical models. The data sample is therefore filtered according to the level of available cost data as essential information. The reasons for the exclusion of observations are described in detail in the theoretical basis of the analysis of the particular cost groups in Chapter 4. On the basis of descriptive statistics and an analytical and visual inspection of the distribution of the cost data and the respective cost indicators (e.g. histograms, box plots), errors and measurement mistakes in the data sample are identified as suggested by Tabachnick and Fidell (2009).
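One possible analytical counterpart to the visual inspection of box plots is sketched below. It assumes the conventional whisker rule of 1.5 times the interquartile range; the text does not specify the rule actually applied, and the function name and the sample values are hypothetical. Flagged values would be candidates for a manual check, not automatic exclusions.

```python
import statistics


def flag_box_plot_outliers(values, whisker=1.5):
    """Return the values lying outside the box plot whiskers.

    Assumption: whiskers extend to 1.5 times the interquartile range,
    and quartiles use the 'inclusive' method of the statistics module.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical cost indicators in EUR per m² GEFA, for illustration only.
indicators = [10.0, 12.0, 15.0, 20.0, 24.0, 95.0]
print(flag_box_plot_outliers(indicators))  # [95.0]
```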

Table 3.1. Operating cost indicators (per m² GEFA)

1st quarter 2016 prices including VAT.

Figure 3.2. Box plots of operating cost indicators (per m² GEFA)

The quality of the current cost data is verified by a comparison with various publications providing annual operating cost information. Annually updated statistical operating cost data, currently based on 337 office buildings, are published in the Office Service Charge Analysis Report by JLL (2016). The publication takes building sizes, standards, building characteristics, and locations into account for the classification of cost indicators. A comparison with the published data indicates that the operating cost data sample employed in the current investigation is plausible. Further verifications are carried out using, for example, the data provided in the Evaluation System for Sustainable Building BNB by the BMVBS (2013) and the BMUB (2015). A comparison with the operating cost indicators provided in the publications by BCIS (2007b) and Rotermund (2016) likewise supports the quality of the current sample.

The annual operating costs employed as response variables in the current investigation are presented in Table 3.1. The gross external floor area GEFA is used as the reference quantity for the compilation of operating cost indicators. Mean values, standard deviations, lower quartiles, median values, and upper quartiles describe the data of all analysed cost groups of DIN 18960:2008-02. The size of the respective data sample is given by the number of observations n. Figure 3.2 illustrates the corresponding data distribution with box plots. Further descriptive statistics of the annual operating cost data employed as response variables are provided in the Appendix. The absolute costs of all 15 analysed cost groups according to DIN 18960:2008-02 and the corresponding box plots are presented in Table A.1 and Figure A.1, respectively. The distribution of the operating costs amongst the cost groups is displayed as percentages in Table A.2 and Table A.3 and as box plots in Figure A.2 and Figure A.3 of the Appendix. Furthermore, cost indicators of the underlying cost data employing various available areas as the reference quantity are presented in Table A.4 and Figure A.4.
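The descriptive statistics named above can be computed as in the following minimal sketch. The function name, the sample values, and the choice of quartile method are assumptions made for illustration; the quartile convention used for Table 3.1 is not stated in the text, and different conventions can give slightly different quartiles.

```python
import statistics


def describe_cost_indicator(annual_costs, gefa_areas):
    """Summarise cost indicators (annual cost divided by GEFA) for one cost group."""
    indicators = [cost / area for cost, area in zip(annual_costs, gefa_areas)]
    # Assumption: quartiles follow the 'inclusive' method of the statistics module.
    q1, median, q3 = statistics.quantiles(indicators, n=4, method="inclusive")
    return {
        "n": len(indicators),
        "mean": statistics.mean(indicators),
        "sd": statistics.stdev(indicators),  # sample standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
    }


# Hypothetical annual costs in EUR and GEFA in m², for illustration only.
costs = [10_000.0, 24_000.0, 45_000.0, 16_000.0, 60_000.0]
areas = [1_000.0, 2_000.0, 3_000.0, 800.0, 2_500.0]
summary = describe_cost_indicator(costs, areas)
print(summary["n"], summary["q1"], summary["median"], summary["q3"])  # 5 12.0 15.0 20.0
```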

3.2.2 Predictor variables

To conduct a statistical investigation on the basis of empirical data, variables that potentially explain or predict the response variables have to be selected, as described by Chatterjee and Hadi (2006). As presented in detail in Section 2.2.2, several variable groups containing variables that potentially influence the operating costs were selected through a review of the literature. Specific areas, compactness, function, condition, standard, utilisation, location, and management strategy are defined as the relevant variable groups in the current investigation. Furthermore, various quantities are selected as candidate reference units for the introduction of adequate operating cost indicators.

The candidate reference quantities, the specific areas, and information on the compactness of the facilities were obtained as secondary data from the project partners. On the basis of spreadsheets and schematic floor plans, the data were classified according to the measurement rules and structures provided in the standards DIN 277-1:2016-01 and DIN 277-3:2005-04 in order to ensure a consistent database for the investigation. The reference quantities are expressed as the respective area in m² or volume in m³, whereas the specific areas are expressed as a percentage of the gross internal floor area GIFA. Descriptive statistics of the candidate reference quantities, the specific areas, and the compactness are presented in Table A.5, Table A.6, and Table A.7 in the Appendix for the total sample of 253 observations. The variables of the variable groups function, condition, standard, utilisation, location, and management strategy were collected as primary data on site at the facilities using a standardised questionnaire and in interviews with the responsible facility managers or owners. The variable group function contains, for example, variables describing the number of elevator stops and the number of sanitary facilities; its descriptive statistics are presented in Table A.8 in the Appendix for all 253 observations.

The condition was assessed on site at the facilities at a detailed level for all appraisable components of the building construction, building services, outdoor facilities, and furniture and equipment according to the standard DIN 276-1:2008-12. Because the data were collected at a detailed level, information is aggregated to meet the specific requirements of the cost group to be analysed. The condition is aggregated by weighting each component according to its share of the construction costs. Furthermore, the utilisation of a facility is considered in the weighting. For example, the share of defective building envelope as a candidate predictor variable for heating costs is aggregated from the condition of the building components base plate, external walls, and roofs. The shares of the defective base plate, external walls, and roofs in percent are weighted by the respective construction costs of the components, taking into account the utilisation of the facility as published by BKI (2016), and aggregated to the variable share of defective building envelope in percent. The underlying components of the various aggregated conditions are presented in detail in the theoretical bases for the analyses of the respective cost groups in Chapter 4. Descriptive statistics of the conditions employed as candidate predictor variables are provided in Table A.9 in the Appendix.
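The cost-weighted aggregation described above can be sketched as a weighted mean. This is an illustrative reading of the procedure, not the exact calculation of the study: the function name, the component shares, and the cost weights are hypothetical, and real weights would be taken from utilisation-specific construction cost data such as BKI (2016).

```python
def aggregate_defective_share(defective_shares, cost_weights):
    """Cost-weighted mean of component-level defective shares (in percent).

    defective_shares: share of each component that is defective, in percent.
    cost_weights: construction cost share of each component (any common unit).
    """
    total_weight = sum(cost_weights[name] for name in defective_shares)
    if total_weight <= 0:
        raise ValueError("cost weights must sum to a positive value")
    weighted_sum = sum(
        defective_shares[name] * cost_weights[name] for name in defective_shares
    )
    return weighted_sum / total_weight


# Hypothetical values for illustration only (not BKI figures).
shares = {"base plate": 10.0, "external walls": 40.0, "roofs": 20.0}
weights = {"base plate": 20.0, "external walls": 50.0, "roofs": 30.0}
print(aggregate_defective_share(shares, weights))  # 28.0
```

In this example the external walls dominate the result because they carry the largest assumed cost weight.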

Qualitative information on the standard was collected as primary data on site at the facilities using the standardised questionnaire. For this purpose, the standard of the construction, technical installations, outdoor facilities, grounds, and furniture and equipment was assessed. The heat storage capacity of the structure is considered by the candidate predictor variable thermal mass, which divides the total sample into facilities with light thermal mass and heavy thermal mass. The significance of conservation regulations applying to the entire facility or parts of it is examined by the qualitative variable protected structure. The flexibility of the construction and of the technical installations comprises the respective building components as defined in the standard DIN 276-1:2008-12 and provides information about the variability of the infrastructure in case of a structural modification. The qualitative variables standard of the technical installations, heating system, building automation, outdoor facilities, and furniture and equipment contain information on the fulfilment of the usage requirements of the respective components and are included with the characteristics high or low. The qualitative variable outdoor facilities included describes the availability of information on outdoor facilities. Further information on the standard of the technical installations is given by the variable type of heating energy source, which characterises the heating system of the facilities. Descriptive statistics of the variables, including detailed information on the characteristics, are presented in Table A.10 in the Appendix.

Table 3.2 illustrates the wide variety of utilisations considered in the current investigation. The data sample of 253 observations is differentiated by the available types of facility, which serve as a qualitative candidate predictor in the variable group utilisation. The facility types are differentiated according to the Catalogue for the Classification of Civil Works by Argebau (2010). Table 3.2 contains the number of observations for the respective characteristics and their share of the total number of observations in percent. Further information on the utilisation of the facilities is provided by the qualitative candidate predictor variables specific utilisation and type of water usage, as presented in Table A.12 of the Appendix, including detailed information on the available characteristics. The variable group location includes the qualitative variables urban location and type of topography, which provide information on the surrounding area of the facilities and the topography of the site areas, respectively. Both variables and the available characteristics, including their statistical distribution, are described in Table A.11 in the Appendix. Furthermore, the significance of the management strategy is investigated by the qualitative variable type of cleaning services, as presented in Table A.13 of the Appendix.

Table 3.2. Qualitative candidate predictor variable type of facility

[a] Total number of observations: 253.

3.3 Test sample

As described by Fellows and Liu (2015), data used to develop a model cannot be used to validate the model. The validation must be conducted with data not involved in model development; otherwise, the validation may be distorted. According to Snee (1977), splitting a data sample for cross-validation is an appropriate method to compare the fit of a model to the data and to measure the estimation accuracy. Therefore, the data sample of the current study is divided into two sub-samples as indicated in Section 2.3.6. A training sample is used to develop the statistical models and to introduce categorised cost indicators for the purpose of operating cost estimation. The training sample consists of approximately 90% of the total observations. A test sample of approximately 10% of the total observations is used solely for the purpose of performance validation. The observations included in the test sample are selected randomly and are intended to be representative of the total sample of the current investigation. The observations of the test sample are neither included in the development of the statistical models nor used for the introduction of categorised cost indicators. Consequently, the validation of the performance can be conducted under independent and unbiased conditions.
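A random hold-out split of this kind can be sketched as follows. The function name, the seed, and the use of integer identifiers are assumptions for illustration; the actual selection procedure of the study is not described beyond being random. The test sample size of 24 out of 253 observations is the figure reported for Table 3.3.

```python
import random


def split_sample(observation_ids, n_test, seed=0):
    """Randomly split observation ids into a training list and a test list."""
    ids = list(observation_ids)
    if not 0 < n_test < len(ids):
        raise ValueError("n_test must be between 1 and len(ids) - 1")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    return ids[n_test:], ids[:n_test]


# 253 observations in total, 24 of them held out as the test sample.
training_ids, test_ids = split_sample(range(253), n_test=24, seed=42)
print(len(training_ids), len(test_ids))  # 229 24
```

Keeping the two lists disjoint is what allows the test observations to be used for validation without having influenced model development.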

Table 3.3. Comparison of the total, test, and training samples

[a] Percentage of the test sample on the total sample.

[b] Includes the observation employed as implementation example in Chapter 6.

Table 3.3 compares the randomly selected training and test samples for model development and validation by type of facility. With 24 observations, the test sample comprises 9.5% of the total sample of 253 observations. For the types of facility presented, the test sample includes between 7.7% and 14.3% of the total observations. Since only a limited number of observations is available for community halls, fire departments, and libraries, these types of facility are not represented in the test sample. Nevertheless, the relatively consistent distribution of the observations with regard to their utilisation indicates that the test sample is representative of the total sample. Besides the training and test samples, a further data sample is employed solely for the development of artificial neural network models in order to avoid over-fitting, as described in Section 2.3.3. The ANN-validation sample consists of approximately 20% of the total observations, is selected randomly, and shows a distribution of data similar to that of the presented test sample.
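The share of the test sample per type of facility, as compared in Table 3.3, can be computed along the following lines. The function name, the facility type labels, and the counts are hypothetical and do not reproduce the table; only the overall figure of 24 out of 253 observations is taken from the text.

```python
from collections import Counter


def share_in_test_sample(total_types, test_types):
    """Return the share of test observations per facility type in percent."""
    total_counts = Counter(total_types)
    test_counts = Counter(test_types)  # missing types count as zero
    return {
        facility_type: 100.0 * test_counts[facility_type] / count
        for facility_type, count in total_counts.items()
    }


# Hypothetical facility types and counts, for illustration only.
total_types = ["type A"] * 20 + ["type B"] * 8 + ["type C"] * 3
test_types = ["type A"] * 2 + ["type B"] * 1

print(share_in_test_sample(total_types, test_types))
# {'type A': 10.0, 'type B': 12.5, 'type C': 0.0}

# Overall share reported in the text: 24 of 253 observations.
print(round(100.0 * 24 / 253, 1))  # 9.5
```

A type with very few observations, such as "type C" here, can easily end up with a share of zero, which mirrors the situation described for the sparsely populated facility types.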

3.4 Representativeness

In order to conduct quantitative research on a statistical population, a sub-set of observations is selected from the population as a sample. The research can be carried out on the selected data sample, and statistical inferences can be drawn about the behaviour of the entire population, as described by Fellows and Liu (2015). Working with a sample simplifies the research, for example by reducing the effort required to collect and analyse data. The employed data sample should therefore provide an accurate representation of the statistical population. As introduced by Kahneman and Tversky (1972), the representativeness of a data sample is defined as the similarity of the sample and the population in essential characteristics under consideration of relevant conditions. For example, a data sample is considered representative if the observations of the sub-set are selected randomly from the population. Consequently, a detailed description of the composition of the data sample is crucial for the validity of the statistical inferences about the behaviour of the population. In the current investigation, a critical discussion of the representativeness of the underlying data sample is essential in order to draw conclusions about the practical applicability of the results and the respective restrictions.

As described in the previous sections, the cost data and further information on the observations of the investigation were obtained from multiple project partners. The facilities were usually selected by the respective project partners, so the composition of the data sample involves only a restricted level of randomness in statistical terms. Nevertheless, it is assumed that the selected facilities are representative of the real estate portfolios of the respective project partners. The participating project partners are mainly public sector institutions such as municipalities, universities, ecclesiastical administrations, social housing administrations, and social associations. As presented in Section 3.2.1, the verification of the cost data of individual facility types, for example municipal buildings, revealed a certain conformity with facilities owned and operated by the private sector. Nevertheless, only restricted statistical inferences can be drawn from the underlying data sample about facilities owned and operated by the private sector.

Further limitations are expected in the analysis of management strategies, for example the level of outsourced facility services or service level agreements. The analysis of management strategies is restricted to the outsourcing rate of cleaning services, since other strategies and concepts do not exist for the observations provided by the participating project partners. Another restriction of the representativeness is expected with regard to the location. The data collection was primarily conducted in the south-west of Germany, and a large number of the facilities included in the investigation are located in the city of Stuttgart. Therefore, the variation of regional economic and climatic conditions can only be analysed to a restricted extent. In order to generalise the results of the current investigation for application to facilities in other locations, the regional conditions can be taken into account by statistical data on the local economics of the construction sector, as provided for example by the BKI (2016). Likewise, the variation of the climate conditions can be considered by statistical data on the local climate, as presented for example in the standard VDI 3807-1:2013-06.

The applicability of the results of the current investigation is limited by the scope of costs and cost types under investigation. As described in detail in Section 2.2.1, the operating costs analysed in the current study are determined according to the cost structure of the standard DIN 18960:2008-02. The application of the results of the statistical models and the categorised cost indicators is therefore restricted by the definition of the costs included in the various cost groups of the structure. In particular, the results of the aggregated first and second level cost groups require a detailed consideration of their scope when applied in practice. A further limitation of the representativeness of the data sample arises from the scope of the collected cost data. As presented in the previous sections, the cost data provided by the project partners contained the cash flows related to a facility for an accounting period of between one and five years. As a result of the variation of the general price level, for example through inflation, operating costs may vary significantly depending on the years in which they were observed. In order to provide a consistent database, the cost data included in the investigation are adjusted to first quarter 2016 prices based on the figures of DESTATIS (2017a,b). Correspondingly, these figures can be employed to adjust the results of the investigation for a future application.

Maintenance costs vary significantly across the different stages of the life cycle of a facility, as described for example by Bahr (2008). Given the relatively short accounting periods of between one and five years under consideration, the representativeness of the current investigation is restricted with regard to the cost data for inspection and maintenance. Nevertheless, the investigation includes facilities constructed between 1370 and 2010 and therefore represents cost data of facilities in a variety of life cycle stages. Finally, restrictions are expected owing to the limited number of observations for individual characteristics of variables, for example the type of facility. With a limited number of observations available, the presence of outliers or errors in the underlying data may distort the results of a statistical analysis substantially, as described by Tabachnick and Fidell (2009). Therefore, the impact of a limited data sample is significantly reduced by a detailed pre-analysis based on descriptive statistics and an analytical and visual inspection of the data.
