Chapter 9
Computational Toxicology and REACH

Emilio Benfenati, Anna Lombardo and Alessandra Roncaglioni

IRCCS – Istituto di Ricerche Farmacologiche “Mario Negri”, Laboratory of Environmental Chemistry and Toxicology, Italy

9.1 A Theoretical and Historical Introduction to the Evolution Toward Predictive Models

Quantitative structure–activity relationship (QSAR) models are a versatile set of models that are used in many fields including chemistry, biology, and engineering. A QSAR model is designed on the basis of the endpoint that is envisaged. Consequently, it is quite obvious that numerous QSAR models can be designed. Furthermore, there are many ways to build a model for the same endpoint, using different chemicals to train the model, different chemical descriptors, or different algorithms. In this chapter, we discuss how the conceptual approach toward QSAR models may vary considerably.

Historically, QSAR models evolved on the basis of the purpose for which they were required; therefore, it may be useful to first introduce the differences among the older and more recent perspectives, in particular in case of the use of QSAR models for regulatory purposes.

Old QSAR models aimed to explore possible rules for a particular outcome, to better understand the factors affecting the phenomenon under evaluation. Only more recently has there been interest in using QSAR models to predict the effect of a single molecule, avoiding the need to carry out an experimental test. This means that initially the interest was in the overall phenomenon, not in a single substance to assess: the aim was to understand and explain the driving forces governing the phenomenon under investigation. The property data of the chemicals were known in this case, while what was unknown was the factor governing them. In the early days of QSAR modeling, a good publication showed that a certain descriptor could be associated with the observed property, such as, for example, aquatic toxicity. Initially, in most cases, some physicochemical descriptors were investigated, and the challenge was to identify the “best” one to explain the phenomenon: if the model worked using certain descriptor(s), then the successful message of the study concerned the role of those descriptors. There was almost no interest (i) in validating the model using other chemicals, (ii) in considering the model as a substitute for the in vivo animal model, or (iii) in using the model to predict the properties of a new chemical.

In the case of the use of QSAR to predict the effect of a substance, the descriptors of the model are known, and the value that is lacking is the property effect of the chemical.

While the initial studies were theoretical, later the interest shifted to practical use: predicting the properties of chemicals for regulatory purposes. This required a stricter process than the initial studies, including model validation. The reasons for this are mainly related to the practical implications of the model and to the regulatory context. This led to the enforcement of procedures verifying the predictivity of the model not in terms of the correctness of the descriptors, but of the property value of the individual substance.

The initial models were, in a certain way, self-explanatory: the aim was not to have a universal model, but to understand a phenomenon, as it applied to the population of substances under examination. If instead the individual substance is of interest, we also need a conceptual approach to verify whether that substance is related to the population of chemicals which is intrinsically the basis of a certain model. The need to apply a certain model to a certain substance poses the issue of defining the boundaries of the model. The so-called applicability domain of the model thus becomes an element of the model.

The theoretical explanation associated with a QSAR model still influences the acceptance of the results of the model. Indeed, it is appreciated if the model has an explicative power and is transparent in the indication of the factors affecting the property values. Quite often, this requires some simplification. In order to cope with more complex situations, more factors should be taken into account, which are probably not all known.

There is a duality in the function of QSAR models, which are expected to be predictive, but at the same time explicit and explanatory. This is not always possible. In fact, statistical models can quite often provide higher predictive performance [1]. It is nevertheless possible to analyze and understand which elements are useful from the different perspectives. This will become more and more important in view of the increasing number of models for the same endpoint. Quite often, combined models provide better results than a single model [1, 2]. The fact that the best result comes from more than one model implies that a single, simple explanation is not sufficient and does not cover all the factors affecting the properties of a substance.
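As a toy illustration of the consensus idea (a sketch, not any specific published ensemble; the function name and labels are invented), categorical outputs from several hypothetical models can be combined by majority vote, with ties reported as inconclusive rather than resolved arbitrarily:

```python
from collections import Counter

def consensus(predictions):
    """Majority vote over categorical predictions from several QSAR models.

    `predictions` is a list of labels such as ["toxic", "non-toxic", "toxic"].
    Ties are reported as "inconclusive" rather than resolved arbitrarily.
    """
    counts = Counter(predictions)
    (top, n_top), = counts.most_common(1)
    # If more than one label reaches the top count, there is no majority.
    if sum(1 for v in counts.values() if v == n_top) > 1:
        return "inconclusive"
    return top

print(consensus(["toxic", "non-toxic", "toxic"]))  # toxic
```

Real consensus schemes often also weight each model's vote by its reliability for the target compound; this sketch only shows the simplest case.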

The increased complexity of the modern models is another major difference when they are compared with the older models, but this complexity is not a consequence of the recent use of the models for regulatory purposes. Indeed, the earlier conceptual differences we discussed addressed the use of the model, considering the output of the model: new knowledge on the mechanism, or new predicted values of chemical substances, for instance. Conversely, the increased complexity of the modern QSAR models is the result of a changed scenario of the possible inputs of the models. Indeed, more chemical descriptors are available today as input: thousands of them, and even more structural keys. Furthermore, more algorithms have been introduced and applied to QSAR models, boosting new generations of models. More and more data are available. This is not typically the case of in vivo data, but it becomes quite impressive in the case of in vitro data. Considering the case of initiatives such as Tox21 (https://www.epa.gov/chemical-research/toxicity-forecasting) and the introduction of high-throughput screening methods in general, it becomes clear that completely new perspectives will open up very soon.

This scientific progress introduces methodological complexity and more detailed results, and points toward challenges not only for science but also for the application of QSAR models for regulatory purposes. Indeed, these developments demand further checking of whether the legislative conditions are met.

All these factors complicate the evaluation of the models. We summarize in Table 9.1 the key points we discussed above. We discuss these factors in practice in the following sections, where we introduce practical examples of QSAR models with particular attention to the issues related to their application for regulatory purposes.

Table 9.1 The main differences between old models and models for regulatory purposes

Old models | Models for regulatory purposes
Scientific interest | Regulatory interest
Theoretical approach | Practical approach
Interest in useful descriptors | Interest in predicting the property value of the target chemical
Interest in the focused domain of the substances in the dataset | Interest in looking forward at a substance not in the training set
No interest in validation | Validation required
Applicability domain implicitly defined by the composition of the dataset | Applicability domain to be discussed

9.2 REACH and the Other Legislations

REACH (the European regulation for the Registration, Evaluation, Authorisation and Restriction of Chemicals) represented a major change, in Europe and beyond, in the way chemical substances are assessed. REACH aims not only to protect human health and the environment but also to promote innovation. From its first article, innovation is clearly indicated as a key target. Innovation here means moving toward safer chemicals, removing those not complying with this requirement, and thus asking industry to devise novel strategies for better substances.

Innovation also means having new methods to cope with the safety of substances, taking into account the opportunity to use alternative methods. Indeed, the first article of REACH and several following articles mention these methods. The need for alternative methods derives from multiple considerations, including ethical issues and the need for more efficient ways to assess chemical safety, taking into account the cost of certain tests, the time needed, and the available resources. In addition, there is also a scientific need to cope with a lack of knowledge, because existing methods do not have solutions for all problems. REACH clearly recognizes this and foresees an update of the available methodologies, in order to use advanced procedures.

QSAR models are mentioned clearly and repeatedly within REACH. Annex XI specifically identifies a series of requirements, codified in such detail for the first time in European legislation. For this reason, it is of interest to analyze what REACH says [3].

REACH classifies QSAR models as part of the so-called non-testing methods (NTMs), which also include read-across and grouping. All of these are addressed in Annex XI.

9.3 Annex XI of REACH for QSAR Models

Annex XI in the case of QSAR models reports the following:

“Results obtained from valid qualitative or quantitative structure–activity relationship models ((Q)SARs) may indicate the presence or absence of a certain dangerous property. Results of (Q)SARs may be used instead of testing when the following conditions are met:

  • results are derived from a (Q)SAR model whose scientific validity has been established,
  • the substance falls within the applicability domain of the (Q)SAR model,
  • results are adequate for the purpose of classification and labelling and/or risk assessment, and
  • adequate and reliable documentation of the applied method is provided.

The Agency in collaboration with the Commission, Member States and interested parties shall develop and provide guidance in assessing which (Q)SARs will meet these conditions and provide examples.”

REACH refers to the models as (Q)SAR models. We have simplified this discussion by using the acronym QSAR for both QSAR and SAR, thus including quantitative methods, as well as those which address categorical targets. In the following, we use the acronym QSAR “collectively,” without differentiating between QSAR and SAR.

It is important to note the relevance of this annex and all its implications. REACH requires valid models in the case of QSAR models. However, for the other ways to generate data, that is, in vivo and in vitro methods, REACH demands validated (not merely valid) methods. This means that for the laboratory methods, the user should obtain values using official procedures, as described by the international bodies. The use of data obtained with other experimental procedures is possible, but their role is of lower relevance, and they should be supported by other considerations or used as part of data from multiple sources. In the case of QSAR models, validation is not required.

9.3.1 The First Condition of Annex XI and QMRF

The first condition clearly explains the meaning of the validity requirement. Scientific validity is the basis of acceptance. Innovation is a key topic for REACH, as we have mentioned. In this strategy, there is a parallel with the US approach moving toward Tox21. When in vitro and in silico models are used in the United States, their validity is also assessed on the basis of scientific considerations, that is, with reference to the state of the art as described in the literature. These novel in vitro and in silico methods are rapidly evolving, with new methods appearing continuously in the literature. It would be impossible to apply the formal validation process to all of them, as is done for the other tests. In the United States, it was decided to extend this flexibility to in vitro and in silico methods. REACH, however, has adopted this strategy for QSAR models only.

There is another fundamental difference between QSAR methods and other methods, such as in vivo and in vitro ones. In principle, those methods can be applied to all substances. Conversely, as mentioned above, QSAR models are not necessarily universal and, in practice, their results, and thus their validity, depend on the chemical substance. REACH is the legislation devoted to chemical substances. Its target is to describe criteria for the safe use of chemicals, not necessarily to solve other problems, such as which model can be applied to which chemical. Indeed, each model has to be evaluated for the specific chemical. Thus, the European Chemicals Agency (ECHA) does not provide a list of valid models, that is, models to be considered always valid.

The first condition better defines the meaning of validity, referring to the scientific context, as discussed above. In many cases, the Organisation for Economic Co-operation and Development (OECD) QSAR principles have been mentioned as a series of points to be checked [4]. The points are the following:

  1. a defined endpoint
  2. an unambiguous algorithm
  3. a defined domain of applicability
  4. appropriate measures of goodness-of-fit, robustness, and predictivity
  5. a mechanistic interpretation, if possible.

A debate is ongoing about these points. In addition, OECD representatives have discussed the opportunity to refresh them, updating them on the basis of the evolution of QSAR models.

These five OECD criteria are the basis of the QSAR model reporting format (QMRF) and QSAR prediction reporting format (QPRF) [5].

The QMRF is a procedure to gather the information related to different models in a standardized way, so that it is easier to assess and compare them. The QMRF does not imply that a model is necessarily a good one.

Some critiques have pointed out that the QMRF may be demanding for users, if they have to provide it for the model they want to use, and may thus represent a barrier to a more general use of QSAR models; this has prompted calls for a simplified version of the QMRF.

Some studies compared different QSAR models, with particular attention to their application for REACH. For instance, EC-funded projects, such as ANTARES (http://www.antares-life.eu/), CALEIDOS (http://www.caleidos-life.eu/), and PROSIL (http://www.life-prosil.eu/), evaluated a number of models and endpoints. Whether or not a model had a QMRF was not related to its performance.

In fact, the QMRF is based on a formal check of the existence of certain pieces of information, not on the quality of the results. However, these pieces of information may be very useful to define a kind of identity card for a model. This can thus be useful also in light of the fourth condition of Annex XI, which demands proper documentation. By now, a certain number of models have been used for REACH. As we will see in the following, ECHA has also used some models to provide examples of how to use QSAR models [6] and to identify substance candidates that meet Annex III criteria [7]; thus, there are models which may be considered more common, so less documentation may be necessary to explain them.

The QPRF is related to the QMRF. This document refers to the points addressed in the QMRF and applies them to the specific substance under evaluation. Thus, the QMRF refers to the QSAR model, while the QPRF focuses on the specific substance. The correct application of a model to a specific substance is closely related to the evaluation of the applicability domain of the model.

9.3.2 The Second Condition and the Applicability Domain

As we said, REACH addresses the assessment of substances. Within this need, it is important to evaluate whether the use of a certain model is appropriate for the target chemical. As we have stated, the QSAR models are not universally applicable and thus the user should explain whether the applied model could be used. The conceptual basis for this evaluation is that a certain model is based on a set of compounds used to build up the model. This set of compounds is called training set. The target chemical to be evaluated with the QSAR model may be more or less related to the chemicals used to build up the model. For instance, if the model is based on aromatic substances, it may be questionable to apply this model to an aliphatic substance. This does not mean that the prediction will be wrong. However, it will not be possible to check the results of the models on previous similar substances, and thus the reliability of the prediction may be questionable.
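A minimal sketch of this kind of similarity check follows, assuming a toy fingerprint representation in which each molecule is reduced to a set of structural fragments (the fragment names are invented; real implementations use binary fingerprints computed with cheminformatics toolkits such as RDKit):

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fragment sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def max_similarity_to_training(target, training_set):
    """Highest similarity of the target to any training-set compound."""
    return max(tanimoto(target, t) for t in training_set)

# Invented fragment sets standing in for real fingerprints
train = [{"aromatic_ring", "methyl", "fused_rings"},
         {"aromatic_ring", "chloro"}]
target = {"aromatic_ring", "methyl"}
print(round(max_similarity_to_training(target, train), 3))  # 0.667
```

A low maximum similarity to the training set would then flag the target as possibly outside the applicability domain, in the sense described above.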

The applicability domain should not be evaluated only on the basis of chemical similarity. No perfect model exists, and thus there are outliers (i.e., substances which are wrongly predicted) within each model. If we imagine a model for mutagenicity based on 100 substances in the training set and a second model for bioconcentration factor (BCF), also built up using the same 100 substances, we will observe different outliers for the two models. For instance, this may happen because the model is missing the structural alert for the target substance, which is not identified as mutagenic. However, the prediction on BCF is commonly based on physicochemical descriptors, and thus it is possible that the BCF model correctly predicts the same chemical, that is an outlier for the mutagenicity model.

Thus, to evaluate the applicability domain properly, the tools used should not be based only on chemical similarity. We address this in the following, using the example of the applicability domain algorithm of VEGA (virtual models for property evaluation of chemicals within a global architecture).

We have stated that a QSAR is typically based on a training set. This is not always the case. Some models are actually collections of rules derived by human experts. An example is a model based on structural alerts for mutagenicity. In this case, human experts typically examined the different studies reporting the mutagenic effect of different substances, and then extracted a series of rules, codifying the structural alerts associated with the adverse effect. The assumption is that if a structural alert is present in the molecule, the substance is mutagenic. However, it is questionable whether the lack of structural alerts implies lack of mutagenicity, because the list of structural alerts for mutagenicity is not complete. Thus, in this case, the use of the applicability domain may also be difficult.
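A toy rule-based scheme along these lines might look as follows. The two alerts and the substring matching on SMILES strings are deliberate simplifications (real systems use curated alert lists and proper substructure matching, e.g. SMARTS queries in RDKit), and the sketch illustrates the asymmetry discussed above: a hit suggests mutagenicity, but the absence of a hit is not proof of safety.

```python
# Illustrative (not authoritative) structural alerts, matched naively
# as SMILES substrings purely for demonstration purposes.
ALERTS = {
    "aromatic nitro": "[N+](=O)[O-]",
    "epoxide": "C1OC1",
}

def flagged_alerts(smiles):
    """Return the names of the alerts found in a SMILES string."""
    return sorted(name for name, frag in ALERTS.items() if frag in smiles)

def predict_mutagenic(smiles):
    # Note the asymmetry discussed in the text: absence of alerts
    # does NOT prove absence of mutagenicity, since the list is incomplete.
    return bool(flagged_alerts(smiles))

print(flagged_alerts("c1ccccc1[N+](=O)[O-]"))  # nitrobenzene
```

Because such a model has no training set in the statistical sense, the usual similarity-based applicability domain tools do not apply directly, as the text notes.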

9.3.3 The Third Condition of Annex XI, and the Use of the QSAR Models

The third condition addresses the appropriateness of a certain model. A model may be perfect, but not useful for REACH because it does not address a specific endpoint of the REACH legislation. For instance, it may be a model to predict the color of a chemical. While this example is quite obvious, many cases require closer consideration. The third condition mentions the possible use for classification and labeling or risk assessment. Thus, this condition asks the user to relate the output of the model to the needs identified by REACH. We note that the needs may differ between risk assessment and classification and labeling. Indeed, the toxicity value for risk assessment has to be continuous, so that it can be used to calculate the predicted no effect concentration (PNEC). Conversely, in other cases, the model can provide a categorical value as output. For instance, to classify a substance as CMR (carcinogenic, mutagenic, or reprotoxic), a binary category is sufficient. We also note that different endpoints are required within different regulatory frameworks. For instance, mutagenicity may refer to different endpoints, such as the bacterial reverse mutation assay, which is an endpoint within REACH, while for the Classification, Labelling and Packaging (CLP) regulation [8], the classification refers to heritable mutations of relevance for humans.

Furthermore, even if the endpoint is the same, there are cases where different thresholds apply, as for BCF, since different thresholds exist within different legislations. Indeed, under REACH, a compound with a BCF of 1500 is not considered bioaccumulative (the threshold is 2000), whereas under the CLP regulation it is considered potentially bioaccumulative (threshold of 500). For persistent, bioaccumulative, and toxic (PBT) purposes under the US Toxic Substances Control Act [9], the threshold is 1000; this compound would therefore be considered bioaccumulative there, but not according to REACH. This may lead to the compound being classified as PBT in America but not in Europe.
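The threshold comparison described above can be sketched as follows; the dictionary simply encodes the three thresholds cited in the text (BCF in L/kg), while the function name is an illustration, not part of any regulatory tool:

```python
# Bioaccumulation thresholds (BCF, L/kg) as cited in the text
THRESHOLDS = {"REACH": 2000, "CLP": 500, "TSCA": 1000}

def bioaccumulative_under(bcf):
    """Return the frameworks under which a compound with this BCF
    would be flagged as (potentially) bioaccumulative."""
    return sorted(name for name, t in THRESHOLDS.items() if bcf >= t)

print(bioaccumulative_under(1500))  # the example compound from the text
```

For the BCF of 1500 discussed above, this flags the compound under CLP and TSCA but not under REACH, matching the text's conclusion.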

In conclusion, a model may be a good model and may address an endpoint relevant for REACH, but if it is based on thresholds which conflict with those indicated by REACH, it may not be applicable for REACH.

We have discussed the format of the output, which should be adequate for the purpose. Thus, an output that merely states whether the substance is bioaccumulative or not may be inadequate for REACH. Furthermore, it is preferable that the input of the model also corresponds to the classical format of the experimental model. Indeed, most models use, as values of the training set, data obtained with experiments conducted according to the official guidelines. These conditions are obviously those that do not pose questions about their use. However, there may be situations that deviate from this criterion and still produce values that are useful from the point of view of the output of the model, with the values containing suitable information for the desired target. We have already discussed that REACH clearly acknowledges that for some endpoints the experimental tests are not ideal. In the future, there will be more and more data obtained using in vitro models with human cells. Experiments on humans are not ethical, but a different situation arises from the use of cells. Modeling these data may provide very useful results, particularly for human toxicity, but obviously deviates from the traditional data obtained using laboratory animals. EC-funded projects, such as EU-ToxRisk, are proceeding in this direction, as is the Tox21 initiative in the United States. As a result, the adequacy within a regulatory framework of models adopting this innovative perspective may be a point of debate.

A related but easier situation is that of models which slightly modify the initial data derived from the experimental assay. An example is the bacterial reverse-mutation assay. It is acknowledged that using a single bacterial strain may not cover all possible mechanisms provoking mutation. Similarly, metabolic activation closer to the human situation is adopted, using the S9 fraction derived from animals. Thus, in practice, a complex experimental methodology is used, with the aim of mimicking as closely as possible a series of effects of interest. Under these circumstances, one modeling approach is to simulate closely the individual experimental assays. Another is to model the overall mutagenicity assessment, as obtained from the complete battery of assays (10 assays are required according to OECD Guideline 471 [10]). The comparison between the results obtained with the two approaches indicates that the second one is more predictive [11]. Indeed, the overall set of numerous data on this kind of assay provides a robust and large basis. Conversely, if we split the training set into 10 separate subsets, one for each of the 10 assays, we dramatically reduce the number of available items in each dataset, and for some of the subsets the number of items is very low. This affects the overall statistical relevance of the models. Using a single dataset with data on all assays, even if for some chemicals the complete set of values from the individual assays is missing, the values from related compounds compensate for the missing information. In other words, the overall picture derived from the thousands of values from these assays provides a strong basis for predictive models. If instead we split the values into the 10 separate assays, we reduce the basis for each of them, and the overall combination of these 10 weaker models is not as predictive, because part of the data is lost.

9.3.4 Adequate and Reliable Documentation of the Applied Method

As described previously, the number of in silico models is very high and continuously increasing, and it is common to make use of the most recent models. In this very rapidly evolving scenario, it is therefore difficult to rely only on a few well-known, consolidated models. REACH requires adequate information to be provided because, as we said, there is no finite list of approved in silico models; the list is continuously growing. This requirement is necessary from the point of view of the authorities, which have to be informed about the model. However, it is also very useful for the applicant, as a check to verify that all the elements needed to evaluate a model are present and have been analyzed.

The information is particularly necessary for new models, while it may be easier to refer to models that have been used by ECHA in its report on QSAR [6]. Indeed, ECHA has mentioned some models in its report, for example, EPISuite (https://www.epa.gov/tsca-screening-tools/epi-suitetm-estimation-program-interface), VEGA (http://www.vega-qsar.eu/), and T.E.S.T. (https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test). We emphasize that other models can also be used; the list of usable models is surely not restricted to these. Furthermore, using these models does not guarantee that the results for the specific chemical under evaluation will be correct.

There has been a debate about the level of documentation, particularly in the case of commercial models. Indeed, commercial models in most cases do not make explicit the detailed information underlying the model, such as the chemicals used to build it and the algorithm used. This information is typically available in the case of public models such as VEGA, EPISuite, or TEST. Authorities so far have not been restrictive in the use of one model or another. Thus, there do not seem to be a priori barriers to the use of commercial models related to the much more limited amount of information available. However, some practical cases need to be carefully evaluated when commercial models are used. For instance, when the user needs to discuss the similarity of the target compound with the chemicals in the training set and document the reasoning applied, if access to the chemicals in the training set is precluded, this discussion may be impossible.

In general, the pieces of information relevant for providing adequate documentation supporting the prediction obtained are those also requested in the QMRF/QPRF.

9.4 The ECHA Guidelines and the Use of QSAR Models within ECHA

In the following, we provide some examples to better clarify what we discussed previously. For these examples, we use VEGA, mainly because it has been used by ECHA for the examples it reported. Another reason is that it is one of the systems which provide more detailed information about the elements to be used to evaluate the result. We show one example related to a continuous value and one to a categorical value.

9.4.1 Example of Bioconcentration Factor (BCF)

In VEGA v1.1.3, three models for the estimation of BCF are available. They are built starting from different training sets and using different methods. In this example, we focus on the first model, the CAESAR model. The substance used as target is 1,4-dimethylnaphthalene (CAS No. 571-58-4).

The CAESAR model was published [12], evaluated [13], and compared with other models [14]. All these articles appeared in peer-reviewed journals, in line with the first condition of Annex XI of REACH. The model is based on eight descriptors and two regression models combined together. The training set consists of 378 compounds and the test set of 95 compounds. The experimental values assigned to these compounds were obtained according to OECD Guideline 305 [15], as suggested in the ECHA Guidance R.7c and R.11 [16, 17]. The data used to build the model, together with the continuous output, make the results adequate for REACH and CLP purposes: indeed, they supply a BCF value for the registration, for classification and labeling, and for the PBT assessment.

For the target compound, the predicted value is 2.91 log(L/kg) and the result is considered reliable by the model (see Figure 9.1). In addition, the two single models integrated within the CAESAR model predict 2.74 and 2.9 log(L/kg), respectively (see Figure 9.1, sub-models 1 and 2). Since this model was built considering the REACH requirements, it generates a B/vB evaluation, adding a confidence interval to the predicted value (when reliable) (see Figure 9.2). This may be useful for predictions close to the thresholds. The confidence interval is based on the threshold considered and on the reliability of the prediction; it is therefore compound specific. In this case, the confidence interval is the same for both thresholds: 0.5. This means that the predicted value of 2.91 may become 3.41. This raises an alert for the B threshold (3.3 log units), but not for the vB threshold (3.7 log units). This increases the adequacy of the model for the PBT assessment.
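The logic of this alert can be sketched as follows, using the log-unit thresholds and the values reported above; the function is an illustration of the reasoning, not the actual VEGA code:

```python
def b_vb_alert(log_bcf, ci, b_threshold=3.3, vb_threshold=3.7):
    """Check whether the predicted logBCF plus its confidence interval
    crosses the B and vB thresholds (log units, as in the text)."""
    upper = log_bcf + ci
    return {"B_alert": upper >= b_threshold, "vB_alert": upper >= vb_threshold}

# Values reported in the text: prediction 2.91, confidence interval 0.5
print(b_vb_alert(2.91, 0.5))
```

With the reported values, the upper bound of 3.41 triggers the B alert but not the vB alert, matching the assessment in the text.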


Figure 9.1 The first page of the output of the CAESAR model for the target compound.


Figure 9.2 The page with the confidence interval.

The reliability of the estimation is based on different parameters summarized in the applicability domain index (ADI; see Figures 9.3 and 9.4). Most of the parameters are based on similar compounds found in the training and test sets. The output reports the six most similar substances with their CAS No., structure, similarity index, and both experimental and predicted values. In this case, all six similar chemicals have a similarity above 0.9 (where 1 is identity and 0 indicates complete diversity). In particular, the first two have a naphthalene with two methyl groups (as in the target compound) on the same ring. The difference is the position of the methyl groups: ortho and meta in the similar compounds and on different rings in the target. The ADI uses the similarity index of the two most similar compounds as its first parameter. In this case, the ADI value is high (0.986). ADI values range between 0 and 1. Values above 0.85 are quite good, while values below 0.75 may relate to substances less useful for the target compound.


Figure 9.3 The list of the six most similar compounds.


Figure 9.4 The applicability domain index and its components.

The second parameter of the ADI considers the accuracy of the prediction for the two most similar compounds. It is based on the error in prediction; therefore, the lower this parameter, the better the model behaves. In this case, the value is 0.102, which is a good value. The ADI then considers the concordance between the predictions for the similar compounds and for the target one. Also in this case, the lower, the better. The concordance index for the target compound is 0.106. Another parameter evaluates the maximum error in prediction among the similar compounds. It ranges from 0 (low error) to 1 (high error); in this case, it is low (0.203). These parameters indicate that there are similar molecules, well predicted and with BCF values in agreement with the prediction for the target compound. The ADI then considers the descriptors used. It verifies whether the descriptors calculated for the target compound fall inside the range of the descriptor values for the training set. If so, as in this case, the target compound is sufficiently similar to the training set compounds, and the prediction therefore has higher reliability. The last parameter evaluated in the ADI uses atom-centered fragments. It evaluates whether the target molecule contains fragments that are rare or not represented in the training set. This parameter ranges from 1 (no rare or unknown fragments found) to 0 (a high number of rare or unknown fragments found).
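As a rough illustration only (the actual VEGA weighting is not reproduced here), the sub-indices described above can be oriented so that 1 is always "good" and then averaged into a single score; the plain average is an assumption made for this sketch:

```python
def adi(similarity, accuracy_err, concordance_err, max_err,
        descriptors_in_range, acf_index):
    """Combine ADI-style sub-indices into a single 0-1 reliability score.

    Error-based components (lower is better) are flipped to 1 - error so
    that every component points the same way before averaging.
    """
    components = [
        similarity,                          # 1 = very similar neighbours
        1 - accuracy_err,                    # error in neighbour predictions
        1 - concordance_err,                 # concordance with the target
        1 - max_err,                         # worst neighbour error
        1.0 if descriptors_in_range else 0.0,  # descriptors inside training range
        acf_index,                           # atom-centred fragments: 1 = all known
    ]
    return sum(components) / len(components)

# Sub-index values reported in the text for the example compound
print(round(adi(0.986, 0.102, 0.106, 0.203, True, 1.0), 3))  # 0.929
```

Plugging in the values reported in the text yields a high score, consistent with the compound being judged inside the applicability domain.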

The output also contains charts, as in Figure 9.5. They plot the calculated Mlog P (the log Kow used to estimate the BCF) against the experimental BCF of the chemicals used to build the model. Mlog P is the most important descriptor for the calculation of BCF in the CAESAR model. The first chart shows the entire training set together with the target molecule: if the target falls within the cloud of values, the prediction has high reliability, as in this case. The second plot shows only the three most similar compounds (with both predicted and experimental BCF values) and the target compound. Here the target compound is within the range of the BCF values of the similar compounds. Note that in this figure two compounds overlap because they have the same Mlog P, which is also that of the target compound. The third similar compound has a higher Mlog P and a higher logBCF; its logBCF is about 3, whereas the target compound has a lower predicted value.


Figure 9.5 The chart with the Mlog P and logBCF plots.

In conclusion, the target compound is well predicted and inside the applicability domain. Its predicted value, 2.92, is close to the B threshold, in particular considering that the reported experimental variability for BCF ranges from 0.42 [13] to 0.75 [18]. This agrees with the value obtained with the confidence interval and means that this compound may be bioaccumulative (B) but not very bioaccumulative (vB). Moreover, the model is scientifically valid and well documented.

9.4.2 Example of Mutagenicity (Reverse-Mutation Assay) Prediction

Historically, the bacterial reverse mutation assay (e.g., Ames test) has been widely used as a first test in the evaluation of genotoxicity. In the REACH context, the reverse-mutation assay is recommended for all substances produced/imported above 1 ton/year. Other more sophisticated in vitro or in vivo tests are required for higher tonnages or if a positive response is expected or observed in the in vitro bacterial test.

VEGA implements several models addressing this type of test: four individual models based on different algorithms (statistical and knowledge-based) and a consensus model that combines and weights the results of the four individual models.
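A weighted consensus of this kind can be sketched as follows. The model names, the weights, and the simple sign-of-weighted-sum rule are illustrative assumptions, not VEGA's actual combination rule:

```python
def consensus(predictions: dict) -> str:
    """Hypothetical weighted consensus over individual Ames-test calls.
    `predictions` maps model name -> (call, weight), where call is
    'mutagenic' or 'non-mutagenic' and weight reflects the reliability
    of that estimation (e.g., derived from its applicability domain)."""
    score = sum(w if call == "mutagenic" else -w
                for call, w in predictions.values())
    return "mutagenic" if score > 0 else "non-mutagenic"

# Example: three reliable positive calls outweigh one negative call
calls = {
    "model_A": ("mutagenic", 0.9),
    "model_B": ("mutagenic", 0.8),
    "model_C": ("mutagenic", 0.7),
    "model_D": ("non-mutagenic", 0.6),
}
```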

Information about the models is available in each model's guide (accessible through the question mark icon beside each model; Figure 9.6), which concisely describes the model's characteristics and statistical performance. Further details about some of the models are available in the literature. These pieces of information help to meet the need for adequate documentation and to assess the scientific validity of the model.


Figure 9.6 The panel to access information about the models.

Regarding adequacy for the purpose of satisfying REACH requirements, we have to highlight that most of the data available as the basis of QSAR models for the Ames test refer to an overall call, which consolidates the results obtained on the different strains, in the presence or absence of metabolic activation. As explained previously, this approach allows a larger amount of data to be included and improves the coverage of the models. At the same time, it cannot ensure that all the strains required by the most recent guidelines (a specific combination of five strains, tested in the presence and absence of an exogenous source of metabolic activation) have been tested. In theory, this problem concerns the call assigned to negative compounds (where all strains are required to be negative) more than positive ones (since a response observed in a single strain is enough to consider the compound positive). In practice, the use of the existing models to predict substances registered under REACH did not show any bias toward false negatives [2]; on the contrary, false positives are found in the case of non-testing methods [19].
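The logic of the overall call can be sketched in a few lines: one positive strain is enough for a positive call, while a negative call is conclusive only if all required strains were tested. The strain set below is illustrative (the guideline actually allows alternatives, such as TA97a or E. coli WP2, for some positions):

```python
# Illustrative strain panel, not the exact guideline combination
REQUIRED_STRAINS = {"TA98", "TA100", "TA1535", "TA1537", "TA102"}

def overall_call(results: dict) -> str:
    """Consolidate per-strain Ames results (strain name -> True if
    positive) into one overall call. A single positive strain suffices
    for a positive; a negative is conclusive only with full coverage."""
    if any(results.values()):
        return "positive"
    if REQUIRED_STRAINS <= results.keys():
        return "negative"
    return "inconclusive (missing strains)"
```

This asymmetry is why the incomplete strain coverage in training data affects negative calls more than positive ones.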

These considerations relate to the possibility of using QSARs to compile the dossier containing the required toxicological data. If we consider, on the other hand, the possibility of using QSAR results for the purposes of CLP and risk assessment, we have to highlight that in both cases the relevance of the bacterial reverse-mutation test is quite marginal compared to the higher-tier tests, which are much more informative for the mutagenicity assessment. In these situations, QSAR estimations for the bacterial test remain applicable mainly within a weight-of-evidence approach, and may be useful for screening and prioritization purposes.

To address the reliability of the prediction and to establish whether the compound falls within the model's applicability domain, a useful first step is the analysis of the most similar compounds, as in the BCF case. As an example, we can consider ethyl 2-bromobutanoate (CAS No. 533-68-6) and the estimations given by the four models plus the consensus, presented in Figure 9.7.


Figure 9.7 Estimations provided by the four individual models plus the consensus for a chemical used as example (ethyl 2-bromobutanoate).

All the models but the kNN point toward a mutagenic assessment for this chemical, even though most of the estimations are not considered completely reliable. Analyzing the most similar compounds in the kNN model (see Figure 9.8), it appears that only the first and the sixth compounds contain a halogen (present also in the target), while the others do not. These two compounds are also the only experimentally mutagenic examples among the similar compounds. However, the kNN model uses the first four neighbors to derive its assessment; since only one of these is positive, the target compound is predicted as negative.


Figure 9.8 The six most similar compounds to the target present in the original dataset and their respective observed and predicted values.
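The kNN decision just described can be reproduced with a simple unweighted majority vote; the neighbor list in the example mirrors the situation above (only the first and sixth neighbors experimentally mutagenic), though the real model may weight votes by similarity:

```python
def knn_predict(neighbors: list, k: int = 4) -> bool:
    """Predict mutagenicity by majority vote over the k most similar
    training compounds. `neighbors` is a similarity-sorted list of
    (similarity, is_mutagenic) tuples; ties resolve to negative.
    Unweighted voting is a simplifying assumption."""
    top = neighbors[:k]
    positives = sum(1 for _, mutagenic in top if mutagenic)
    return positives > len(top) - positives  # True = predicted mutagenic

# Situation analogous to the example: only neighbors 1 and 6 are positive,
# so the k=4 vote is 1 vs. 3 and the prediction is negative
neighbors = [(0.92, True), (0.88, False), (0.85, False),
             (0.80, False), (0.78, False), (0.75, True)]
```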

The documentation available for some individual models presents relevant structural moieties: fragments or structural alerts correlated with the observed activity either statistically (for the SARpy model) or by experts (for the ISS model). The alert SA8 is shown in Figure 9.9, together with three of the most similar compounds. This alert comes with a theoretical explanation of its presence and its associated mechanism: indeed, it belongs to the Benigni-Bossa rules, which have been described and characterized [20]. A closer look at the effect on quite similar compounds can be obtained from Figure 9.8, which includes the two brominated compounds that are experimentally mutagenic. This evidence-based analysis is in line with a read-across evaluation, and it should be combined with the mechanistic basis provided by the structural alert. A further alert (SM93), reported by VEGA in Figure 9.10, is more specific for brominated aliphatic compounds, offering further support to the overall assessment of mutagenicity. This evidence overrules the negative estimation provided by the kNN model.


Figure 9.9 The SA8 and the three most similar compounds of the training set with the same SA.


Figure 9.10 The SA SM93 and the three most similar compounds of the training set with the same SA.
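Structural-alert screening of the kind performed by SARpy and the ISS rulebase can be caricatured as fragment matching against a molecule's structure. The sketch below merely scans a SMILES string for halogen symbols; the alert name and patterns are illustrative, and a real implementation would use proper SMARTS substructure matching (e.g., with a cheminformatics toolkit such as RDKit), since plain string scanning cannot distinguish, for instance, aliphatic from aromatic halides:

```python
# Toy alert table: names and fragment strings are illustrative only,
# not the real Benigni-Bossa SMARTS definitions
ALERTS = {
    "SA8-like (aliphatic halide)": ("Br", "Cl", "I"),
}

def fired_alerts(smiles: str) -> list:
    """Return the names of alerts whose fragment strings occur in the
    SMILES. A crude sketch: real alert matching is substructure-based."""
    return [name for name, fragments in ALERTS.items()
            if any(frag in smiles for frag in fragments)]
```

For ethyl 2-bromobutanoate the bromine triggers the halide alert, in line with the positive calls of the alert-based models.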

9.5 Conclusions

We analyzed the multiple points to be evaluated in order to use in silico models within REACH. In silico models are very useful tools to assess the properties of chemical substances, but the user should carefully evaluate whether the existing models are suitable. A series of conditions related to the model itself have to be satisfied. However, the final choice depends on the use of the model for the substance of interest: the same model may be good for one substance but not for another. We described how modern models do not simply provide a predicted value but offer many other elements, which have to be carefully evaluated, documented, and reported. If more than one model is chosen for the same property, the process has to be repeated for all of them.

Modern QSAR models are very valuable in assisting the human expert in making an overall assessment. The expert should take the final decision, but the models provide a reproducible way to scrutinize a series of elements that are very important for the final judgement. Indeed, modern QSAR models are a source of valuable information beyond the predicted value, such as the values of related compounds and the identification of structural alerts, which are closely related to read-across. In effect, modern QSAR models combine QSAR predictions with an evaluation very close to read-across. In this way, they help the user make a kind of weight-of-evidence assessment, in particular in the recommended case of using multiple models.

References

  1. Amaury, N., Benfenati, E., Boriani, E. et al. (2007) Results of DEMETRA models, in Quantitative Structure–Activity Relationships (QSAR) for Pesticide Regulatory Purposes (ed. E. Benfenati), Elsevier Science Ltd, Amsterdam, pp. 201–281.
  2. Cassano, A., Raitano, G., Mombelli, E. et al. (2014) Evaluation of QSAR models for the prediction of Ames genotoxicity: a retrospective exercise on the chemical substances registered under the EU REACH regulation. J. Environ. Sci. Health C Environ. Carcinog. Ecotoxicol. Rev., 32, 273–298.
  3. Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006.
  4. OECD (2004) OECD Principles for the Validation, for Regulatory Purposes, of (Q)SAR Models, https://www.oecd.org/chemicalsafety/risk-assessment/37849783.pdf (accessed August 16, 2017).
  5. European Chemicals Agency (2008) Guidance on Information Requirements and Chemical Safety Assessment, Chapter R.6: QSARs and grouping of chemicals. Guidance for the implementation of REACH.
  6. European Chemicals Agency (2016) Practical Guide: How to Use and Report (Q)SARs, Version 3.1, July 2016. DOI: 10.2823/81818.
  7. European Chemicals Agency (2016) Preparation of an Inventory of Substances Suspected to Meet REACH Annex III Criteria: Technical documentation.
  8. Regulation (EC) No 1272/2008 of the European Parliament and of the Council of 16 December 2008 on classification, labelling and packaging of substances and mixtures, amending and repealing Directives 67/548/EEC and 1999/45/EC, and amending Regulation (EC) No 1907/2006.
  9. Toxic Substances Control Act, US Senate, as amended through P.L. 107–377, December 31, 2002.
  10. OECD (1997) OECD Guideline Test No. 471: Bacterial Reverse Mutation Test.
  11. Golbamaki Bakhtyari, N., Raitano, G., Benfenati, E. et al. (2013) Comparison of in silico models for prediction of mutagenicity. J. Environ. Sci. Health C Environ. Carcinog. Ecotoxicol. Rev., 31, 45–66.
  12. Zhao, C., Boriani, E., Chana, A. et al. (2008) A new hybrid system of QSAR models for predicting bioconcentration factors (BCF). Chemosphere, 73, 1701–1707.
  13. Lombardo, A., Roncaglioni, A., Boriani, E. et al. (2010) Assessment and validation of the CAESAR predictive model for bioconcentration factor (BCF) in fish. Chem. Cent. J., 4 (Suppl. 1), 1–11.
  14. Gissi, A., Lombardo, A., Roncaglioni, A. et al. (2015) Evaluation and comparison of benchmark QSAR models to predict a relevant REACH endpoint: the bioconcentration factor (BCF). Environ. Res., 137, 398–409.
  15. OECD (2012) OECD Guideline Test No. 305: Bioaccumulation in Fish: Aqueous and Dietary Exposure.
  16. European Chemicals Agency (2014) Guidance on Information Requirements and Chemical Safety Assessment, Chapter R.7c: Endpoint specific guidance. Guidance for the implementation of REACH.
  17. European Chemicals Agency (2014) Guidance on Information Requirements and Chemical Safety Assessment, Chapter R.11: PBT/vPvB assessment. Guidance for the implementation of REACH.
  18. Dimitrov, S., Dimitrova, N., Parkerton, T. et al. (2005) Base-line model for identifying the bioaccumulation potential of chemicals. SAR QSAR Environ. Res., 16, 531–554.
  19. Benfenati, E., Belli, M., Borges, T. et al. (2016) Results of a round-robin exercise on read-across. SAR QSAR Environ. Res., 27, 371–384.
  20. Benigni, R., Bossa, C., Jeliazkova, N.G., Netzeva, T.I., and Worth, A.P. (2008) The Benigni/Bossa rulebase for mutagenicity and carcinogenicity – a module of Toxtree. Technical Report EUR 23241 EN, European Commission – Joint Research Centre.