An Application of InterCriteria Analysis Approach to Assess the AMMOS Software Platform Outcomes

Experimental drug design procedures are time-consuming and costly, and are now successfully complemented by computer-aided (in silico) approaches. Virtual ligand screening (VLS) is one of the most promising approaches in the search for new hit compounds. The efficiency of VLS procedures can be improved via post-docking optimization. This investigation focuses on AMMOS (Automatic Molecular Mechanics Optimization for in silico Screening), developed as a multi-step structure-based procedure for efficient computational refinement of protein-ligand complexes at different levels of protein flexibility. Its performance has been assessed by the recently developed InterCriteria Analysis (ICrA), elaborated as a multi-criterion decision-making approach to reveal possible relations in the behavior of pairs of criteria when multiple objects are considered. The capacity of ICrA as a supporting tool to assess the effect of applying different levels of protein flexibility in the post-docking optimization via AMMOS has been investigated and analyzed.

Keywords: intercriteria analysis; post-docking optimization


I. INTRODUCTION
Nowadays, a variety of computer-aided (in silico) modelling techniques are developed and intensively applied in drug design as the first filtering step before the time-consuming and expensive conventional experimental testing. Structure-based virtual screening (SB-VS) is one of the most promising in silico approaches for discovering hit compounds by docking them into proteins' binding sites. The efficiency of SB-VS can be further improved via post-docking optimization of the protein-ligand complexes. For the needs of post-docking procedures in computer-aided drug design projects, the software platform AMMOS (Automatic Molecular Mechanics Optimization for in silico Screening) has been developed [18]. AMMOS has been elaborated for energy minimization of pre-docked protein-ligand complexes at different levels of protein flexibility.
In this investigation, the recently developed InterCriteria Analysis (ICrA) approach [1] is applied to assess the AMMOS software platform outcomes. ICrA, based on the mathematical formalisms of index matrices and intuitionistic fuzzy sets, has been elaborated to reveal possible relations in the behavior of pairs of criteria when multiple objects are considered. ICrA has attracted increasing scientific interest and has proven to be a reliable decision-making technique for solving numerous real-world problems in different areas, such as economy and ecology [6], [7], medicine and biomedicine [8], [9], artificial intelligence and metaheuristics [10], [11], [12], etc. This suggests that ICrA might also support decision-making in in silico studies of complex biomolecular systems. Recently, the effectiveness of ICrA has been tested for the first time to assess the performance of various scoring functions available in software packages used for molecular docking [13], [14]. In this study, ICrA is employed to assess the outcomes of the different levels of protein flexibility available in the AMMOS platform. The aim is to reveal potentially new intercriteria relations between results obtained at the stage of post-docking optimization of selected protein-ligand complexes, thus helping in the selection of appropriate levels of computational complexity in the virtual screening of bioactive compounds.

A. Target proteins
Two target proteins have been selected for the purposes of this study, namely estrogen receptor alpha (ERα) and neuraminidase (NA). They have been chosen not only for their biomedical significance, but also for the dissimilar physicochemical properties and topology of their binding sites. ERα has a non-polar, closed binding pocket, while NA has a polar, large, and widely open binding pocket.
ERα is an important transcription factor, widely expressed in the human body in tissues of the breast, the male and female reproductive systems, the brain, bone, heart, liver, adipose tissue, colon, skin, salivary glands, etc. [23]. Several diseases are associated with dysregulation of ERα in the organism, the most important being cancer (of the breast, ovaries, colon, prostate), cardiovascular diseases, metabolic disorders, neurodegeneration, inflammation, and osteoporosis. Selective ERα modulators, among them tamoxifen, raloxifene, fulvestrant, etc., are applied to prevent or treat ERα-positive breast cancer [22].
NA is a major surface tetrameric glycoprotein in influenza A and B viruses [25], with the role to facilitate the release of the virions from the infected cell. Among its inhibitors, approved globally or partially, are oseltamivir, zanamivir, peramivir, and laninamivir [24], which act by blocking NA activity to prevent the spread of the virions outside of the infected cell.
The crystal structures of the two proteins have been downloaded from the Protein Data Bank (PDB, https://www.rcsb.org). The selection from the complexes available in PDB has been based on their resolution and the completeness of the protein structure: PDB ID 3ert (resolution 1.90 Å) has been selected for ERα, and PDB ID 1b9s (resolution 2.50 Å) for NA.

B. Dataset preparation
Two focused libraries of small molecules, one for each of the target proteins, have been prepared following the steps below: i) the ChemBridge diversity set (CDS, https://chembridge.com/) has been used as the initial collection of chemical compounds; ii) CDS has been subjected to ADME/Tox filtering by the Filter 1.0.2 program (OpenEye Scientific Software, https://www.eyesopen.com/), resulting in 37 970 drug-like molecules at the output, hereafter referred to as decoys; iii) ten known active ligands (with micromolar to nanomolar binding affinities) have been added for each target protein to the decoys collection, and a single 3D conformer for each compound has been generated with the OMEGA 2.0 program (OpenEye Scientific Software); iv) up to 50 conformers per structure have been generated using Multiconf-DOCK ([16], https://dock.compbio.ucsf.edu/Contributed Code/ multiconfdock.htm); v) the shape complementarity filtering protocol MS-DOCK [16], using rigid body docking with DOCK 6.0 ([17], https://dock.compbio.ucsf.edu/DOCK 6/), has been applied to reduce the initial decoys library, considering the binding site properties of the target proteins. The whole procedure of dataset preparation and the docking-scoring protocol are described in detail in [18].

C. Post-docking software platform AMMOS
AMMOS integrates automated procedures for efficient computational refinement of protein-ligand complexes. Originally developed as free, downloadable standalone software, AMMOS was later upgraded to the interactive web server AMMOS2 [19] (Fig. 1, http://drugmod.rpbs.univ-paris-diderot.fr/ammos Home.php). Additionally, AMMOS2 allows for inclusion of explicit water molecules and individual metal ions in the protein-ligand complexes during minimization. AMMOS2 provides a comprehensive analysis of computed energies and interactive visualization of refined protein-ligand complexes, allowing users to perform additional analysis for drug discovery or chemical biology projects by ranking the ligands according to their minimized binding energies. The computational refinement of the protein-ligand complexes operates with five different levels of protein flexibility (the ligand atoms are always flexible), namely: fully flexible protein (Case 1), flexible side chains only (Case 2), flexible protein within a sphere with user-defined radius (Case 3), flexible side chains only within a sphere with user-defined radius (Case 4), and rigid protein (Case 5).
The required input files for the energy minimization in AMMOS platform are the target protein (receptor) structure in pdb format, and the database of drug-like chemical compounds (ligands) to be virtually docked into the receptor's binding site in mol2 format. As an output, the ligands' minimized binding energies (sorted in ascending order) are obtained and can be subjected to further analysis for drug discovery purposes.

D. ICrA background
Developed as a multicriteria decision-making approach, ICrA operates on arrays of data obtained by measuring multiple objects against multiple criteria, and allows uncertainty to be taken into account in information processing. ICrA is based on the algebraic apparatus of index matrices (IM) [4] for processing data arrays of diverse dimensions, and on intuitionistic fuzzy sets (IFS) [2], [3] as a mathematical tool for handling uncertainty.
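Let us have an IM

$$
A = \begin{array}{c|ccccc}
 & C_1 & \cdots & C_q & \cdots & C_n \\ \hline
O_1 & e_{O_1,C_1} & \cdots & e_{O_1,C_q} & \cdots & e_{O_1,C_n} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
O_p & e_{O_p,C_1} & \cdots & e_{O_p,C_q} & \cdots & e_{O_p,C_n} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
O_m & e_{O_m,C_1} & \cdots & e_{O_m,C_q} & \cdots & e_{O_m,C_n}
\end{array}
$$

where for every $p, q$ ($1 \le p \le m$, $1 \le q \le n$): $O_p$ is an object being evaluated; $C_q$ is a criterion applied to the considered objects; $e_{O_p,C_q}$ is a real number (evaluation), which is comparable by a relation $R$ with all the other elements of the IM. Let $\overline{R}$ be the dual relation of $R$ in the sense that if $R$ is satisfied, then $\overline{R}$ is not satisfied, and vice versa. All mathematical justifications of ICrA can be found in detail in [1]. ICrA begins with an IM of dimensions $m \times n$ ($m$ objects and $n$ criteria) and ends with an IM of dimensions $n \times n$, formed after a pairwise comparison between every two different criteria along all evaluated objects.
For every pair of objects $O_i$, $O_j$ ($1 \le i < j \le m$), let the intuitionistic fuzzy counter $S^{\mu}_{k,l}$ be the number of cases in which the relations $R(e_{O_i,C_k}, e_{O_j,C_k})$ and $R(e_{O_i,C_l}, e_{O_j,C_l})$ (or the relations $\overline{R}(e_{O_i,C_k}, e_{O_j,C_k})$ and $\overline{R}(e_{O_i,C_l}, e_{O_j,C_l})$) are simultaneously satisfied, and let the intuitionistic fuzzy counter $S^{\nu}_{k,l}$ be the number of cases in which the relations $R(e_{O_i,C_k}, e_{O_j,C_k})$ and $\overline{R}(e_{O_i,C_l}, e_{O_j,C_l})$ (or the relations $\overline{R}(e_{O_i,C_k}, e_{O_j,C_k})$ and $R(e_{O_i,C_l}, e_{O_j,C_l})$) are simultaneously satisfied. It is obvious that:

$$
0 \le S^{\mu}_{k,l} + S^{\nu}_{k,l} \le \frac{m(m-1)}{2}.
$$

For every $k, l$ ($1 \le k \le l \le n$), and for $m \ge 2$, the degrees of agreement $\mu_{C_k,C_l}$ and disagreement $\nu_{C_k,C_l}$ can be defined as follows:

$$
\mu_{C_k,C_l} = \frac{2\,S^{\mu}_{k,l}}{m(m-1)}, \qquad \nu_{C_k,C_l} = \frac{2\,S^{\nu}_{k,l}}{m(m-1)}.
$$

The collected intuitionistic fuzzy pairs (IFP) $\langle \mu_{C_k,C_l}, \nu_{C_k,C_l} \rangle$ form the resulting IM, which determines the degrees of correspondence between the criteria $C_1, \ldots, C_n$:

$$
\begin{array}{c|ccc}
 & C_1 & \cdots & C_n \\ \hline
C_1 & \langle \mu_{C_1,C_1}, \nu_{C_1,C_1} \rangle & \cdots & \langle \mu_{C_1,C_n}, \nu_{C_1,C_n} \rangle \\
\vdots & \vdots & \ddots & \vdots \\
C_n & \langle \mu_{C_n,C_1}, \nu_{C_n,C_1} \rangle & \cdots & \langle \mu_{C_n,C_n}, \nu_{C_n,C_n} \rangle
\end{array}
$$

Following the concept of consonance and dissonance [5], if $\alpha, \beta \in [0, 1]$ are given such that $\alpha + \beta \le 1$, the criteria $C_k$ and $C_l$ are in:
• positive consonance, if $\mu_{C_k,C_l} > \alpha$ and $\nu_{C_k,C_l} < \beta$;
• negative consonance, if $\mu_{C_k,C_l} < \beta$ and $\nu_{C_k,C_l} > \alpha$;
• dissonance, otherwise.
For clarity, Fig. 2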
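
The counting scheme defined above can be illustrated with a minimal Python sketch. It assumes that R is the strict relation ">" and its dual is "<", so that tied evaluations contribute to neither counter and are left as uncertainty; this is only one possible choice and is not necessarily the variant implemented in ICrAData. The function name `icra` and the list-of-rows input layout are illustrative assumptions.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def icra(matrix):
    """Compute degrees of agreement (mu) and disagreement (nu).

    matrix[i][k] is the evaluation of object i against criterion k
    (m objects as rows, n criteria as columns).
    Assumption: R is ">" and its dual is "<"; ties count as uncertainty.
    Returns two symmetric n x n lists of lists: mu and nu.
    """
    m = len(matrix)
    if m < 2:
        raise ValueError("ICrA needs at least two objects")
    n = len(matrix[0])
    total_pairs = m * (m - 1) / 2  # number of object pairs (i < j)

    mu = [[0.0] * n for _ in range(n)]
    nu = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(k, n):
            s_mu = s_nu = 0
            for i, j in combinations(range(m), 2):
                dk = sign(matrix[i][k] - matrix[j][k])
                dl = sign(matrix[i][l] - matrix[j][l])
                if dk * dl == 1:      # same strict ordering under both criteria
                    s_mu += 1
                elif dk * dl == -1:   # opposite strict ordering
                    s_nu += 1
            mu[k][l] = mu[l][k] = s_mu / total_pairs
            nu[k][l] = nu[l][k] = s_nu / total_pairs
    return mu, nu
```

For example, with three objects whose evaluations increase under the first two criteria and decrease under the third, the first two criteria are in full agreement, while the first and third are in full disagreement.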

III. ICRADATA SOFTWARE PACKAGE
ICrAData [15] is freely available software developed for the implementation of the ICrA approach. The latest downloadable version of the software, 2.5, is available at https://intercriteria.net/wpcontent/uploads/2021/08/ (last accessed November 8, 2021).
ICrAData employs Java (versions up to 1.8) or C++ (versions of the branch 2) to compute and visualize the results from ICrA for particular input data. The ICrAData screen layout includes a section for the input data in table format (on the left), a central section for visualization of the obtained matrices of the calculated degrees of agreement and disagreement, and a rightmost panel for graphical representation of the results via the intuitionistic fuzzy triangle. The thresholds α and β can be modified by the user, their default values being 0.75 for α and 0.25 for β. For the user's convenience, ICrAData depicts the intuitionistic fuzzy pairs in positive consonance in green, the pairs in negative consonance in red, and all the remaining pairs, which are in dissonance, in magenta.
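
The thresholding and colour coding described above can be summarized in a short Python sketch. It is not the ICrAData source code; the function name `classify_pair` and the `COLORS` mapping are illustrative assumptions, with the defaults α = 0.75 and β = 0.25 taken from the text.

```python
# Colour coding as described for ICrAData (the mapping name is hypothetical).
COLORS = {
    "positive consonance": "green",
    "negative consonance": "red",
    "dissonance": "magenta",
}


def classify_pair(mu, nu, alpha=0.75, beta=0.25):
    """Classify an intuitionistic fuzzy pair <mu, nu> using thresholds alpha, beta."""
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1):
        raise ValueError("alpha and beta must lie in [0, 1] with alpha + beta <= 1")
    if mu > alpha and nu < beta:
        return "positive consonance"
    if mu < beta and nu > alpha:
        return "negative consonance"
    return "dissonance"
```

With the default thresholds, a pair such as ⟨0.60, 0.30⟩ satisfies neither consonance condition and is therefore reported as dissonance (magenta).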

IV. RESULTS AND DISCUSSION
During its development and validation [18], [19], [20], AMMOS was tested on several protein-ligand complexes with binding pockets varying in their physicochemical properties and topologies. AMMOS has been shown to improve the enrichment after the docking stage, with 40% to 60% of the initially added active compounds found in the top 3% to 5% of the entire compound collection.
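
The enrichment measure mentioned above, i.e. the share of known actives found in the top-ranked part of the collection, can be sketched as follows. This is a minimal illustration and not the evaluation code used for AMMOS; the function name, the dictionary input, and the rounding-up of the cutoff are assumptions.

```python
import math


def actives_recovered(energies, actives, top_fraction):
    """Fraction of known actives found in the top-ranked part of a collection.

    energies: dict mapping compound id -> binding energy (lower is better).
    actives: iterable of compound ids known to be active.
    top_fraction: e.g. 0.05 for the top 5% of the ranked collection.
    """
    actives = set(actives)
    if not actives:
        raise ValueError("at least one active compound is required")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    # Rank compounds by ascending energy, as in the AMMOS output.
    ranked = sorted(energies, key=energies.get)
    # Assumption: the cutoff is rounded up and covers at least one compound.
    cutoff = max(1, math.ceil(round(top_fraction * len(ranked), 9)))
    top = set(ranked[:cutoff])
    return len(top & actives) / len(actives)
```

For instance, if two of four known actives appear among the top 10% of a ranked collection, the function returns 0.5.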
For the target proteins considered in this study, the focused libraries constructed following the steps described in the section Dataset preparation contain about 30% of the initial library for ERα and about 50% for NA, the latter owing to its larger and more open binding pocket. As a result of the shape-based filtering procedure, six of the ten initially added active compounds were retrieved in the resulting focused library of each of the proteins of interest, ERα and NA.
In this pilot study of the capacity of ICrA to assess the AMMOS outcomes, only the known active ligands of the target proteins ERα and NA have been used in the analysis.
ICrA has been applied to the binding energies of each ligand to the target proteins ERα and NA, obtained from docking with DOCK 6.0 and from the post-docking optimization performed via AMMOS. In ICrA terminology, the evaluated criteria are the results (binding energies) obtained after docking and after post-docking optimization at the different levels of protein flexibility in AMMOS, while the objects correspond to the chosen ligands from the evaluated receptor-based database (in the considered case, the known active ligands only).
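
The mapping of this study onto ICrA terms can be sketched as the assembly of an index matrix with ligands as rows (objects) and docking plus the five AMMOS cases as columns (criteria). The sketch below is illustrative only: the function name, the nested-dictionary input, and the criterion labels are assumptions, and no energy values from the study are reproduced.

```python
# Criteria: docking plus the five AMMOS protein flexibility levels.
CRITERIA = ["Docking", "Case 1", "Case 2", "Case 3", "Case 4", "Case 5"]


def build_index_matrix(energies, ligands, criteria=CRITERIA):
    """Arrange binding energies as an objects-by-criteria matrix.

    energies: dict criterion -> dict ligand id -> binding energy
              (hypothetical layout chosen for this sketch).
    ligands: ordered list of ligand ids (the ICrA objects).
    Returns a list of rows, one per ligand, with one column per criterion.
    """
    matrix = []
    for ligand in ligands:
        row = []
        for criterion in criteria:
            try:
                row.append(float(energies[criterion][ligand]))
            except KeyError:
                raise KeyError(
                    f"missing energy for {ligand!r} under {criterion!r}"
                ) from None
        matrix.append(row)
    return matrix
```

The resulting matrix has one row per ligand and six columns, which is the m × n layout that ICrA takes as input.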
A. ICrA-based assessment of the outcomes of AMMOS levels of protein flexibility for ERα
Additionally, there is a pair (Case 2 and Case 4), both referring to side-chain flexibility, with the highest degree of agreement, namely µ = 1, thus falling into strong positive consonance. These results suggest that ICrA distinguishes correctly between flexibility and non-flexibility of the protein structure; however, the different levels of flexibility are not discriminated in this analysis. Therefore, to save time during post-docking optimization for screening purposes, we may rely on running AMMOS only for Case 4 (where atoms of the protein side chains inside a sphere around the ligand can move), which is the fastest among Cases 1 to 4. In this way, a good balance between the depth and accuracy of the calculation and the resources required for it might be achieved.
The values in magenta correspond to the pairs of criteria in dissonance; in the case of ERα, those are mainly the results from docking compared to those from the post-docking optimization with AMMOS. These results are not surprising, since post-docking optimization is expected to improve the docking results. Indeed, after AMMOS refinement, lower binding energies than those from docking are observed.
Only Case 5 falls into dissonance with the other AMMOS cases of protein flexibility. This observation might be explained by the fact that in Case 5 the protein atoms are kept rigid, while in the remaining four cases the entire protein or parts of it close to the binding site are allowed to be flexible. Therefore, such dissonance is somewhat expected as well.
B. ICrA-based assessment of the outcomes of AMMOS levels of protein flexibility for NA
Comparable results have been obtained for the second target protein investigated here, NA. The ICrA approach, applied again to the docking performed by DOCK and the post-docking optimization by AMMOS, outlines one pair of criteria, Case 1 and Case 2, that falls into strong positive consonance (µ ∈ (0.95, 1.00]). All remaining pairs of AMMOS cases are in positive consonance (µ ∈ (0.85, 0.95]), as depicted in Fig. 4.
As for ERα, the results from docking for NA are in dissonance with all the results obtained after post-docking optimization by AMMOS, for the reasons already explained above. In the case of NA, all five levels of protein flexibility implemented in AMMOS are in positive consonance (in green). No dissonance has been outlined for the outcomes from Case 5 (with rigid protein atoms).
According to the aforementioned results for NA, all cases of protein flexibility implemented in AMMOS could be considered to operate in a similar manner, as assessed by ICrA. Observations comparable to those presented here for ERα and NA have been reported for another investigated target protein, the coagulation factor X (FX). The detailed analysis of the results obtained for FX can be found in [26].
In general, the outcomes of AMMOS allow the users to perform additional analysis of the optimized protein-ligand complexes for the aims of drug discovery projects. The effectiveness of the different levels of protein flexibility available in AMMOS has been further analyzed in terms of interactions in the protein-ligand complexes, after docking and after AMMOS application [21]. For this purpose, protein-ligand interaction (PLI) diagrams obtained with the Ligand Interactions tool of the Molecular Operating Environment (MOE, Chemical Computing Group, https://www.chemcomp.com/) might be used. Fig. 5 illustrates the PLI identified in the X-ray structure. Several specific interactions have been observed: two contacts through a water molecule with the residues Glu226 and Glu276, and four hydrogen bonding (HB) interactions with the residues Arg116, Arg150, Arg292 and Arg374. Fig. 6 illustrates the PLI of the best scored pose of FDI after docking (Fig. 6a) and after the application of AMMOS in Case 1 (Fig. 6b).
As seen from Fig. 6a, no specific interactions are recorded for the best pose of the ligand after docking. The picture changes when Case 1 of AMMOS is run, with fully flexible protein and ligand. As seen from Fig. 6b, after the Case 1 minimization the original interactions in the X-ray structure are partially recovered (HBs with Arg116 and Arg150) and new interactions appear (an HB with Arg154 and a pi-cation interaction with Arg150), thus improving on the results after docking.

V. CONCLUSION
In this investigation, the capacity of ICrA has been studied as a supporting tool to assess, by pairwise intercriteria comparison, the effect of applying different levels of protein flexibility in the post-docking optimization using AMMOS. ICrA has demonstrated the capacity to distinguish between results obtained from rigid docking and those obtained at different levels of protein flexibility. This observation confirms that distinct information is retrieved when different levels of protein flexibility are allowed, even at the stage of post-docking optimization of protein-ligand complexes. All five levels of protein flexibility have been shown to achieve comparable results as assessed by ICrA, allowing for selection of an appropriate level of flexibility when investigating a particular protein target for screening purposes. The observed interrelations between the studied cases might assist in offering efficient scenarios for virtual screening tasks where computational time is a key parameter: Cases 3 and 4 can be used instead of the slower Cases 1 and 2, with no loss of valuable information. In general, the outcomes from ICrA are considered useful in optimizing time-consuming and costly processes in the virtual screening of bioactive compounds for the purposes of drug discovery.