Available online at http://www.pharm.chula.ac.th/am2019/

The 35th International Annual Meeting in Pharmaceutical Sciences
& CU-MPU International Collaborative Research Conference

Miscibility classification of drug-carrier mixture for solid dispersion
using decision tree
Thunyasit Jannoy1, Oran Kittithreerapronchai2, Jittima Chatchawalsaisin1*
1 Department of Pharmaceutics and Industrial Pharmacy, Faculty of Pharmaceutical Sciences, Chulalongkorn
University, Bangkok 10330, Thailand
2 Department of Industrial Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand

* Corresponding Author: Tel. +66(0)2-218-8274; E-mail address: jittima.c@chula.ac.th

Keywords: Data mining, Decision tree, Miscibility, Solid dispersion, Solubility parameter

Introduction
Solid dispersion is a disperse system designed to overcome the dissolution problem of poorly
water-soluble drugs, such as Class II drugs in the Biopharmaceutics Classification System (BCS), which
have high permeability and low solubility. In a solid dispersion, the drug can be either miscible
or immiscible with the solid carrier. An appropriate physical state of the drug, together with a
hydrophilic carrier, can help to improve the drug dissolution rate and oral bioavailability.
However, when the drug is immiscible with the carrier and remains in amorphous form, the dissolution
rate may decrease during storage because the amorphous form is thermodynamically unstable and
tends to convert to a more stable crystalline form. A solid dispersion in which the drug is miscible and
dispersed in the carrier at the molecular level is therefore more desirable.1-3 To obtain this type of
solid dispersion, the carrier can be selected by studying drug-carrier miscibility experimentally, for
example by differential scanning calorimetry and film casting, or by calculating solubility parameters.4-6
The concept of the solubility parameter was introduced by Hildebrand and Scott,7 who suggested
that the solubility parameter (δ) is related to the cohesive energy density (CED). To describe the complex
molecules of drugs and carriers, Hansen8 modified Hildebrand’s solubility parameter so that the total
cohesive energy density is the sum of three contributions: dispersion forces (δd), dipole-dipole
interactions (δp), and hydrogen bonding (δh). The difference between the solubility parameters of a
drug and a carrier (∆δ) can be used to predict drug-carrier miscibility. As a rule of thumb,
∆δ < 7.0 MPa^0.5 suggests that a drug and a carrier should be miscible, whereas
∆δ > 10.0 MPa^0.5 implies that they are immiscible. The rule is inconclusive when ∆δ is
between 7.0 and 10.0 MPa^0.5, where it fails to predict the miscibility of drug and carrier.9,10
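
The rule of thumb above can be written as a short sketch. The function names and the numerical inputs are illustrative assumptions and are not taken from the data set of this study; the total parameter is computed from the three Hansen components as the square root of the sum of their squares.

```python
import math


def hansen_total(delta_d: float, delta_p: float, delta_h: float) -> float:
    """Total solubility parameter (MPa^0.5) from the three Hansen components."""
    return math.sqrt(delta_d ** 2 + delta_p ** 2 + delta_h ** 2)


def rule_of_thumb(delta_drug: float, delta_carrier: float) -> str:
    """Apply the 7.0 / 10.0 MPa^0.5 rule of thumb to the parameter difference."""
    diff = abs(delta_drug - delta_carrier)
    if diff < 7.0:
        return "miscible"
    if diff > 10.0:
        return "immiscible"
    return "inconclusive"  # 7.0 <= diff <= 10.0: the rule makes no prediction


# Illustrative values only (hypothetical, not measured data).
print(rule_of_thumb(23.0, 20.0))  # difference 3.0  -> miscible
print(rule_of_thumb(23.0, 31.5))  # difference 8.5  -> inconclusive
print(rule_of_thumb(23.0, 35.0))  # difference 12.0 -> immiscible
```

The inconclusive band is returned explicitly, which makes visible the gap that a data-driven classifier is meant to close.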
Advances in data collection and computational techniques have greatly expanded scientific
knowledge. One of the most useful computational techniques is data mining, defined as the process of
discovering patterns in data that lead to some advantage, particularly in predicting an event that might
occur.11 Applying data mining to a classification problem requires two stages. First, a
classification model based on a specific computer algorithm is created from some of the available data,
called the training data set. Second, the model is validated with another set of data, called the
testing data set, and the accuracy of the model is calculated. These two stages give an unbiased
estimate of performance and guard against over-fitting of the model.12 Data mining has been applied to
drug-carrier miscibility data by Alhalaweh et al.,13 who used the k-means algorithm to
classify the miscibility data of indomethacin and other compounds based on solubility parameters.
However, k-means is an unsupervised clustering algorithm designed to separate a data set into distinct
groups, which may or may not correspond to the miscible and immiscible classes that are the desired
attribute of the classification. Furthermore, the algorithm requires the number of groups to be specified
beforehand. The prediction of miscibility is therefore better suited to a supervised classification algorithm.
As one of the basic algorithms for supervised classification, the decision tree algorithm
produces a flowchart-like tree structure, in which each internal node denotes a condition on a factor and
each branch represents an outcome of that condition. The terminal (leaf) nodes represent the desired
attribute or class. The advantages of the decision tree are its speed and ease of interpretation.12
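
The flowchart-like structure can be illustrated with a minimal sketch. The tree below is hypothetical: the attribute names (`delta_diff`, `drug_fraction`) and both thresholds are assumptions chosen only to show how internal nodes, branches, and leaves fit together, and they do not reproduce the model reported in this study.

```python
# Hypothetical tree: attribute names and thresholds are illustrative only.
# Internal nodes are dicts holding a condition; leaves are class labels.
tree = {
    "attribute": "delta_diff",
    "threshold": 5.0,
    "low": {  # branch taken when delta_diff <= 5.0
        "attribute": "drug_fraction",
        "threshold": 0.5,
        "low": "n",   # leaf: immiscible
        "high": "y",  # leaf: miscible
    },
    "high": "n",  # branch taken when delta_diff > 5.0
}


def predict(node, record):
    """Follow the branches from the root until a leaf (class label) is reached."""
    while isinstance(node, dict):
        value = record[node["attribute"]]
        node = node["low"] if value <= node["threshold"] else node["high"]
    return node


print(predict(tree, {"delta_diff": 2.0, "drug_fraction": 0.8}))  # y
print(predict(tree, {"delta_diff": 9.0, "drug_fraction": 0.8}))  # n
```

Each prediction is a single path of comparisons from root to leaf, which is why the model is fast to evaluate and easy to read.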
This study applies the decision tree algorithm to classify miscibility data of two drugs, i.e.
indomethacin and lacidipine, with several carriers used for preparing solid dispersions. The
classification is based on the solubility parameters and additional characteristics, i.e. the molecular
weights of the drugs and carriers and the drug-to-carrier ratios.

Methodology
This study follows the data mining methodology and its application to industrial procedures proposed
by Rohanizadeh and Moghadam.14
Formulated goals and objectives: This study aims to classify drug-carrier miscibility results which
might be affected by solubility parameters and molecular weights of drugs and carriers, as well as drug
to carrier ratios used for preparing solid dispersions.
Data Gathering: The input data were mainly collected from the results of the miscibility study on solid
dispersions reported by Forster et al.5 These included miscibility data of two drugs, indomethacin and
lacidipine, with eleven carriers at three different drug-carrier ratios. Molecular weights of the drugs and
carriers were gathered from the PubChem database.15 Accordingly, a total of 66 records was obtained, each
consisting of seven attributes: 1) solubility parameter of the drug (MPa^0.5); 2) solubility parameter of
the carrier (MPa^0.5); 3) difference between the solubility parameters of the drug and carrier (MPa^0.5);
4) drug-carrier ratio; 5) molecular weight of the drug (g/mol); 6) molecular weight of the carrier (g/mol);
and 7) miscibility result of the drug and carrier as the desired attribute. The miscibility result was
recorded as “y” for miscible and “n” for immiscible solid dispersions.
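
As a sketch of the record layout described above, the seven attributes can be held in one small structure. The field names, carrier labels, and ratio labels are placeholders assumed for illustration; only the counts (two drugs, eleven carriers, three ratios) come from the text.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Record:
    """One drug-carrier observation with the seven attributes listed above."""
    delta_drug: float      # solubility parameter of the drug (MPa^0.5)
    delta_carrier: float   # solubility parameter of the carrier (MPa^0.5)
    delta_diff: float      # difference between the two parameters (MPa^0.5)
    ratio: str             # drug-carrier ratio label
    mw_drug: float         # molecular weight of the drug (g/mol)
    mw_carrier: float      # molecular weight of the carrier (g/mol)
    miscible: str          # desired attribute: "y" or "n"


drugs = ["indomethacin", "lacidipine"]
carriers = [f"carrier_{i}" for i in range(1, 12)]  # placeholder names
ratios = ["ratio_1", "ratio_2", "ratio_3"]         # placeholder labels

# Every drug paired with every carrier at every ratio.
combinations = list(product(drugs, carriers, ratios))
print(len(combinations))  # 66
```

The count of 2 x 11 x 3 combinations matches the 66 records stated in the text.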
Data Cleansing: The data were checked for duplicate and missing values. No duplicate or missing
data were found.
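
A check of this kind might look like the following sketch, which assumes each record is stored as a tuple and that a missing value appears as `None` or an empty string. Both assumptions, the helper names, and the toy rows are illustrative and do not describe the software actually used.

```python
def find_missing(records):
    """Return indices of records containing a None or empty-string field."""
    return [
        i for i, rec in enumerate(records)
        if any(value is None or value == "" for value in rec)
    ]


def find_duplicates(records):
    """Return indices of records identical to an earlier record."""
    seen = set()
    duplicates = []
    for i, rec in enumerate(records):
        if rec in seen:
            duplicates.append(i)
        else:
            seen.add(rec)
    return duplicates


# Toy rows (hypothetical values) with one duplicate and one missing field.
rows = [(1.0, "y"), (2.0, "n"), (1.0, "y"), (3.0, None)]
print(find_duplicates(rows))  # [2]
print(find_missing(rows))     # [3]
```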
Data Exploration: The underlying patterns were explored using data visualization with standard
plots, such as scatter plots and box-and-whisker plots. In addition to revealing patterns, these plots
were used to check the assumptions related to the models. The data were then randomly divided into two
data sets: 1) a training data set containing 70% of the data and 2) a testing data set consisting of the
remaining 30%.
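
The random 70/30 division can be sketched as below. The helper name, the fixed seed, and the rounding of the training size are assumptions made for reproducibility of the example; the text does not state how the software performed the split.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


train, test = split_train_test(range(20))
print(len(train), len(test))  # 14 6
```

Every record lands in exactly one of the two sets, so the testing set is never seen while the model is built.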
Model Development: Based on the objective and the data pattern, the decision tree algorithm was selected
and applied using RapidMiner Studio software16 to classify, and thereby predict, drug-carrier miscibility.
The Gini index was used as the attribute selection measure for choosing the best splitting
criterion to separate the data. The Gini index is computed as

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^2$$

where $p_i$ denotes the probability that a record in node $D$ belongs to class $C_i$, out of a total of
$m$ available classes. The candidate attribute giving the minimum Gini index was selected as the
splitting attribute.12
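
A minimal sketch of this selection measure follows. It assumes binary splits on a numerical attribute, with candidate thresholds at the midpoints between adjacent distinct values and each split scored by the size-weighted Gini index of its two partitions. The toy values are hypothetical, and the exact procedure inside RapidMiner Studio may differ.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Size-weighted Gini index of the two partitions made by a threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Midpoint threshold with the minimum weighted Gini index.

    Assumes at least two distinct attribute values.
    """
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_gini(values, labels, t))


# Toy data (hypothetical): small differences miscible, large ones immiscible.
delta = [1.0, 2.0, 3.0, 5.0, 8.0, 12.0]
label = ["y", "y", "y", "n", "n", "n"]
print(gini(label))                   # 0.5
print(best_threshold(delta, label))  # 4.0
```

A node holding equal numbers of both classes has a Gini index of 0.5, and a split that separates the classes completely brings the weighted index down to 0.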
Model validation: After the decision tree model had been developed using the training data set, the
testing data set was used to validate its performance by comparing the predicted results with the true results.

Results and discussion


During the data exploration, the relationship of the difference between solubility parameter of drug and
carrier (∆δ) and the drug-carrier ratio is visualized using a scatter plot as shown in Figure 1.

Figure 1 The difference between solubility parameters of drug and carrier (∆δ) and drug-carrier ratios

Figure 1 shows that the value of ∆δ is indicative of whether a solid dispersion is miscible or immiscible.
If the difference between solubility parameters is small (0.0 ≤ ∆δ ≤ 2.0), the drug and carrier are miscible,
especially at high drug-carrier ratios. In contrast, a large difference between solubility parameters
suggests an immiscible solid dispersion, with the exception of PVA (∆δ = 8.90). Unlike the other carriers,
PVA is partially crystalline, and miscibility can occur with its amorphous part only. Because of
this property, the PVA data are considered outliers and are removed from the data set.
The training data set (35 records) is used to construct the decision tree model as shown in
Figure 2.

Figure 2 Proposed decision tree model, where Classes y and n represent miscibility and immiscibility
between drug and carrier, respectively. Delta solubility parameter is the difference between
the solubility parameter of drug and carrier

The model reveals that the majority of the drug-carrier miscibility results can be separated by a high
value of the difference between solubility parameters. Mixtures with a low difference between solubility
parameters and a high drug-to-carrier ratio tend to be miscible. These observations are consistent with
previous studies.9,10 However, the model gives a lower cut-off point for miscibility, ∆δ < 3.6 MPa^0.5,
than those studies. The model also suggests that the molecular weight of the carrier may play some role
in the classification. The testing data set (18 records) is then passed through the model to predict the
desired attribute and to evaluate the performance. The predicted results are compared with their true
values and shown as a confusion matrix in Table 1.

Table 1 Performance of miscibility prediction

                        True miscible   True immiscible   Class precision
Predicted miscible      8               0                 100%
Predicted immiscible    0               10                100%
Overall precision                                         100%
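
The percentages in Table 1 follow directly from the four counts, as the sketch below shows. The dictionary layout and function names are assumptions; "overall precision" is computed here as the share of all test records predicted correctly, a quantity usually called accuracy.

```python
# Counts from Table 1, keyed as confusion[predicted][true].
confusion = {
    "miscible": {"miscible": 8, "immiscible": 0},
    "immiscible": {"miscible": 0, "immiscible": 10},
}


def class_precision(matrix, predicted):
    """Fraction of records predicted as a class that truly belong to it."""
    row = matrix[predicted]
    total = sum(row.values())
    return row[predicted] / total if total else 0.0


def overall_precision(matrix):
    """Fraction of all records on the diagonal (correct predictions)."""
    correct = sum(matrix[c][c] for c in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


print(class_precision(confusion, "miscible"))    # 1.0
print(class_precision(confusion, "immiscible"))  # 1.0
print(overall_precision(confusion))              # 1.0
```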

Despite the relatively small testing data set, the confusion matrix shows that the model correctly
predicts drug-carrier miscibility for every test record, i.e. 100% overall precision. It is important to
note that the condition related to the molecular weight of the carrier in the model remains inconclusive.
The last two conditions of the tree suggest possible overfitting: removing them still gives 100% overall
precision when the model is tested with the same testing data and 96.83% overall precision when it is
tested with all data. As a result, additional data and correlation analysis are required to determine
whether the carrier molecular weight plays any role in the classification of drug-carrier miscibility.
Nevertheless, the results are consistent with the k-means study of Alhalaweh et al.13 in that the
solubility parameter plays the major role in miscibility classification.

Conclusion
The results show that a decision tree can be applied to the classification of drug-carrier miscibility
to save cost and time during the preliminary assessment of solid dispersions. However, the accuracy
of the model depends on the quality of the data. Further studies with more data and more factors are
required to construct a suitable prediction model.

References
1. Bikiaris DN. Solid dispersions, part I: Recent evolutions and future opportunities in manufacturing methods for dissolution rate enhancement of poorly water-soluble drugs. Expert Opin Drug Deliv. 2011;8(11):1501-19.
2. Lu Y, Tang N, Lian R, Qi J, Wu W. Understanding the relationship between wettability and dissolution of solid dispersion. Int J Pharm. 2014;465(1-2):25-31.
3. Chiou WL, Riegelman S. Pharmaceutical applications of solid dispersion systems. J Pharm Sci. 1971;60(9):1281-302.
4. Parikh T, Gupta SS, Meena AK, Vitez I, Mahajan N, Serajuddin AT. Application of film-casting technique to investigate drug-polymer miscibility in solid dispersion and hot-melt extrudate. J Pharm Sci. 2015;104(7):2142-52.
5. Forster A, Hempenstall J, Tucker I, Rades T. Selection of excipients for melt extrusion with two poorly water-soluble drugs by solubility parameter calculation and thermal analysis. Int J Pharm. 2001;226:147-61.
6. Baird JA, Taylor LS. Evaluation of amorphous solid dispersion properties using thermal analysis techniques. Adv Drug Deliv Rev. 2012;64(5):396-421.
7. Hildebrand J, Scott R. The solubility of nonelectrolytes. New York: Reinhold Pub. Corp; 1950. p. 514.
8. Hansen C. Hansen solubility parameters: a user’s handbook. 2nd ed. Boca Raton: CRC Press; 2007.
9. Greenhalgh DJ, Williams AC, Timmins P, York P. Solubility parameters as predictors of miscibility in solid dispersions. J Pharm Sci. 1999;88(11):1182-90.
10. Kitak T, Dumicic A, Planinsek O, Sibanc R, Srcic S. Determination of solubility parameters of ibuprofen and ibuprofen lysinate. Molecules. 2015;20(12):21549-68.
11. Witten IH, Frank E, Hall MA, Pal CJ. Data mining: practical machine learning tools and techniques. 4th ed. San Francisco: Morgan Kaufmann; 2017. p. 3-41.
12. Han J, Kamber M, Pei J. Data mining: concepts and techniques. 3rd ed. San Francisco: Morgan Kaufmann; 2012. p. 327-91.
13. Alhalaweh A, Alzghoul A, Kaialy W. Data mining of solubility parameters for computational prediction of drug-excipient miscibility. Drug Dev Ind Pharm. 2014;40(7):904-9.
14. Rohanizadeh SS, Moghadam MB. A proposed data mining methodology and its application to industrial procedures. J Ind Eng. 2009;4:37-50.
15. PubChem [Internet]. USA: National Center for Biotechnology Information; c2004-19. [cited 2019 Jan 11]. Available from: https://pubchem.ncbi.nlm.nih.gov/.
16. RapidMiner GmbH. RapidMiner Studio [Computer software]. Version 9.1.000 (trial edition); 2018. Available from: https://rapidminer.com/.
