
Physical Interpretation of Machine Learning Models Applied to Film Cooling Flows

Pedro M. Milani¹
Mechanical Engineering Department, Stanford University, Stanford, CA 94305
e-mail: pmmilani@stanford.edu

Julia Ling
Principal Scientist, Citrine Informatics, Redwood City, CA 94063

John K. Eaton
Mechanical Engineering Department, Stanford University, Stanford, CA 94305

Current turbulent heat flux models fail to predict accurate temperature distributions in film cooling flows. The present paper focuses on a machine learning (ML) approach to this problem, in which the gradient diffusion hypothesis (GDH) is used in conjunction with a data-driven prediction for the turbulent diffusivity field $\alpha_t$. An overview of the model is presented, followed by validation against two film cooling datasets. Despite insufficiencies, the model shows some improvement in the near-injection region. The present work also attempts to interpret the complex ML decision process by analyzing the model features and determining their importance. These results show that the model is heavily reliant on distance to the wall $d$ and eddy viscosity $\nu_t$, while other features display localized prominence. [DOI: 10.1115/1.4041291]
¹Corresponding author.
Contributed by the International Gas Turbine Institute (IGTI) of ASME for publication in the JOURNAL OF TURBOMACHINERY. Manuscript received August 16, 2018; final manuscript received August 22, 2018; published online October 17, 2018. Editor: Kenneth Hall.
Journal of Turbomachinery, JANUARY 2019, Vol. 141 / 011004-1. Copyright © 2019 by ASME.

1 Introduction

Modern gas turbine blades must operate under high-temperature conditions and need to be cooled to meet their required lifespan. Film cooling is a technique widely employed for this purpose [1]. To determine the effectiveness of such cooling configurations, Reynolds-averaged Navier–Stokes (RANS) simulations are usually employed in the design cycle due to their much lower computational cost when compared to higher fidelity large eddy simulations (LES) and direct numerical simulations (DNS). However, RANS simulations must model all turbulent scales, and current state-of-the-art models are well known to provide poor predictions of quantities of interest, especially the mean temperature. Therefore, the turbomachinery community would greatly benefit from transport models better suited to compute the mean temperature field in film cooling flows.

Many closure models have been proposed for the momentum equation. Those include the two-equation $k$–$\varepsilon$ [2] and $k$–$\omega$ [3] models, with several subsequent modifications and improvements (e.g., in Shih et al. [4]). Hoda and Acharya [5] applied several of these models to the problem of predicting the velocity field in a film cooling flow and found that none produced excellent agreement. However, the focus of the present work is the closure of the temperature equation for two reasons: first, because work by Ling et al. [6] showed that modeling errors in the turbulent heat flux contribute significantly to the error in the mean temperature field; and second, because this is a relatively unexplored field, in which modeling options for the temperature equation are much more limited than they are for the momentum equation.

For the purposes of this paper, the flow is considered incompressible and temperature is assumed to behave as a passive scalar, which is justified when advective transport is much stronger than buoyancy. Then, the scalar transport equation for a statistically stationary flow is given by Eq. (1), where $\theta$ indicates the normalized temperature (the passive scalar), overbars indicate time average, and $\alpha$ is the molecular diffusivity. The unclosed term, which needs a model, is the turbulent scalar flux $\overline{u_i' \theta'}$. The simplest and most widely employed model is the gradient diffusion hypothesis (GDH), which linearly relates the turbulent scalar flux and the mean scalar gradient via a turbulent diffusivity $\alpha_t$, as shown in Eq. (2). To specify $\alpha_t$ everywhere in the domain, solvers typically employ the Reynolds analogy by fixing a turbulent Prandtl number, usually $Pr_t = 0.85$ [7]. Then, $\alpha_t$ is found as a function of the eddy viscosity field, $\alpha_t = \nu_t / Pr_t$.

$$\frac{\partial}{\partial x_i}\left(\bar{u}_i \bar{\theta}\right) = \frac{\partial}{\partial x_i}\left(\alpha \frac{\partial \bar{\theta}}{\partial x_i}\right) - \frac{\partial}{\partial x_i}\left(\overline{u_i' \theta'}\right) \tag{1}$$

$$\overline{u_i' \theta'} = -\alpha_t \frac{\partial \bar{\theta}}{\partial x_i} \tag{2}$$

Previous research has shown that this approach does not capture the turbulent scalar flux adequately. Recent experimental work by Schreivogel et al. [8] investigated inclined jets in crossflow with and without trenches and found misalignment between $\overline{u_i' \theta'}$ and $\partial \bar{\theta} / \partial x_i$, including regions of counter-gradient transport. Kohli and Bogard [9] used experimental data to estimate the turbulent Prandtl number in a film cooling configuration and found a spatially varying field, roughly between 0.5 and 2.

Literature does not present many alternatives to the simple GDH of Eq. (2) with a fixed $Pr_t$. The two other commonly cited models are the generalized gradient diffusion hypothesis (GGDH) of Daly and Harlow [10] and the higher order generalized gradient diffusion hypothesis (HOGGDH) of Abe and Suga [11]. Like the GDH, both need the tuning of a model constant; unlike the GDH, they also require the full Reynolds stress tensor $\overline{u_i' u_j'}$. Ling et al. [6] found that indeed the GGDH and HOGGDH could represent the turbulent scalar flux slightly better in a film cooling configuration. However, these models struggled with numerical instabilities, and the authors concluded that appropriate tuning of model parameters is more important than switching to a more complex model. That led Milani et al. [12] to propose a novel approach to this problem: use the GDH of Eq. (2) but prescribe the turbulent diffusivity $\alpha_t$ using a machine learning (ML) algorithm.

Machine learning is a framework where an algorithm processes large amounts of data and uses that data to build models and make predictions. There is a small but growing community trying to apply these methods to turbulence modeling problems. Ling et al. [13] trained deep neural networks to predict the RANS anisotropy tensor. Weatheritt et al. [14] used symbolic regression to predict nonlinear relationships between stress and strain in the wake behind a high-pressure turbine blade. Duraisamy et al. [15]
employed machine learning as a step in their proposed methodology for improving closure models. In many of these papers, machine learning is found to be a promising tool to produce improved models. However, these models tend to be difficult to interpret, particularly when complex algorithms such as deep neural networks and random forests (RF) are used.

Explaining the predictions of a machine learning model can be extremely useful for a few different reasons. It can help one diagnose misleading models or improve the models by highlighting their weaknesses. In physical simulations, interpretation of machine learning models can provide insights on the underlying physics and thus inspire scientists to develop new physics-based models. Model interpretation is a challenging task and an open question in the broader machine learning community. One of the many attempts to address this is by Ribeiro et al. [16], who recognize the importance of explaining ML models and develop a local explanation technique. In the turbulence community, Wu et al. [17] applied dimensionality reduction to visualize features in high dimensional space, which could be useful to generate insights if coupled with a machine learning model.

To address this, the current paper analyzes the model used by Milani et al. [12] to obtain improved results in film cooling flows in an attempt to gain insights on the decisions it makes. In Sec. 2, a brief overview of the model is presented. Section 3 presents validation results on two different datasets. In Sec. 4, the importance of different features is determined and then used for model interpretation. Finally, Sec. 5 discusses conclusions and suggestions for future work.

2 Machine Learning Methodology

2.1 Algorithm. The idea behind the model first presented in Milani et al. [12] is to use the GDH as a simple closure to Eq. (1) coupled with a turbulent diffusivity field that is predicted by a machine learning algorithm, $\alpha_{t,\mathrm{ML}}$. This field is assumed to be a function of local, properly nondimensionalized RANS variables: the mean velocity gradient, $\nabla \bar{u}$, the mean scalar gradient as calculated by the $Pr_t = 0.85$ model, $\nabla \bar{\theta}$, the distance to the nearest wall, $d$, and the eddy viscosity, $\nu_t$. The quantity $\alpha_{t,\mathrm{ML}}$ is a scalar, and therefore must be Galilean invariant, but $\nabla \bar{u}$ and $\nabla \bar{\theta}$ change under different reference frames. To address this inconsistency, the methodology proposed in Ling et al. [18] is followed to obtain invariant bases: a total of 19 features are extracted, $\{\phi_1, \phi_2, \ldots, \phi_{19}\}$ (see Sec. 4 for a complete list). A machine learning model is then used at each computational cell of the RANS domain to map between these 19 features and the local turbulent diffusivity: $\alpha_{t,\mathrm{ML}} = f(\phi_1, \phi_2, \ldots, \phi_{19})$. Note that the selection of the 19 features is a design choice, made once at the start of the current work. They depend exclusively on the results of a $k$–$\varepsilon$ RANS simulation: as soon as one is run in any geometry, the features are readily available in each computational cell of the domain.

The ML step consists of the choice of the functional form of $f$ and its calibration against data. At training time, the function $f$ is calibrated to reproduce as well as possible the mapping from the features to $\alpha_t$ in the training datasets. At test time, the function $f$ is used to assign a value of $\alpha_t$ to each cell of the domain of test datasets (distinct from the training ones), given the 19 features available from RANS.

There are several different machine learning tools to perform this regression task (i.e., different forms of $f$). For this research, a RF with $N_{\mathrm{trees}} = 1000$ was chosen and implemented in Python using the SCIKIT-LEARN package [19]. The work of Ling and Templeton [20] showed that RFs, despite being relatively easy to implement and tune, give good accuracy for turbulence modeling applications. A random forest is an ensemble of $N_{\mathrm{trees}}$ distinct binary decision trees, each constructed with a different training set consisting of points that are randomly sampled from the total training data. The overall decision of the forest is an average of the individual trees' predictions. See Sec. 4.1 for a more in-depth explanation of the RF model. Milani et al. [12] demonstrated that $N_{\mathrm{trees}} = 1000$ is sufficient to provide good predictive accuracy for this class of problems.

Since a supervised learning approach is taken, the model requires training data to fit the function $f$, for which the true $\alpha_t$ is known. To generate this, one can leverage DNS/LES of the same geometries where RANS data are available. In these high-fidelity datasets, the quantity $\overline{u_i' \theta'}$ is directly available, together with the mean scalar gradient $\partial \bar{\theta} / \partial x_i$. Then, a turbulent diffusivity $\alpha_{t,\mathrm{LES}}$ is found as a least-squares fit to Eq. (2). This field is interpolated onto the RANS mesh, and the resulting quantity is assumed to be the true diffusivity that is used to train the random forest.

2.2 Datasets. In the present work, four distinct geometries were studied. They consist of three different film cooling configurations and a separated flow around a wall-mounted cube. In all of them, a RANS simulation was performed using the commercial software Fluent, with the realizable $k$–$\varepsilon$ model of Shih et al. [4] and enhanced wall treatment. For the initial RANS calculations, the GDH with $Pr_t = 0.85$ was used in the scalar transport equation. In all cases, a fine mesh near the wall was used (first cell at $y^+ \approx 1$), and a mesh convergence study showed satisfactory results. Also, a second-order upwind scheme was used for spatial discretization. In three of the cases, a DNS or high-resolution LES that has been well-validated against experimental data is also available (see more information in Milani et al. [12]); for one case, 3D mean scalar concentration measurements obtained using magnetic resonance concentration (MRC) are available for validation.

Fig. 1 Center spanwise plane showing contours of $\bar{\theta}$ calculated by the RANS simulation with $Pr_t = 0.85$ in (a) the baseline case and (b) the FPG case. BR = 1 and DR = 1 for both cases.

The Baseline geometry consists of a single circular film cooling hole, discharging at a 30 deg inclination angle into a square main-flow channel of side 8.6D (D is the hole diameter). The film cooling hole has a length L/D = 4.1. The blowing ratio, BR, and density ratio, DR, are both equal to 1. The incoming boundary layer, measured 2D before injection, has a thickness of $\delta/D = 1.4$. The coolant flows from a plenum underneath the channel, which is fed from the spanwise direction, and contains scalar concentration $\theta = 1$. The concentration in the main channel, upstream of injection, is $\theta = 0$. Figure 1(a) shows a spanwise plane at the center of the channel for this geometry. The RANS simulation of the

Baseline case is described in Coletti et al. [21] and the LES comes from Bodart et al. [22].

The other two film cooling geometries are based on the Baseline configuration, but slightly modified to alter some flow characteristics. One of them has a significant favorable pressure gradient (FPG) in the main flow: the flow goes through an initially square channel that contracts in the vertical direction, causing it to accelerate. The blowing ratio (defined as the bulk velocity in the cooling hole divided by the bulk velocity of the main flow at the injection location) and density ratio are also one, BR = DR = 1. The incoming boundary layer thickness is smaller than in the Baseline case, as expected ($\delta/D = 1.0$ measured 2D before injection). The cooling hole and the plenum are identical to the Baseline case. High-fidelity simulation data are not available; instead, 3D mean concentration data obtained through the MRC technique by Coletti et al. [23] are used for model validation. The aforementioned reference also contains more details about this geometry.

The third film cooling configuration is the Skewed case. It consists of a channel with a single compound-angle injection hole: the hole is inclined 30 deg in the streamwise direction and 30 deg in the spanwise direction (see Fig. 2). The channel is rectangular, 8.6D in height and 17.3D in width (it is twice as wide to accommodate the lateral motion of the jet). The incoming boundary layer has thickness $\delta/D = 1.9$, measured 2D upstream of injection. The blowing ratio and density ratio are both 1. The film cooling hole has the same length L/D = 4.1 as before. The RANS simulation is described in Milani et al. [12] and the LES is given in Folkersma and Bodart [24]. The latter source also provides much more detail about the geometry and the resulting flow.

Fig. 2 Schematic showing the film cooling hole in the Skewed case, adapted from Folkersma and Bodart [24]. The first image shows the top view, and the second shows the side view. Lengths are nondimensionalized by the hole diameter.

The fourth geometry used is a wall-mounted cube in crossflow, described in the work of Rossi et al. [25]. It consists of a flow near a wall that encounters a cube of side D. There is a small region on top of the cube that is kept at a constant temperature $\theta = 1$, which generates a nontrivial scalar distribution. Rossi et al. [25] present both a RANS simulation and a DNS of that geometry; a contour plot of the results can be found in Fig. 3. While this is not a film cooling flow, it was included in the training database because it includes key features that would be found in a film cooling flow, including 3D flow separation and a free shear layer.

Fig. 3 Center spanwise plane of the Cube geometry, with contours of $\bar{\theta}$ taken from the DNS

Table 1 shows a summary of all four datasets utilized, including their respective Reynolds number based on D (hole diameter or cube length) and either coolant jet bulk velocity (Baseline, FPG, Skewed) or free-stream velocity (Cube). For this work, the model was trained on the Skewed and Cube datasets, and then tested on the Baseline and FPG cases. The datasets were designated as training or testing based on the availability of high fidelity LES/DNS data: in order to be used in the training set, high fidelity simulation data are required because the $\overline{u_i' \theta'}$ field is needed. Testing is possible without it, as long as there are experimental data against which the mean scalar field can be validated.

Table 1 Summary of the datasets

Case       Description                        Re_D    Use
Baseline   Inclined jet in crossflow          3000    Test
FPG        Inclined jet in FPG crossflow      3400    Test
Skewed     Compound-angle jet in crossflow    5800    Train
Cube       Wall-mounted cube in crossflow     5000    Train

Note that the range of Reynolds numbers studied is limited by the availability of reliable high-fidelity simulations and 3D experimental data (since neither DNS/LES nor MRC is feasible at much higher Re). This is a limitation of the current work, and in the future it would be interesting to study how well machine learning models calibrated in low Re flow generalize to higher Reynolds number.

3 Machine Learning Results

In this section, the machine learning model is compared against a traditional RANS model in the Baseline and FPG cases. Validation is performed by comparing the computed mean scalar fields to the known answer from high-fidelity simulations or experiments. Those fields are obtained by solving the scalar transport equation using distinct diffusivity fields: the regular $\alpha_{t,\mathrm{RANS}}$, obtained from $Pr_t = 0.85$; the machine-learned diffusivity, $\alpha_{t,\mathrm{ML}}$; and the diffusivity extracted from the LES, $\alpha_{t,\mathrm{LES}}$ (only in the Baseline case, because it is not available in the FPG case).

In all cases, the scalar transport equation is solved with a fixed velocity field, obtained from the original RANS. This is a limitation of the present approach, since the only correction proposed is in the turbulent scalar flux model: the traditional RANS velocity field, which is not perfectly accurate, is still employed in the scalar equation. This is justified since multiple studies have shown that the traditional models for turbulent mixing work poorly in film cooling configurations (recent examples include the work of Schreivogel et al. [8] and Oliver et al. [26]), so even an attempt to fix only them is still valuable. It is possible to use the LES velocity field to isolate the errors due to the mixing model; however, this is not realistic for a standalone RANS solver, where LES velocities are not available. If the ML model is to be used in practice, it must be used with a RANS velocity field.

3.1 Baseline Results. In this subsection, the model is applied to the Baseline geometry. The current results differ from the ones presented in Milani et al. [12] for two reasons. First, a better numerical scheme was used in the present work to solve the scalar equation (the different diffusivities were input into Fluent, and the scalar equation was solved in the same converged mesh used for the original RANS, with second-order accurate methods). Second, the field $\alpha_{t,\mathrm{RANS}}$ is defined presently as $\nu_t / 0.85$, where $\nu_t$ is the eddy viscosity from the realizable $k$–$\varepsilon$ model; in Milani et al., it is

defined as $(0.09 k^2 / \varepsilon) / 0.85$, where the numerator is the eddy viscosity from the standard $k$–$\varepsilon$ model. This difference in models makes the $Pr_t = 0.85$ results reported next significantly more accurate than the ones from Milani et al. close to the wall, but less so in the shear layer between the jet and the free-stream.

Figure 4 shows contour plots of the scalar concentration in the Baseline geometry. Figure 4(a) displays the LES results and a few interesting features should be highlighted. From the wall plot, it is clear that the adiabatic effectiveness is not monotonic: at a blowing ratio of 1, the core of the jet is detached from the wall, leading to low values of scalar concentration immediately after injection. The concentration then recovers after reattachment at around X/D = 4, and falls slowly afterward. In the spanwise planes, a distinctive kidney shape is observed at X/D = 2 due to the presence of a strong counter-rotating vortex pair after injection. Note that since the LES results were thoroughly validated against experimental data [22], these results are treated as the ground truth that the RANS models should be able to capture.

Fig. 4 Mean scalar field in the Baseline geometry. The left panels show wall-normal planes at the wall (Y/D = 0), equivalent to adiabatic effectiveness. The right panels show streamwise planes at X/D = 2 and X/D = 5, respectively. Contour lines are shown at $\bar{\theta} = 0.75$, 0.5, 0.25. (a) has the LES field, (b)–(d) contain the mean scalar field calculated using different turbulent diffusivity fields.

The calculations with the traditional RANS model, using $Pr_t = 0.85$, are shown in Fig. 4(b). The detachment mentioned before is not well captured, since the wall-normal plane shows a continuous $\bar{\theta} = 0.25$ contour. Regarding the streamwise planes, it is possible to observe the kidney shape at X/D = 2. However, the high concentration jet core is located closer to the wall (further evidence that the detachment is not accurately captured) and has a much smoother gradient on the windward side. Also, the calculations underestimate the peak scalar concentration and the lateral spread of the jet core, which can be seen at X/D = 5 as well. These problems arise due to high diffusivity $\alpha_t$ assigned by the model in the near-injection region, which causes the scalar contained in the core to diffuse more quickly toward the wall and particularly toward the free-stream.

Figure 4(c) presents the results from using the machine learning turbulent diffusivity. It shows some qualitative improvement over the $Pr_t = 0.85$ model, but it also faces difficulties. It predicts the detachment of the jet core better, as evidenced by the nonmonotonic behavior of the scalar concentration at the wall, but it underpredicts the decay of the adiabatic effectiveness further downstream, causing the $\bar{\theta} = 0.25$ contour to end about 2.5D farther in the wall-normal plane. In the streamwise planes, the peak concentration at the jet core is now overpredicted, both at X/D = 2 and X/D = 5. The lateral spread of the kidney-shaped region is more accurately captured and its features are sharper and more accurate (the curvature at the bottom and the y-gradient at the top), but the model now overpredicts the total extent of the region of high scalar concentration. A qualitative comparison between Figs. 4(c) and 4(d) shows an important result: the scalar concentration predicted by the machine learning algorithm is very similar to the one predicted by directly using the turbulent diffusivity from the LES, $\alpha_{t,\mathrm{LES}}$. This shows that the random forest is performing well at predicting what it was trained to predict (a field $\alpha_{t,\mathrm{ML}}$ that looks as close as possible to $\alpha_{t,\mathrm{LES}}$); the shortcomings in the scalar field, then, arise due to weaknesses in the model, because even with a perfect diffusivity, there is only so much of the real physics one can capture with the gradient diffusion hypothesis.

To compare the results more quantitatively, Fig. 5 shows two line plots at crucial locations: one at the center of the channel at

the X/D = 2 location, showing how $\bar{\theta}$ changes in the wall-normal direction (Fig. 5(a)), and one showing how it evolves in the streamwise direction at the wall (Fig. 5(b)). Note that the latter is equivalent to the centerline adiabatic effectiveness. In Fig. 5(a), it is clear that the $Pr_t = 0.85$ solution underestimates the peak concentration because the scalar at the center of the jet diffuses too much in the y-direction, leading to a smoother gradient at the top and a higher concentration at the wall. The machine learning results, in turn, overestimate this peak concentration; but the ML model performs much better at the top of the jet, where the y-gradient is well captured, and in computing the value of $\bar{\theta}$ at the wall (at X/D = 2, the jet is still detached). In Fig. 5(b), the nonmonotonic behavior is obvious for the LES results of Bodart et al. [22]: a dip happens at around X/D = 2, but the concentration recovers at around X/D = 4. The machine learning field reproduces this better qualitatively but misses the exact numbers; the $Pr_t = 0.85$ curve displays a qualitatively incorrect trend. At higher distances from injection (X/D > 10), all models capture the results relatively well, with the machine learning model slightly overestimating $\bar{\theta}$.

Fig. 5 Line plots of $\bar{\theta}$ in the Baseline geometry. (a) shows wall-normal variation at Z/D = 0 and X/D = 2; (b) shows streamwise variation at Z/D = 0 and Y/D = 0. The top-right corner of plot (b) shows a zoomed-in version of itself close to injection.

Finally, integral error metrics can be defined to compare 3D fields point-by-point over different regions of the flow. Equation (3) shows the error metric used, where the sum is performed over all cells of a specified region of interest:

$$\mathrm{error} = \frac{\sum_i \left| \bar{\theta}_i - \bar{\theta}_{i,\mathrm{LES}} \right| V_i}{\sum_i V_i} \tag{3}$$

Here, $\bar{\theta}_i$ is the scalar calculated with one of the different diffusivities at the $i$th cell, while $\bar{\theta}_{i,\mathrm{LES}}$ is the concentration from the LES. $V_i$ is the volume of the $i$th cell. Note that this metric generalizes the one used in Milani et al. [12] by weighting the error in each cell by the cell's volume, which makes it more appropriate for unstructured meshes. The results of applying the error metrics can be found in Table 2. These numbers show that using the machine learning diffusivity is about as good as using the $Pr_t = 0.85$ closure in the overall domain and very near the wall. However, the accuracy in the injection region, which is where most of the interesting physics happens, can be significantly improved by using the proposed ML model. Note that using the LES diffusivity has, in general, the smallest errors as expected. However, as Figs. 4 and 5 showed, the ML model matches the LES results well in most of the interesting regions of the flow.

Table 2 Error in the Baseline case using distinct diffusivity fields

Region of interest (Baseline)
        Total       Injection   Wall
X/D     -1 to 19    -1 to 4     1 to 19
Y/D     0 to 2      0 to 2      0 to 0.1
Z/D     -1 to 1     -1 to 1     -1 to 1
RANS    0.0228      0.0278      0.0220
ML      0.0222      0.0188      0.0228
LES     0.0170      0.0159      0.0202

3.2 Favorable Pressure Gradient Results. Next, the machine learning model is applied to make predictions in the FPG dataset. The exact same random forest used in Sec. 3.1 (which was calibrated on the Skewed and Cube datasets) is employed, and predictions are based on the RANS features of the FPG dataset. Then, the scalar fields are computed with the RANS velocity field and either $\alpha_{t,\mathrm{RANS}}$ or $\alpha_{t,\mathrm{ML}}$. The validation is done by comparing those computed scalar fields with the 3D MRC data of Coletti et al. [23]. There are two important points to note in this comparison: first, due to partial volume effects, the experimental data are not very accurate close to the wall, so comparisons are made starting at Y/D = 0.15 (i.e., 0.15D = 0.9 mm above the wall). Second, there is experimental uncertainty in the $\bar{\theta}$ measurements at each cell of around 0.05.

Figure 6 shows the contour plots in the FPG geometry. In Fig. 6(a), the experimental contours are shown: on the left, a wall-normal plane at the closest location to the wall allowed by the experiment, which is a "near-wall" adiabatic effectiveness; and on the right, streamwise planes at the same locations as before (X/D = 2 and X/D = 5). Many of the qualitative features described in the Baseline case are also observed here, like the kidney shape and the separation followed by reattachment. Some notable differences can be seen at the X/D = 2 streamwise location. Because of the accelerating flow and thinner incoming boundary layer, the high-concentration jet core is pushed toward the bottom wall and spreads more laterally when compared to the Baseline geometry.

Comparing Figs. 6(b) and 6(c) to Fig. 6(a) tells a very similar story to the comparisons made in the Baseline case. The RANS diffusivity predicts a much less drastic separation of the jet core, while the machine learning diffusivity performs better at that. The former underpredicts the peak concentration at X/D = 2 and X/D = 5 due to high vertical diffusion, particularly toward the

free-stream, while the latter overpredicts it slightly. Both RANS models predict the jet core lower than what is shown in the experiment.

Fig. 6 Mean scalar field in the FPG case. The left panels show wall-normal planes at a near-wall location (Y/D = 0.15). The right panels show streamwise planes at X/D = 2 and X/D = 5, respectively. Contour lines are shown at $\bar{\theta} = 0.75$, 0.5, 0.25. (a) has the MRC field from Coletti et al. [23], (b) and (c) contain the mean scalar field calculated using different turbulent diffusivity fields.

The same quantitative tools used in Sec. 3.1 can also be employed here. Line plots showing vertical variation at a fixed X/D = 2 location and streamwise variation of the near-wall effectiveness are shown in Fig. 7. Figure 7(a) makes it clear that the ML diffusivity slightly overpredicts the peak concentration, while the $Pr_t = 0.85$ model underpredicts it. The ML model captures a sharper, more accurate concentration gradient above the peak, and does a slightly better job toward the wall. In the near-wall effectiveness of Fig. 7(b), neither model captures the detachment well between X/D = 2 and X/D = 4, but the ML diffusivity gets closer; both models yield similar and accurate results far away from the hole, at X/D > 10.

Fig. 7 Line plots of $\bar{\theta}$ in the FPG geometry. (a) shows wall-normal variation at the centerline of the X/D = 2 position; (b) shows streamwise variation at the centerline of the Y/D = 0.15 position (near-wall centerline adiabatic effectiveness). The top-right corner of plot (b) shows a zoomed-in version of itself close to injection.

Table 3 has the error metrics calculated in the FPG case using Eq. (3). Note that to apply that cell-by-cell difference, the MRC data (available in a uniform Cartesian mesh) are linearly interpolated onto the RANS mesh. The numbers show, once again, that the models have somewhat similar results in the full region. In the near-wall region, between Y/D = 0.15 and Y/D = 0.25, the $Pr_t = 0.85$ model performed better. Finally, as seen before, the ML model obtained better results in the near-injection region as a whole, though the difference was not as significant as the improvement seen in Table 2.

Overall, as this comparison has shown, the same machine learning model obtained similar results in both datasets, the Baseline and the FPG. Its comparison to the $Pr_t = 0.85$ model reveals strengths and weaknesses that should be investigated further. This consistency in results is reassuring: had the application of the

Table 3 Error in the FPG case using distinct diffusivity fields Table 4 Features used by the model, sorted in descending
order of feature importance. The last column indicates the
Region of interest (FPG) standard deviation of the importance across all trees in the
forest.
Total Injection Wall
ID Feature Importance r
X/D: –1 to 19 X/D: –1 to 4 X/D: 1 to 19 pffiffiffi
Y/D: 0.15 to 2 Y/D: 0.15 to 2 Y/D: 0.15 to 0.25 17 kd 0.2996 0.0133
Z/D: –1 to 1 Z/D: –1 to 1 Z/D: –1 to 1 Red ¼

18 t/ 0.1537 0.0136
RANS 0.0225 0.0326 0.0392 5 tr(R2SRS2) 0.0891 0.0028
ML 0.0202 0.0286 0.0453 1 tr(S3) 0.0651 0.0047
0 tr(S2) 0.0606 0.0031
2 tr(R2) 0.0579 0.0032
present approach produced radically different effects in these two 8 qT S2q 0.0559 0.0082
similar datasets, one might suspect that the ML algorithm is sensi- 6 qTq 0.0463 0.0033
tive to small differences in the simulation, which is an undesirable 3 tr(R2S) 0.0440 0.0045
property. The present results, however, hint that this novel and 9 qT R2q 0.0298 0.0082
somewhat unorthodox machine learning approach has the poten- 4 tr(R2S2) 0.0222 0.0021
tial to generate robust turbulence models. 7 qT Sq 0.0187 0.0020
13 qT R2Sq 0.0162 0.0028
12 qT SRS2q 0.0082 0.0012
4 Feature Analysis 11 qT RS2q 0.0076 0.0008
The objective of this section is to analyze how important each 15 qT R2SRq 0.0073 0.0009
of the different flow features /i is for the RF in its predictions of 14 qT R2S2q 0.0071 0.0009
at. This analysis is useful because it provides physical insight into 16 qT RS2R2q 0.0058 0.0008
the machine learning model that could be used to devise improved 10 qT RSq 0.0052 0.0008
analytic models.
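The two numeric columns of Table 4 (importance and its standard deviation across trees) can be illustrated with a short sketch. It assumes a scikit-learn [19] `RandomForestRegressor`, whose per-tree `feature_importances_` is a normalized, sample-weighted variance reduction of the kind described in Sec. 4.1. The synthetic data, the five placeholder features, and the forest settings are assumptions made only for illustration; they are not the Skewed and Cube training sets or the settings of the present model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def importance_table(forest):
    """Return feature ranking, mean importance, and std across trees."""
    # Each tree reports its own normalized importance vector (sums to one).
    per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])
    mean = per_tree.mean(axis=0)   # forest-level importance
    std = per_tree.std(axis=0)     # spread between trees (the sigma column)
    order = np.argsort(mean)[::-1] # descending importance, as in Table 4
    return order, mean, std


# Synthetic stand-in data (assumption): the target depends strongly on
# feature 0, weakly on feature 1, and not at all on the others.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=500)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
order, mean, std = importance_table(forest)
for i in order:
    print(f"feature {i}: importance = {mean[i]:.4f}, sigma = {std[i]:.4f}")
```

Because every tree normalizes its own importances, the averaged values also sum to one, which is why the Importance column of Table 4 can be read as a fraction of the explained variance.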
4.1 Background on Random Forests. Louppe [27] provides a comprehensive explanation of how random forests and decision trees work. The following paragraphs give a simplified overview to motivate feature importance and describe how it can be calculated.
A random forest is an ensemble algorithm because it consists of a set of different instances of the same simpler building-block model, the binary decision tree. Each tree is distinct because it is fitted to a randomly selected subset of the full training set. Each of these random subsets is drawn with replacement from the full training set, so that some points are repeated and some are omitted.
To construct the tree (i.e., at training time), the splits are determined through a greedy algorithm. At a given node n, each choice of a split can be represented by a particular feature $\phi_i$ and a threshold T. Each choice of $\{\phi_i, T\}$ would lead to some data points falling into the left child of n and the remaining points falling into the right child. The algorithm picks the particular $\phi_i$ and T that minimize the sum of the variances in the target quantity $\alpha_{t,\mathrm{LES}}$ of the points in the two child nodes, weighted by the number of elements in each child node. Intuitively, the algorithm wants the most informative split possible: by choosing $\{\phi_i^{(n)}, T^{(n)}\}$ at node n, it separates the data into two sides, with each side containing points that are similar to each other. At test time, each data point starts at the root and goes down the tree according to the rules imposed by each node, until a leaf is reached and the turbulent diffusivity $\alpha_{t,\mathrm{ML}}$ is assigned.
As the tree is trained, the quality of each split is calculated. This is just the reduction in variance between the set of points assigned to node n and the two complementary subsets of points that go to the left and right children of n. Then, for each tree, the feature importance is calculated as the sum of the qualities of all the splits where that feature is used, weighted by the number of training points that pass through each split. This quantity is normalized, so the importance metrics of all the features sum to one. Higher importance for a feature means that the feature is more often used to make informative splits. Intuitively, this importance metric shows how much each feature was able to explain the variance in the turbulent diffusivity in the training set.
For a random forest, the importance metric in each of its decision trees is individually calculated and then averaged. The standard deviation ($\sigma$) in the importance between all trees in the forest is also calculated for each feature, which gives a metric of how different the trees in the forest are. Note that the feature importance metric is commonly used in random forest models and is more thoroughly described in Ref. [27].

Fig. 8 Feature importance for each model feature in the training set. Error bars indicate the standard deviation between the 1000 trees in the forest.

4.2 Feature Importance. Table 4 shows each of the 19 features of the present model, ranked by the importance calculated according to Sec. 4.1. Each feature is a scalar, and they form an invariant basis of the relevant tensorial quantities (see Ref. [18]). S and R denote the symmetric and antisymmetric parts of the mean velocity gradient tensor $\nabla u$, which are $3 \times 3$ tensors, and q denotes the mean concentration gradient (i.e., $q = \nabla\theta$), a three-dimensional vector. Note again that this RF was trained on the Skewed and Cube datasets and the feature importance is solely based on those training points. Figure 8 presents the same information in graphical form.
The first observation is that when the model is trained, the Reynolds number based on distance to the wall $\mathrm{Re}_d$ is by far the most important feature; alone, it accounts for about 30% of
the variance of the diffusivity. The eddy viscosity ratio $\nu_t/\nu$ is the second most important feature, contributing about 15%. The Reynolds analogy (i.e., the assumption that scalar transport is analogous to momentum transport, which leads to a fixed $\mathrm{Pr}_t$) implies that $\alpha_t$ is only a function of $\nu_t$, so it is expected that $\nu_t/\nu$ would be an important feature. The Reynolds analogy breaks down at the wall, where scalar and momentum transport have asymptotically different behaviors, as noted by Ling [28]. Thus, the RF could learn this by including a strong dependence on the distance of a cell to the wall.
Despite the high importance attained by these two features, the 17 other features derived from $\nabla u$ and $\nabla\theta$ are responsible for about 55% of the overall combined importance. This shows that the RF, in an attempt to explain the extracted turbulent diffusivity $\alpha_{t,\mathrm{LES}}$, needs to use much more information than just the local eddy viscosity and distance to the wall. Interestingly, the RF seems to use the features derived purely from the velocity gradient much more than the features that also involve the concentration gradient (a combined importance of 34% and 21%, respectively). This hints that it is building a roughly linear scalar closure in the training set, even though it has the capability of building a nonlinear one. Since the present model prescribes $\alpha_t$ in Eq. (2) as a weak function of $\nabla\theta$, Eq. (1) remains almost linear.

4.3 Pointwise Feature Usage. Section 4.2 deals only with the feature importance, an aggregate metric computed over all points in the training set. Another potentially interesting question concerns how the model uses the features at test time. For this, a new metric is proposed, the pointwise feature usage (PFU). After the RF is constructed, the forest generates $\alpha_{t,\mathrm{ML}}$ predictions at every point in the test set by tracing a path down the branches of each tree. Depending on the path that the algorithm follows from root to leaf for a particular test point, the model considers different features to split on. Counting how many times each feature is employed for a split in that path, one can generate a rough idea of which features were important to make a decision in that computational cell. This count is then normalized (so that, at each example in the test set, it sums to one), and the resulting quantity is the pointwise feature usage metric.
This procedure is applied to the Baseline dataset and Fig. 9 shows the results. It contains contour plots showing the PFU of six different features (note that they are normalized such that the importance of all 19 features sums to 1 at each location). The features presented are among the ones that had the highest aggregate training set importance as shown in Table 4, so they are expected to contribute the most to the RF decisions. Figure 9 shows the same streamwise planes from before, at location X/D = 2, and overlays the importance plots with contour lines of the mean scalar concentration from the LES of Bodart et al. [22].

Fig. 9 Contour plots showing the pointwise feature usage of selected features in the Baseline case. These are streamwise planes at X/D = 2 and the contour lines show levels of $\theta = 0.75$, 0.5, 0.25 in the LES field. The figure is blanked in cells where the gradient of the mean scalar field is negligible. Color plots are available in the online version.

As expected from the results of Sec. 4.2, $\mathrm{Re}_d$ is the feature most often used in splits, as shown in Fig. 9(a). The PFU attains the highest values closest to the bottom wall: in the first few computational cells, within the viscous sublayer, it surpasses 0.4 (i.e., the feature is used in more than 40% of all splits). This is expected because the turbulent scalar transport has a strong dependence on the distance from the wall in regions where viscous effects are dominant. Usage of this feature is also high below the jet core and in the shear layer on the windward side of the jet. On the other hand, the eddy viscosity shown in Fig. 9(b) is not very useful on the bottom of the jet (importance around 0.1), but it becomes important closer to the free-stream, particularly in the shear layer on the windward side of the jet. In this region, the gradients in the mean scalar and streamwise velocity fields behave similarly, which would lead one to believe that the Reynolds analogy is a better assumption (compared to other regions of this flow). These results seem to support that conclusion.
The contour plots of two features that only depend on the velocity gradient are shown in Figs. 9(c) and 9(d) (features 5 and 1, respectively). Interestingly, feature 5 (a sixth-order term in the velocity gradient) is highly important directly under the jet core (as seen by a PFU of around 0.2 near Z/D = 0 and Y/D = 0.1). This coincides with a critical location in the flow where the $\mathrm{Pr}_t = 0.85$ model overestimates vertical scalar transport. The present work, then, suggests that this sixth-order term might be related to this phenomenon. The contour plot for feature 1 (the third-order term based on the symmetric part of the velocity gradient) might seem uninteresting, but it shows a consistent and very thin hot spot in the cells right next to the wall: in the first few cells, located in the viscous sublayer, this term has the second highest PFU (after $\mathrm{Re}_d$), with a value of around 0.2.
Figures 9(e) and 9(f) show contour plots of the two most important features that include information about the mean scalar gradient (features 8 and 6, respectively). The former is used moderately
in most of the plane, except in the proximity of the wall (Y/D < 0.2), where it is practically irrelevant. Feature 6 (which is just the squared norm of the mean scalar gradient) is also unimportant very close to the wall (where, due to the adiabatic boundary condition, the scalar gradient is zero), but it shows moderate importance toward the top of the jet core, and also in symmetric hot spots at around Y/D = 0.1 and Z/D = ±0.6. The latter is a region with relatively high in-plane gradient (as seen by the proximity of the contour levels), where the bottom of the counter-rotating vortex pair is located.
From a modeling perspective, these results could be useful to inspire different approaches for scalar transport closure. For instance, the very high importance of distance to the wall in predicting the diffusivity suggests that one might want to build that into a model: instead of prescribing $\alpha_t$, it could be better to prescribe $\alpha_t/\mathrm{Re}_d^{\beta}$ for some power $\beta$, at least near the wall. The third-order term $\mathrm{tr}(S^3)$ could also be used in a model that attempts to predict the correct transport very near to the wall, due to its high importance there. It seems that a linear closure could be sufficient in most regions of the flow, since the features involving $\nabla\theta$ are not used in most of the geometry; however, nonlinearities could also hold the key to obtaining the right behavior in the regions where features that involve the scalar gradient have higher pointwise feature usage.

5 Conclusion
The present paper revisited the machine learning approach for turbulent scalar modeling presented by Milani et al. [12]. It was calibrated on two datasets where high-fidelity simulations are available (cube and skewed) and then validated on two other datasets (baseline and FPG). The scalar fields presented were obtained in a more rigorous manner, which explains the difference between some results of the present work and the original presentation of the model. 3D comparisons between fixed $\mathrm{Pr}_t$ models and the present machine learning model in the baseline and FPG cases show some improvement (particularly in the critical near-injection region of film cooling flows) but also some lingering deficiencies. Another interesting result is that under the current framework, the learning process manages to obtain a field not too dissimilar from its theoretical best, namely the field solved with the diffusivity extracted from the LES. This shows that the dominating error source in the machine learning model is not the machine learning step, but the model form (GDH with scalar diffusivity).
Future work aimed at producing more accurate models should attempt to move away from the simple GDH, using other closures that ideally would be amenable to machine learning techniques. An anisotropic diffusivity could be an interesting possibility. Also, further validation of the present model against different film cooling flows would test the robustness of the machine learning procedure and guide the choice of new training datasets. Finally, as discussed earlier, one limitation of the current approach is the use of the velocity field from typical RANS models when solving for the scalar equation. Future work could overcome this by combining the current approach with a machine learning technique to improve the modeling in the momentum equation (like the ones presented by other authors, such as Ling [13], Weatheritt [14], and Duraisamy [15]).
In Sec. 4, the first attempt at extracting physical intuition from the machine learning model came in the form of analyzing how much different features are used by the model. It showed that the random forest depends heavily on the distance to the wall to predict $\alpha_t$, with the eddy viscosity also being relatively important. Overall, the features that depend on the scalar field are not particularly important, hinting that the RF infers a model that almost maintains the linearity of the scalar transport equation. Point-by-point analysis of the features in the Baseline model, in the form of a proposed metric called pointwise feature usage, revealed insights that might be useful to improve machine learning models and propose new analytical ones.
The past few years have seen a sharp rise in interest in data-driven turbulence modeling. For these machine learning methods to reach their full impact, they should not be black boxes: model interpretability is crucial for unlocking scientists’ ability to learn from data-driven models. This study presents techniques to gain spatially resolved modeling insights from data-driven turbulence models. Future work will dive deeper into using these modeling insights to design more accurate analytic models.

Acknowledgment
Pedro M. Milani was supported by grants from Sandia National Laboratories (under agreement number 1757831) and by Honeywell Aerospace.

Nomenclature
BR = blowing ratio in a jet in crossflow configuration
d = distance to the nearest wall
D = length scale (hole diameter/cube height)
DR = density ratio in a jet in crossflow configuration
GDH = gradient diffusion hypothesis
k = turbulent kinetic energy
L = length of film cooling hole
ML = machine learning
MRC = magnetic resonance concentration
n = node in a decision tree
$N_{\mathrm{trees}}$ = number of trees in the random forest
PFU = pointwise feature usage
$\mathrm{Pr}_t$ = turbulent Prandtl number, $\nu_t/\alpha_t$
q = gradient of the mean scalar field, $q = \nabla\theta$
R = antisymmetric part of velocity gradient, $R = 0.5\,(\nabla u - \nabla u^T)$
RF = random forest
S = symmetric part of velocity gradient, $S = 0.5\,(\nabla u + \nabla u^T)$
T = feature threshold at a decision tree node
tr() = trace of a tensor
$V_i$ = volume of the ith computational cell
$\alpha_t$ = turbulent diffusivity (m^2/s)
$\alpha_{t,\mathrm{LES}}$ = diffusivity extracted from the high-fidelity simulation
$\alpha_{t,\mathrm{ML}}$ = diffusivity predicted by the machine learning algorithm
$\alpha_{t,\mathrm{RANS}}$ = diffusivity predicted by the fixed $\mathrm{Pr}_t = 0.85$ model
$\delta$ = boundary layer thickness
$\varepsilon$ = turbulent dissipation rate
$\theta$ = scalar concentration, $0 \le \theta \le 1$
$\nu_t$ = eddy viscosity calculated by RANS (m^2/s)
$\sigma$ = standard deviation
$\phi_i$ = ith feature used by the machine learning model

References
[1] Bogard, D. G., and Thole, K. A., 2006, “Gas Turbine Film Cooling,” J. Propul. Power, 22(2), pp. 249–270.
[2] Launder, B., and Spalding, D., 1974, “The Numerical Computation of Turbulent Flows,” Comput. Methods Appl. Mech. Eng., 3(2), pp. 269–289.
[3] Wilcox, D. C., 1998, Turbulence Modeling for CFD, Vol. 2, DCW Industries, La Canada, CA.
[4] Shih, T.-H., Zhu, J., and Lumley, J. L., 1995, “A New Reynolds Stress Algebraic Equation Model,” Comput. Methods Appl. Mech. Eng.,
[5] Hoda, A., and Acharya, S., 1999, “Predictions of a Film Coolant Jet in Crossflow With Different Turbulence Models,” ASME J. Turbomach., 122(3), pp. 558–569.
[6] Ling, J., Ryan, K. J., Bodart, J., and Eaton, J. K., 2016, “Analysis of Turbulent Scalar Flux Models for a Discrete Hole Film Cooling Flow,” ASME J. Turbomach., 138(1), p. 011006.
[7] Kays, W. M., 1994, “Turbulent Prandtl Number—Where Are We?,” ASME J. Heat Transfer, 116(2), pp. 284–295.
[8] Schreivogel, P., Abram, C., Fond, B., Straußwald, M., Beyrau, F., and Pfitzner, M., 2016, “Simultaneous kHz-Rate Temperature and Velocity Field Measurements in the Flow Emanating From Angled and Trenched Film Cooling Holes,” Int. J. Heat Mass Transfer, 103, pp. 390–400.
[9] Kohli, A., and Bogard, D. G., 2005, “Turbulent Transport in Film Cooling Flows,” ASME J. Heat Transfer, 127(5), pp. 513–520.
[10] Daly, B. J., and Harlow, F. H., 1970, “Transport Equations in Turbulence,” Phys. Fluids, 13(11), pp. 2634–2649.
[11] Abe, K., and Suga, K., 2001, “Towards the Development of a Reynolds-Averaged Algebraic Turbulent Scalar-Flux Model,” Int. J. Heat Fluid Flow, 22(1), pp. 19–29.
[12] Milani, P. M., Ling, J., Saez-Mischlich, G., Bodart, J., and Eaton, J. K., 2018, “A Machine Learning Approach for Determining the Turbulent Diffusivity in Film Cooling Flows,” ASME J. Turbomach., 140(2), p. 021006.
[13] Ling, J., Kurzawski, A., and Templeton, J., 2016, “Reynolds Averaged Turbulence Modelling Using Deep Neural Networks With Embedded Invariance,” J. Fluid Mech., 807, pp. 155–166.
[14] Weatheritt, J., Pichler, R., Sandberg, R. D., Laskowski, G., and Michelassi, V., 2017, “Machine Learning for Turbulence Model Development Using a High-Fidelity HPT Cascade Simulation,” ASME Paper No. GT2017-63497.
[15] Duraisamy, K., Zhang, Z. J., and Singh, A. P., 2015, “New Approaches in Turbulence and Transition Modeling Using Data-Driven Techniques,” AIAA Paper No. AIAA 2015-1284.
[16] Ribeiro, M. T., Singh, S., and Guestrin, C., 2016, “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144.
[17] Wu, J., Wang, J., Xiao, H., and Ling, J., 2017, “Visualization of High Dimensional Turbulence Simulation Data Using t-SNE,” AIAA Paper No. AIAA 2017-1770.
[18] Ling, J., Jones, R., and Templeton, J., 2016, “Machine Learning Strategies for Systems With Invariance Properties,” J. Comput. Phys., 318, pp. 22–35.
[19] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E., 2011, “Scikit-Learn: Machine Learning in Python,” J. Mach. Learn., 12, pp. 2825–2830.
[20] Ling, J., and Templeton, J., 2015, “Evaluation of Machine Learning Algorithms for Prediction of Regions of High Reynolds-Averaged Navier–Stokes Uncertainty,” Phys. Fluids, 27(8), p. 085103.
[21] Coletti, F., Benson, M., Ling, J., Elkins, C., and Eaton, J., 2013, “Turbulent Transport in an Inclined Jet in Crossflow,” Int. J. Heat Fluid Flow, 43, pp. 149–160.
[22] Bodart, J., Coletti, F., Bermejo-Moreno, I., and Eaton, J., 2013, “High-Fidelity Simulation of a Turbulent Inclined Jet in a Crossflow,” Cent. Turbul. Res. Annu. Res. Briefs, (epub).
[23] Coletti, F., Elkins, C. J., and Eaton, J. K., 2013, “An Inclined Jet in Crossflow Under the Effect of Streamwise Pressure Gradients,” Exp. Fluids, 54(9), p. 1589.
[24] Folkersma, M., and Bodart, J., 2018, “Large Eddy Simulation of an Asymmetric Jet in Crossflow,” Direct and Large-Eddy Simulation X, Springer International Publishing, Basel, Switzerland, pp. 85–91.
[25] Rossi, R., Philips, D., and Iaccarino, G., 2010, “A Numerical Study of Scalar Dispersion Downstream of a Wall-Mounted Cube Using Direct Simulations and Algebraic Flux Models,” Int. J. Heat Fluid Flow, 31(5), pp. 805–819.
[26] Oliver, T. A., Anderson, J. B., Bogard, D. G., Moser, R. D., and Laskowski, G., 2017, “Implicit LES for Shaped-Hole Film Cooling Flow,” ASME Paper No. GT2017-63314.
[27] Louppe, G., 2014, “Understanding Random Forests: From Theory to Practice,” e-print: arXiv:1407.7502.
[28] Ling, J., 2014, “Improvements in Turbulent Scalar Mixing Modeling for Trailing Edge Slot Film Cooling Geometries: A Combined Experimental and Computational Approach,” Ph.D. thesis, Stanford University, Stanford, CA.
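
As a supplementary illustration of the pointwise feature usage (PFU) metric proposed in Sec. 4.3, the sketch below counts, for each test point, how often each feature is used for a split along the root-to-leaf paths and normalizes the counts to sum to one. It assumes a scikit-learn [19] forest and uses `decision_path` and the `tree_.feature` array of each tree. The helper name `pointwise_feature_usage`, the synthetic data, and the choice to pool the counts over all trees before normalizing are assumptions made for illustration, not a description of the implementation used in the present work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def pointwise_feature_usage(forest, X):
    """Fraction of splits using each feature along the paths followed by X.

    Returns an array of shape (n_samples, n_features); each row sums to one.
    """
    X = np.asarray(X)
    n_samples, n_features = X.shape
    counts = np.zeros((n_samples, n_features))
    for tree in forest.estimators_:
        split_feature = tree.tree_.feature       # negative entries mark leaves
        internal = np.flatnonzero(split_feature >= 0)
        # One-hot map from node index to the feature that node splits on.
        node_to_feature = np.zeros((split_feature.size, n_features))
        node_to_feature[internal, split_feature[internal]] = 1.0
        # Sparse indicator (n_samples, n_nodes) of the nodes each point visits.
        path = tree.decision_path(X)
        counts += path @ node_to_feature
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.where(totals > 0, totals, 1.0)


# Synthetic stand-in data (assumption): feature 0 selects which of
# features 1 and 2 sets the target; feature 3 is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = np.where(X[:, 0] > 0, X[:, 1], X[:, 2])

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)
pfu = pointwise_feature_usage(forest, X[:5])
print(pfu.round(3))
```

Evaluating such a function on every cell of a test case would give one PFU field per feature, which is the kind of quantity contoured in Fig. 9.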

Journal of Turbomachinery Copyright V

011004-2 / Vol. 141, JANUARY 2019 Transactions of the ASME

Table 1 Summary of the datasets

Case Description ReD Use

Baseline Inclined jet in crossflow 3000 Test

Table 2 Error in the Baseline case using distinct diffusivity
