Apigee Edge Antipatterns 2.0
Contents
Introduction to Antipatterns
What is this book about?
Why did we write it?
Antipattern Context
Target Audience
Authors
Acknowledgements
Edge Antipatterns
1. Policy Antipatterns
1.2. Set Long Expiration time for OAuth Access and Refresh Token
1.7. Invoke the MessageLogging policy multiple times in an API proxy
1.8. Configure a Non Distributed Quota
1.9. Re-use a Quota policy
3.3. Manage Edge Resources without using Source Control Management
3.4. Define multiple virtual hosts with same host alias and port number
3.5. Load Balance with a single Target Server with MaxFailures set to a non-zero value
3.6. Access the Request/Response payload when streaming is enabled
5.1. Add Custom Information to Apigee owned Schema in Postgres Database
Introduction to Antipatterns
Everything in the universe, every phenomenon, has two polar opposite expressions. The
physicist Paul Dirac's equation of the electron concludes that for every electron, there is
an antielectron. Matter therefore has its twin - antimatter. Unsurprisingly, Software
Design 'Patterns' have 'Antipatterns'.
Andrew Koenig coined the word antipattern as early as 1995. Literature says that he was
inspired by the Gang of Four's book, Design Patterns, which developed the titular concept
in the software field.
Wikipedia defines a software antipattern as:
“In software engineering, an anti-pattern is a pattern that may be commonly used but is
ineffective and/or counterproductive in practice.”
Simply put, an antipattern is something that the software allows its 'user' to do, but
something that may have an adverse functional, serviceability, or performance effect.
Let’s take an example:
Consider the exotically named "God Class/Object".
In object-oriented parlance, a God Class is a class that controls too many classes for a
given application.
So, if an application has a class that has the following reference tree:
Each oval blob above represents a class. As the image illustrates, the class depicted as
‘God Class’ uses and references too many classes.
The framework on which the application was developed does not prevent the creation of
such a class, but it has many disadvantages, the primary ones being:
1) Hard to maintain
2) Single point of failure when the application runs
Consequently, creation of such a class should be avoided. It is an antipattern.
This book is specifically about the don'ts on Apigee Edge - the Antipatterns.
Antipattern Context
It is important to note that most antipatterns of evolving platforms are a point-in-time
phenomenon. Therefore, any antipatterns documented in this book may cease to remain
antipatterns with changes in the design of the platform. We will be constantly updating
the book with any such changes.
It is therefore important to refer to the latest version of the book.
Target Audience
This book best serves Apigee Edge developers as they progress through the
lifecycle of designing and developing API proxies for their services. It should ideally be
used as a reference guide during the API development lifecycle.
The book also serves well as an aid for troubleshooting. It is worthwhile for teams to
quickly refer to this book if any of the policies are not working as documented.
For example, if a team is encountering a problem with caching, they could scan the book
for any antipatterns related to caching to understand whether they have implemented one,
and also to get an idea of how to resolve the problem.
The book presupposes that the target audience comprises people who have used
Apigee Edge and are therefore aware of standard terminology like proxies, policies,
management services, gateway services, etc. It does not venture to define or explain
Apigee Edge concepts or common terminology.
Those details can be found at https://docs.apigee.com
Authors
Amar Devegowda
Akash Tumkur Prabhashankar
Debora Elkin
Mark Eccles
Muthukkannan Alagappan
Senthil Kumar Tamizhselvan
Uday Joshi
Venkataraghavan Lakshminarayanachar
Acknowledgements
We would like to acknowledge the significant contributions made by the following
Googlers in reviewing, proofreading and shaping this book - Arghya Das, Alex Koo, Bala
Kasiviswanathan, Carlos Eberhardt, Dane Knezic, Dave Newman, David Allen, Divya Achan,
Gregory Brail, Greg Kuelgen, Floyd Jones, Janice Hunt, Jennifer Bennett, Joshua Norrid, Liz
Lynch, Matthew Horn, Marsh Gardiner, Mike Dunker, Mukunda Gnanasekharan, Peter
Johnson, Prashanth Subrahmanyam, Rajesh Doda, Stephen Gilson, Steven Richardson,
Steve Traut, Will Witman and many others.
Edge Antipatterns
1. Policy Antipatterns
HTTP Client
A powerful feature of the JavaScript policy is the HTTP client. The HTTP client (the
httpClient object) can be used to make one or more calls to backend or external
services. The HTTP client is particularly useful when there's a need to make calls to
multiple external services and mash up the responses in a single API.
Sample JavaScript Code making a call to backend with httpClient object
var headers = {'X-SOME-HEADER': 'some value'};
var myRequest = new Request("http://www.example.com", "GET", headers);
var exchange = httpClient.send(myRequest);
The httpClient object exposes two methods, get and send (send is used in the above
sample code), to make HTTP requests. Both methods are asynchronous and return an
exchange object before the actual HTTP request is completed.
The HTTP requests may take a few seconds to a few minutes. After an HTTP request is
made, it's important to know when it is completed, so that the response from the request
can be processed. One of the most common ways to determine when the HTTP request is
complete is by invoking the exchange object's waitForComplete() method.
waitForComplete()
The waitForComplete() method pauses the thread until the HTTP request completes
and a response (success/failure) is returned. Then, the response from a backend or
external service can be processed.
Sample JavaScript code with waitForComplete()
var headers = {'X-SOME-HEADER': 'some value'};
var myRequest = new Request("http://www.example.com", "GET", headers);
var exchange = httpClient.send(myRequest);
// Wait for the asynchronous GET request to finish
exchange.waitForComplete();

// Get and process the response
if (exchange.isSuccess()) {
  var responseObj = exchange.getResponse().content.asJSON;
  return responseObj.access_token;
} else if (exchange.isError()) {
  throw new Error(exchange.getError());
}
Antipattern
Using waitForComplete() after sending an HTTP request in JavaScript code will have
performance implications.
Impact
1. The API requests will fail with 500 Internal Server Error and the error message
'Timed out' when the number of concurrent requests executing
waitForComplete() in the JavaScript code exceeds the predefined limit.
2. Diagnosing the cause of the issue can be tricky, as the JavaScript fails with a 'Timed
out' error even though the time limit for the specific JavaScript policy has not
elapsed.
Best Practice
Use callbacks in the HTTP client to streamline the callout code and improve performance,
and avoid using waitForComplete() in JavaScript code. Using callbacks ensures that the
thread executing the JavaScript is not blocked until the HTTP request is completed.
When a callback is used, the thread sends the HTTP requests in the JavaScript code and
returns to the pool. Because the thread is no longer blocked, it is available to handle
other requests. After the HTTP request is completed and the callback is ready to be
executed, a task will be created and added to the task queue. One of the threads from the
pool will execute the callback based on the priority of the task.
Sample JavaScript code using Callbacks in httpClient
function onComplete(response, error) {
  // Check if the HTTP request was successful
  if (response) {
    context.setVariable('example.status', response.status);
  } else {
    context.setVariable('example.error', 'Woops: ' + error);
  }
}

// Specify the callback function as an argument
httpClient.get("http://example.com", onComplete);
Further reading
● JavaScript policy
● JavaScript callback support in httpClient for improved callouts
● Making JavaScript callouts with httpClient
1.2. Set Long Expiration time for OAuth Access and Refresh Token
Apigee Edge provides the OAuth 2.0 framework to secure APIs. OAuth2 is one of the most
popular open-standard, token-based authentication and authorization schemes. It enables
client applications to access APIs on behalf of users without requiring users to divulge
their usernames and passwords.
Apigee Edge allows developers to generate access and/or refresh tokens by implementing
any one of the four OAuth2 grant types (client credentials, password, implicit, and
authorization code) using the OAuthV2 policy. Client applications use access tokens to
consume secure APIs. Each access token has its own expiry time, which can be set in the
OAuthV2 policy.
Refresh tokens are optionally issued along with access tokens with some of the grant
types. Refresh tokens are used to obtain new, valid access tokens after the original access
token has expired or been revoked. The expiry time for refresh tokens can also be set in
OAuthV2 policy.
Antipattern
Setting a long expiration time for an Access Token and/or Refresh Token in the OAuthV2
policy leads to accumulation of OAuth tokens and increased disk space use on Cassandra
nodes.
Sample OAuthV2 policy showing a long expiration time of 200 days for Refresh
Tokens
<OAuthV2 name="GenerateAccessToken">
  <Operation>GenerateAccessToken</Operation>
  <ExpiresIn>1800000</ExpiresIn> <!-- 30 minutes -->
  <RefreshTokenExpiresIn>17280000000</RefreshTokenExpiresIn> <!-- 200 days -->
  <SupportedGrantTypes>
    <GrantType>password</GrantType>
  </SupportedGrantTypes>
  <GenerateResponse enabled="true"/>
</OAuthV2>
In the above example:
● The access token is set with a reasonably low expiration time of 30 minutes.
● The refresh token is set with a very long expiration time of 200 days.
● If the traffic to this API is 10 requests/second, then it can generate as many as
864,000 tokens in a day.
● Since the refresh tokens expire only after 200 days, they persist in the data store
(Cassandra) for a long time, leading to continuous accumulation.
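The arithmetic behind these bullets can be checked with a small, runnable sketch (plain JavaScript, not Apigee code; the 10 requests/second figure is the hypothetical rate from the example above):

```javascript
// Token generation at the example's traffic rate.
const requestsPerSecond = 10;
const secondsPerDay = 24 * 60 * 60; // 86,400

// Each request that generates a token adds one refresh token to the data store.
const tokensPerDay = requestsPerSecond * secondsPerDay;

// With a 200-day refresh token expiry, tokens pile up until the oldest
// batch finally expires and can be purged.
const refreshTokenTtlDays = 200;
const tokensRetained = tokensPerDay * refreshTokenTtlDays;

console.log(tokensPerDay);   // 864000
console.log(tokensRetained); // 172800000
```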
Impact
● Leads to significant growth of disk space usage on the data store (Cassandra).
● For Private Cloud users, this could increase storage costs, or in the worst case, the
disk could become full and lead to runtime errors or outage.
Best Practice
Use appropriately low expiration times for OAuth access and refresh tokens, depending
on your specific security requirements, so that tokens are purged quickly, avoiding
accumulation.
Set the expiration time for refresh tokens so that they are valid for a little longer
than the access tokens; for example, if you set 30 minutes for the access token, set
60 minutes for the refresh token.
This ensures that:
● There’s ample time to use a refresh token to generate new access and refresh
tokens after the access token is expired.
● The refresh tokens will expire a little while later and can get purged in a timely
manner to avoid accumulation.
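Applied to the earlier GenerateAccessToken example, this recommendation might look like the following sketch (the 30/60-minute values are illustrative, not prescriptive, assuming the same password grant setup):

```xml
<OAuthV2 name="GenerateAccessToken">
  <Operation>GenerateAccessToken</Operation>
  <ExpiresIn>1800000</ExpiresIn> <!-- 30 minutes -->
  <RefreshTokenExpiresIn>3600000</RefreshTokenExpiresIn> <!-- 60 minutes -->
  <SupportedGrantTypes>
    <GrantType>password</GrantType>
  </SupportedGrantTypes>
  <GenerateResponse enabled="true"/>
</OAuthV2>
```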
Further reading
● OAuth 2.0 framework
● Secure APIs with OAuth
● Edge Product Limits
Regular expressions can be defined for request paths, query parameters, form
parameters, headers, XML elements (in an XML payload, defined using XPath), and JSON
object attributes (in a JSON payload, defined using JSONPath).
<!-- /antipatterns/examples/greedy-1.xml -->
<RegularExpressionProtection async="false" continueOnError="false"
    enabled="true" name="RegexProtection">
  <DisplayName>RegexProtection</DisplayName>
  <Properties/>
  <Source>request</Source>
  <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
  <QueryParam name="query">
    <Pattern>[\s]*(?i)((delete)|(exec)|(drop\s*table)|(insert)|(shutdown)|(update)|(\bor\b))</Pattern>
  </QueryParam>
</RegularExpressionProtection>
Antipattern
The default quantifiers (*, +, ?) are greedy in nature: they start by matching the longest
possible sequence; when no match is found, they backtrack gradually to try to match the
pattern. If the resultant string matching the pattern is very short, then using greedy
quantifiers can take more time than necessary. This is especially true if the payload is
large (in the tens or hundreds of KBs).
The following example expression uses multiple instances of .*, which are greedy
operators:
<Pattern>.*Exception in thread.*</Pattern>
In this example, the RegularExpressionProtection policy first tries to match the longest
possible sequence, the entire string. If no match is found, the policy then backtracks
gradually. If the matching string is close to the start or middle of the payload, then using a
greedy quantifier like .* can take a lot more time and processing power than reluctant
quantifiers like .*? or (less commonly) possessive quantifiers like .*+.
Reluctant quantifiers (like X*?, X+?, X??) start by trying to match a single character from
the beginning of the payload and gradually add characters. Possessive quantifiers (like
X?+, X*+, X++) try to match the entire payload only once.
Given the following sample text for the above pattern:
Hello this is a sample text with Exception in thread with lot of
text after the Exception text.
Using the greedy .* is non-performant in this case. The pattern .*Exception in
thread.* takes 141 steps to match. If you used the pattern .*?Exception in
thread.* (with a reluctant quantifier) instead, the evaluation takes only 55 steps.
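The step counts above come from the regex engine's backtracking behavior. A small, runnable sketch (plain JavaScript regexes, not the Apigee policy) shows that both variants accept the same sample text; the difference is in the work the engine does, which grows with payload size:

```javascript
const text = 'Hello this is a sample text with Exception in thread ' +
             'with lot of text after the Exception text.';

// Greedy: .* first grabs the whole string, then backtracks to find a match.
const greedy = /.*Exception in thread.*/;

// Reluctant: .*? starts with the shortest match and grows it as needed.
const reluctant = /.*?Exception in thread.*/;

console.log(greedy.test(text));    // true
console.log(reluctant.test(text)); // true
```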
Impact
Using greedy quantifiers like wildcards (*) with the RegularExpressionProtection policy
can lead to:
● An increase in overall latency of API requests for moderate payload sizes (up to
1MB)
● Longer time to complete the execution of the RegularExpressionProtection policy
● API requests with large payloads (>1MB) failing with 504 Gateway Timeout errors
if the predefined timeout period elapses on the Edge Router
● High CPU utilization on Message Processors due to the large amount of processing,
which can further impact other API requests
Best Practice
● Avoid using greedy quantifiers like .* in regular expressions with the
RegularExpressionProtection policy. Instead, use reluctant quantifiers like .*? or
(less commonly) possessive quantifiers like .*+ wherever possible.
Further reading
● Regular Expression Quantifiers
● Regular Expression Protection policy
Antipattern
The Response Cache policy, by default, allows you to cache HTTP responses with any
possible status code. This means that both success and error responses can be cached.
Here's a sample Response Cache policy with the default configuration:
<ResponseCache async="false" continueOnError="false" enabled="true"
    name="TargetServerResponseCache">
  <DisplayName>TargetServerResponseCache</DisplayName>
  <CacheKey>
    <KeyFragment ref="request.uri"/>
  </CacheKey>
  <Scope>Exclusive</Scope>
  <ExpirySettings>
    <TimeoutInSec ref="flow.variable.here">600</TimeoutInSec>
  </ExpirySettings>
  <CacheResource>targetCache</CacheResource>
</ResponseCache>
The Response Cache policy caches error responses in its default configuration. However,
it is not advisable to cache error responses without giving considerable thought to the
adverse implications, because:
● Scenario 1: Failures occur for a temporary, unknown period and we may continue
to send error responses due to caching even after the problem has been fixed
OR
● Scenario 2: Failures will be observed for a fixed period of time, then we will have to
modify the code to avoid caching responses once the problem is fixed
In all these cases, the failures could occur for an unknown time period, after which we may
start getting successful responses. If we cache the error responses, then we may
continue to send error responses to users even though the problem with the backend
server has been fixed.
Consider that we know the failure in the backend is for a fixed period of time. For instance,
you are aware that either:
Impact
● Caching error responses can cause error responses to be sent even after the
problem has been resolved in the backend server
● Users may spend a lot of effort troubleshooting the cause of an issue without
knowing that it is caused by caching the error responses from the backend server
Best Practice
1. Don't store error responses in the response cache. Ensure that the
<ExcludeErrorResponse> element is set to true in the ResponseCache policy
to prevent error responses from being cached, as shown in the below code snippet.
With this configuration, only the responses for the default success codes 200 to
205 (unless the success codes are modified) will be cached.
<ResponseCache async="false" continueOnError="false" enabled="true"
    name="TargetServerResponseCache">
  <DisplayName>TargetServerResponseCache</DisplayName>
  <CacheKey>
    <KeyFragment ref="request.uri"/>
  </CacheKey>
  <Scope>Exclusive</Scope>
  <ExpirySettings>
    <TimeoutInSec ref="flow.variable.here">600</TimeoutInSec>
  </ExpirySettings>
  <CacheResource>targetCache</CacheResource>
  <ExcludeErrorResponse>true</ExcludeErrorResponse>
</ResponseCache>
2. If you have a requirement to cache error responses for some specific reason,
then you can do the following, but only if you are absolutely sure that the backend
server failure is not for a brief/temporary period:
a. Determine the maximum/exact duration of time for which the failure will be
observed (if possible).
i. Set the expiration time appropriately to ensure that you don't cache
the error responses longer than the time for which the failure can be
seen.
ii. Use the ResponseCache policy to cache the error responses without
the <ExcludeErrorResponse> element.
b. Note: Once the failure has been fixed, remove/disable the ResponseCache
policy. Otherwise, the error responses for temporary backend server
failures may get cached.
Further reading
● Response Cache policy
Antipattern
This particular antipattern talks about the implications of exceeding the current cache size
restrictions within the Apigee Edge platform.
When data > 512kb is cached, the consequences are as follows:
● API requests executed for the first time on each of the Message Processors need
to get the data independently from the original source (policy or a target server), as
entries > 512kb are not available in L2 cache.
● Storing larger data (> 512kb) in the L1 cache tends to put more stress on platform
resources. It results in the L1 cache memory filling up faster and hence less
space being available for other data. As a consequence, one will not be able to
cache data as aggressively as one would like.
● Cached entries from the Message Processors will be removed when the limit on
the number of entries is reached. This causes the data to be fetched from the
original source again on the respective Message Processors.
Impact
● Data of size > 512kb will not be stored in L2/persistent cache.
● More frequent calls to the original source (either a policy or a target server) lead
to increased latencies for API requests.
Best Practice
1. It is preferred to store data of size < 512kb in cache to get optimum performance.
2. If there's a need to store data > 512kb, then consider:
a. Using an appropriate database for storing large data, or
b. Compressing the data (if feasible) and storing it only if the
compressed size is <= 512kb.
Further reading
● Edge Caching Internals
Antipattern
In the code below, the httpClient object in a JavaScript policy is used to log data to
a Sumo Logic server. This means that the process of logging actually takes place during
request/response processing. This approach is counterproductive because it adds to the
processing time, thereby increasing overall latency.
LogData_JS:
<Javascript async="false" continueOnError="false" enabled="true"
    timeLimit="2000" name="LogData_JS">
  <DisplayName>LogData_JS</DisplayName>
  <ResourceURL>jsc://LogData.js</ResourceURL>
</Javascript>
LogData.js:
var sumoLogicURL = "...";
httpClient.send(sumoLogicURL);
waitForComplete();
The culprit here is the call to waitForComplete(), which suspends the caller's
operations until the logging process is complete. A valid pattern is to make this
asynchronous by eliminating this call.
Impact
● Executing the logging code via the JavaScript policy adds to the latency of the API
request.
● Concurrent requests can stress the resources on the Message Processor and
consequently adversely affect the processing of other requests.
Best Practice
1. Use the Message Logging policy to transfer/log data to Log servers or third party
log management services such as Sumo Logic, Splunk, Loggly, etc.
2. The biggest advantage of the Message Logging policy is that it can be defined in
the Post Client Flow, which gets executed after the response is sent back to the
requesting client app.
3. Having this policy defined in Post Client flow helps separate the logging activity
and the request/response processing thereby avoiding latencies.
4. If there is a specific requirement or reason to do logging via JavaScript (rather than
the Post Client Flow option mentioned above), then it needs to be done asynchronously;
i.e., if you are using an httpClient object, you must ensure that the JavaScript
does not invoke waitForComplete().
Further reading
● JavaScript policy
● Message Logging policy
Antipattern
The MessageLogging policy provides an efficient way to get more information about an
API request and to debug any issues that are encountered with the API request.
However, using the same MessageLogging policy more than once, or having multiple
MessageLogging policies log data in chunks in the same API proxy in flows other than the
PostClientFlow, may have adverse implications. This is because Apigee Edge opens a
connection to an external syslog server for each MessageLogging policy. If the policy uses
TLS over TCP, there is the additional overhead of establishing a TLS connection.
Let’s explain this with the help of an example API proxy:
API proxy
In the following example, a MessageLogging policy named "LogRequestInfo" is placed in
the Request flow, and another MessageLogging policy named "LogResponseInfo" is added
to the Response flow. Both are in the ProxyEndpoint PreFlow. The LogRequestInfo policy
executes in the background as soon as the API proxy receives the request, and the
LogResponseInfo policy executes after the proxy has received a response from the target
server but before the proxy returns the response to the API client. This will consume
additional system resources, as two TLS connections are potentially being established.
In addition, there's a MessageLogging policy named "LogErrorInfo" that gets executed only
if there's an error during API proxy execution.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ProxyEndpoint name="default">
  ...
  <FaultRules>
    <FaultRule name="fault-logging">
      <Step>
        <Name>LogErrorInfo</Name>
      </Step>
    </FaultRule>
  </FaultRules>
  <PreFlow name="PreFlow">
    <Request>
      <Step>
        <Name>LogRequestInfo</Name>
      </Step>
    </Request>
    <Response>
      <Step>
        <Name>LogResponseInfo</Name>
      </Step>
    </Response>
  </PreFlow>
  ...
</ProxyEndpoint>
LogRequestInfo policy
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<MessageLogging name="LogRequestInfo">
  <Syslog>
    <Message>[3f509b58 tag="{organization.name}.{apiproxy.name}.{environment.name}"] Weather request for WOEID {request.queryparam.w}.</Message>
    <Host>logs-01.loggly.com</Host>
    <Port>6514</Port>
    <Protocol>TCP</Protocol>
    <FormatMessage>true</FormatMessage>
    <SSLInfo>
      <Enabled>true</Enabled>
    </SSLInfo>
  </Syslog>
  <logLevel>INFO</logLevel>
</MessageLogging>
LogResponseInfo policy
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<MessageLogging name="LogResponseInfo">
  <Syslog>
    <Message>[3f509b58 tag="{organization.name}.{apiproxy.name}.{environment.name}"] Status: {response.status.code}, Response {response.content}.</Message>
    <Host>logs-01.loggly.com</Host>
    <Port>6514</Port>
    <Protocol>TCP</Protocol>
    <FormatMessage>true</FormatMessage>
    <SSLInfo>
      <Enabled>true</Enabled>
    </SSLInfo>
  </Syslog>
  <logLevel>INFO</logLevel>
</MessageLogging>
LogErrorInfo policy
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<MessageLogging name="LogErrorInfo">
  <Syslog>
    <Message>[3f509b58 tag="{organization.name}.{apiproxy.name}.{environment.name}"] Fault name: {fault.name}.</Message>
    <Host>logs-01.loggly.com</Host>
    <Port>6514</Port>
    <Protocol>TCP</Protocol>
    <FormatMessage>true</FormatMessage>
    <SSLInfo>
      <Enabled>true</Enabled>
    </SSLInfo>
  </Syslog>
  <logLevel>ERROR</logLevel>
</MessageLogging>
Impact
● Increased networking overhead due to establishing connections to the log servers
multiple times during API proxy flow.
● If the syslog server is slow or cannot handle the high volume caused by multiple
syslog calls, then it will cause back pressure on the message processor, resulting
in slow request processing and potentially high latency or 504 Gateway Timeout
errors.
● Increased number of concurrent file descriptors opened by the message processor
on Private Cloud setups where file logging is used.
● If the MessageLogging policy is placed in flows other than PostClient flow, there’s
a possibility that the information may not be logged, as the MessageLogging policy
will not be executed if any failure occurs before the execution of this policy.
In the previous ProxyEndpoint example, the information will not be logged under
the following circumstances:
○ If any of the policies placed ahead of the LogRequestInfo policy in the
request flow fails.
or
○ If the target server fails with any error (HTTP 4XX, 5XX). In this situation,
when a successful response isn't returned, the LogResponseInfo policy
won't get executed.
In both of these cases, the LogErrorInfo policy will get executed and logs only the
error-related information.
Best Practice
1. Use an ExtractVariables policy or JavaScript policy to set all the flow variables that
are to be logged, making them available for the MessageLogging policy.
2. Use a single MessageLogging policy to log all the required data in the
PostClientFlow, which gets executed unconditionally.
3. Use the UDP protocol where guaranteed delivery of messages to the syslog server
is not required and TLS/SSL is not mandatory.
The MessageLogging policy was designed to be decoupled from actual API functionality,
including error handling. Therefore, invoking it in the PostClientFlow, which is outside of
request/response processing, means that it will always log data regardless of whether the
API failed or not.
Here's an example of invoking the MessageLogging policy in the PostClientFlow:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
...
<PostClientFlow>
  <Request/>
  <Response>
    <Step>
      <Name>LogInfo</Name>
    </Step>
  </Response>
</PostClientFlow>
...
Here's an example of a MessageLogging policy, LogInfo, that logs all the data:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<MessageLogging name="LogInfo">
  <Syslog>
    <Message>[3f509b58 tag="{organization.name}.{apiproxy.name}.{environment.name}"] Weather request for WOEID {woeid} Status: {weather.response.code}, Response {weather.response}, Fault: {fault.name:None}.</Message>
    <Host>logs-01.loggly.com</Host>
    <Port>6514</Port>
    <Protocol>TCP</Protocol>
    <FormatMessage>true</FormatMessage>
    <SSLInfo>
      <Enabled>true</Enabled>
    </SSLInfo>
  </Syslog>
  <logLevel>INFO</logLevel>
</MessageLogging>
Because response variables are not available in the PostClientFlow following an Error
Flow, it's important to explicitly set the woeid and weather.response* variables using
ExtractVariables or JavaScript policies.
Further reading
● JavaScript policy
● ExtractVariables policy
● Having code execute after proxy processing, including when faults occur, with the
PostClientFlow
Antipattern
An API proxy request can be served by one or more distributed Edge components called
Message Processors. If there are multiple Message Processors configured for serving API
requests, then the quota will likely be exceeded, because each Message Processor keeps
its own 'count' of the requests it processes.
Let’s explain this with the help of an example. Consider the following Quota policy for an
API proxy -
<!-- /antipatterns/examples/1-6.xml -->
<Quota name="CheckTrafficQuota">
  <Interval>1</Interval>
  <TimeUnit>hour</TimeUnit>
  <Allow count="100"/>
</Quota>
The above configuration should allow a total of 100 requests per hour.
However, in practice, when multiple Message Processors are serving the API requests, the
following happens:

In the above illustration:
● The Quota policy is configured to allow 100 requests per hour.
● The requests to the API proxy are being served by two Message Processors.
● Each Message Processor maintains its own quota count variable,
quota_count_mp1 and quota_count_mp2, to track the number of requests it
processes.
● Therefore each Message Processor will allow 100 API requests separately,
with the net effect that a total of 200 requests are processed instead of 100.
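The counter behavior described above can be sketched as a small, runnable simulation (plain JavaScript, not Apigee code; the two-processor setup and the 100-request quota mirror the example):

```javascript
// Each Message Processor is modeled as an object with its own private counter.
function makeMessageProcessor(allowCount) {
  let count = 0;
  return {
    tryRequest() {
      if (count >= allowCount) return false; // quota violation on this node only
      count += 1;
      return true;
    },
  };
}

const mp1 = makeMessageProcessor(100);
const mp2 = makeMessageProcessor(100);

// Spread 200 requests evenly across the two processors.
let allowed = 0;
for (let i = 0; i < 200; i++) {
  const mp = i % 2 === 0 ? mp1 : mp2;
  if (mp.tryRequest()) allowed += 1;
}
console.log(allowed); // 200 -- double the intended 100-request quota
```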
Impact
This situation defeats the purpose of the quota configuration and can have detrimental
effects on the backend servers that are serving the requests.
The backend servers can:
● be stressed due to higher than expected incoming traffic
● become unresponsive to newer API requests leading to 503 errors
Best Practice
Consider setting the <Distributed> element to true in the Quota policy to ensure that
a common counter is used to track the API requests across all Message Processors. The
<Distributed> element can be set as shown in the code snippet below:
<!-- /antipatterns/examples/1-7.xml -->
<Quota name="CheckTrafficQuota">
  <Interval>1</Interval>
  <TimeUnit>hour</TimeUnit>
  <Distributed>true</Distributed>
  <Allow count="100"/>
</Quota>
Further reading
● Quota policy
Antipattern
If a Quota policy is reused, then the quota counter will be decremented each time the
Quota policy gets executed, irrespective of where it is used. That is, if a Quota policy is
reused:
● Within the same flow or different flows of an API proxy
● In different target endpoints of an API proxy
then the quota counter is decremented each time it is executed, and we will end up getting
quota violation errors much earlier than expected for the specified interval of time.
Let’s use the following example to explain how this works.
API proxy
Let's say we have an API proxy named "TestTargetServerQuota", which routes traffic to
two different target servers based on the resource path. And we would like to restrict the
API traffic to 10 requests per minute for each of these target servers. Here's the table that
depicts this scenario:

Resource Path   Target Server   Quota
/target-us      Target-US       10 requests per minute
/target-eu      Target-EU       10 requests per minute
Quota policy
Since the traffic quota is the same for both target servers, we define a single Quota policy
named "Quota-Minute-Target-Server", as shown below:
<Quota name="Quota-Minute-Target-Server">
  <Interval>1</Interval>
  <TimeUnit>minute</TimeUnit>
  <Distributed>true</Distributed>
  <Allow count="10"/>
</Quota>
Target Endpoints
Let's use the Quota policy "Quota-Minute-Target-Server" in the PreFlow of the target
endpoint "Target-US":
<TargetEndpoint name="Target-US">
  <PreFlow name="PreFlow">
    <Request>
      <Step>
        <Name>Quota-Minute-Target-Server</Name>
      </Step>
    </Request>
  </PreFlow>
  <HTTPTargetConnection>
    <URL>http://target-us.somedomain.com</URL>
  </HTTPTargetConnection>
</TargetEndpoint>
And reuse the same Quota policy "Quota-Minute-Target-Server" in the PreFlow of the
other target endpoint "Target-EU" as well:
<TargetEndpoint name="Target-EU">
  <PreFlow name="PreFlow">
    <Request>
      <Step>
        <Name>Quota-Minute-Target-Server</Name>
      </Step>
    </Request>
    <Response/>
  </PreFlow>
  <HTTPTargetConnection>
    <URL>http://target-eu.somedomain.com</URL>
  </HTTPTargetConnection>
</TargetEndpoint>
A little later, let’s say after 32 seconds, we get the 11th API request with the resource path /target-us.
We expect the request to go through successfully, assuming that we still have 6 API requests left for the target endpoint “target-us” as per the quota allowed.
However, in reality, we get a Quota violation error.
Reason: Since we are using the same quota policy in both the target endpoints, a single
quota counter is used to track the API requests hitting both the target endpoints. Thus we
exhaust the quota of 10 requests per minute collectively rather than for the individual
target endpoint.
Impact
This antipattern can result in a fundamental mismatch of expectations, leading to a
perception that the quota limits got exhausted ahead of time.
Best Practice

Use a separate Quota policy for each of the target endpoints so that a separate quota counter is maintained for each. For example, use a quota policy named “Quota-Minute-Target-Server-EU” in the preflow of the target endpoint “Target-EU”:

Target Endpoint “Target-EU”
<TargetEndpoint name="Target-EU">
  <PreFlow name="PreFlow">
    <Request>
      <Step>
        <Name>Quota-Minute-Target-Server-EU</Name>
      </Step>
    </Request>
    <Response/>
  </PreFlow>
  <HTTPTargetConnection>
    <URL>http://target-eu.somedomain.com</URL>
  </HTTPTargetConnection>
</TargetEndpoint>
Since we are using a separate Quota policy in each of the target endpoints “Target-US” and “Target-EU”, a separate counter will be maintained for each. This ensures that we get the allowed quota of 10 API requests per minute for each of the target endpoints.
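For illustration, the corresponding per-endpoint quota policy for the target endpoint “Target-US” could look like the following sketch; the policy name is an assumption, chosen by symmetry with “Quota-Minute-Target-Server-EU”:

```xml
<Quota name="Quota-Minute-Target-Server-US">
  <Interval>1</Interval>
  <TimeUnit>minute</TimeUnit>
  <Distributed>true</Distributed>
  <Allow count="10"/>
</Quota>
```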
Further reading
● Quota policy
Antipattern
Consider the example below, where an OAuthV2 policy in the API proxy flow has failed with an InvalidAccessToken error. Upon failure, Edge will set the fault.name as InvalidAccessToken, enter into the error flow, and execute any defined FaultRules. In the FaultRule, there is a RaiseFault policy named RaiseFault that sends a customized error response whenever an InvalidAccessToken error occurs. However, the use of the RaiseFault policy in a FaultRule means that the fault.name variable is overwritten and masks the true cause of the failure.
<FaultRules>
  <FaultRule name="generic_raisefault">
    <Step>
      <Name>RaiseFault</Name>
      <Condition>(fault.name equals "invalid_access_token") or (fault.name equals "InvalidAccessToken")</Condition>
    </Step>
  </FaultRule>
</FaultRules>
<FaultRules>
  <FaultRule name="fault_rule">
    ....
    <Step>
      <Name>RaiseFault</Name>
      <Condition>!(fault.name equals "RaiseFault")</Condition>
    </Step>
  </FaultRule>
</FaultRules>
As in the first scenario, the key fault variables fault.name, fault.code, and fault.policy are overwritten with the RaiseFault policy's name. This behavior makes it almost impossible to determine which policy actually caused the failure without accessing a trace file showing the failure or reproducing the issue.
Impact
Use of the RaiseFault policy as described above results in overwriting key fault variables with the RaiseFault policy's name instead of the failing policy's name. In Analytics and Nginx access logs, the x_apigee_fault_code and x_apigee_fault_policy variables are overwritten. In API Monitoring, the Fault Code and Fault Policy are overwritten. This behavior makes it difficult to troubleshoot and determine which policy is the true cause of the failure.
In the screenshot below from API Monitoring, you can see that the Fault Code and Fault Policy were overwritten to generic RaiseFault values, making it impossible to determine the root cause of the failure from the logs:
Best Practice
1. When an Edge policy raises a fault and you want to customize the error response
message, use the AssignMessage or JavaScript policies instead of the RaiseFault
policy.
2. The RaiseFault policy should be used in a non-error flow. That is, only use
RaiseFault to treat a specific condition as an error, even if an actual error has not
occurred in a policy or in the backend server of the API proxy. For example, you
could use the RaiseFault policy to signal that mandatory input parameters are
missing or have incorrect syntax.
3. You can also use RaiseFault in a fault rule if you want to detect an error during the
processing of a fault. For example, your fault handler itself could cause an error
that you want to signal by using RaiseFault.
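As a sketch of point 2, a RaiseFault policy that signals a missing mandatory input parameter might look like the following; the policy name, status code, and payload are illustrative assumptions, not from the original text:

```xml
<RaiseFault name="RF-Missing-Mandatory-Param">
  <FaultResponse>
    <Set>
      <StatusCode>400</StatusCode>
      <ReasonPhrase>Bad Request</ReasonPhrase>
      <Payload contentType="application/json">{"error": "missing mandatory query parameter"}</Payload>
    </Set>
  </FaultResponse>
</RaiseFault>
```

Such a policy would be attached in the non-error request flow with a Condition that detects the missing parameter.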
Further reading
● Handling Faults
● Raise Fault policy
● Community discussion on Fault Handling Patterns
● context.targetRequest.headers.header-name
● context.proxyResponse.headers.header-name
● context.targetResponse.headers.header-name
Here’s a sample AssignMessage policy showing how to read the value of a request header
and store it into a variable:
<AssignMessage continueOnError="false" enabled="true" name="assign-message-default">
  <AssignVariable>
    <Name>reqUserAgent</Name>
    <Ref>request.header.User-Agent</Ref>
  </AssignVariable>
</AssignMessage>
Antipattern
Accessing the values of HTTP headers in Edge policies in a way that returns only the first
value is incorrect, especially if the specific HTTP Header(s) has more than one value and
all these values are required for processing by target servers, client applications or a policy
within Apigee Edge.
The following sections contain examples of header access.
Example 1: Read a multi-valued Accept header using JavaScript code
Consider that the Accept header has multiple values as shown below:

Accept: text/html, application/xhtml+xml, application/xml
Here's the JavaScript code that reads the value from the Accept header:

// Read the values from the Accept header
var acceptHeaderValues = context.getVariable("request.header.Accept");
The above JavaScript code returns only the first value from the Accept header, such as text/html.
Example 2: Read a multi-valued Access-Control-Allow-Headers header in
AssignMessage or RaiseFault policy
Consider that the Access-Control-Allow-Headers header has multiple values as shown below:

Access-Control-Allow-Headers: content-type, authorization

Here's the part of the code from an AssignMessage or RaiseFault policy setting the Access-Control-Allow-Headers header:
<Set>
  <Headers>
    <Header name="Access-Control-Allow-Headers">{request.header.Access-Control-Request-Headers}</Header>
  </Headers>
</Set>
The above code sets the header Access-Control-Allow-Headers with only the first value from the request header Access-Control-Request-Headers, in this example content-type.
Impact
● In both the examples above, notice that only the first value from a multi-valued header is returned. If these values are subsequently used by another policy in the API proxy flow or by the backend service to perform some function or logic, then it could lead to an unexpected outcome or result.
● When request header values are accessed and passed onto the target server, API
requests could be processed by the backend incorrectly and hence they may give
incorrect results.
● If the client application is dependent on specific header values from the Edge
response, then it may also process incorrectly and give incorrect results.
Best Practice
1. Use the appropriate built-in flow variables:
request.header.header_name.values.count,
request.header.header_name.N,
response.header.header_name.values.count,
response.header.header_name.N.
Then iterate to fetch all the values from a specific header in JavaScript or Java callout policies.
Example: Sample JavaScript code to read a multi-value header
for (var i = 1; i <= context.getVariable('request.header.Accept.values.count'); i++) {
  print(context.getVariable('request.header.Accept.' + i));
}
Note: Edge splits header values using comma as the delimiter, not other delimiters such as semicolon.
For example, "application/xml;q=0.9, */*;q=0.8" will appear as one value with the above code.
If the header values need to be split using semicolon as a delimiter, then use string.split(";") to separate them into individual values.
2. Use the substring() function on the flow variable request.header.header_name.values in a RaiseFault or AssignMessage policy to read all the values of a specific header.
Example: Sample RaiseFault or AssignMessage policy to read a multi-value
header
<Set>
  <Headers>
    <Header name="Access-Control-Allow-Headers">{substring(request.header.Access-Control-Request-Headers.values,1,-1)}</Header>
  </Headers>
</Set>
Further reading
● How to handle multi-value headers in Javascript?
● List of HTTP header fields
● Request and response variables
● Flow Variables
Antipattern
Using service callouts to invoke a backend service in an API proxy with no routes to a
target endpoint is technically feasible, but will result in the loss of analytics data about the
performance of the external service.
An API proxy that does not contain target routes can be useful in cases where the request
message does not need to be forwarded to the TargetEndpoint. Instead, the
ProxyEndpoint performs all of the necessary processing. For example, the ProxyEndpoint
could retrieve data from a lookup to the API Services' key/value store and return the
response without invoking a backend service.
You can define a null Route in an API proxy, as shown here:
<RouteRule name="noroute" />
A proxy using a null route is a "no target" proxy, because it does not invoke a target
backend service.
It is technically possible to add a service callout to a no target proxy to invoke an external
service, as shown in the example below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ProxyEndpoint name="default">
  <Description/>
  <FaultRules/>
  <PreFlow name="PreFlow">
    <Request>
      <Step>
        <Name>ServiceCallout-InvokeBackend</Name>
      </Step>
    </Request>
    <Response/>
  </PreFlow>
  <PostFlow name="PostFlow">
    <Request/>
    <Response/>
  </PostFlow>
  <Flows/>
  <HTTPProxyConnection>
    <BasePath>/no-target-proxy</BasePath>
    <Properties/>
    <VirtualHost>secure</VirtualHost>
  </HTTPProxyConnection>
  <RouteRule name="noroute"/>
</ProxyEndpoint>
However, the proxy cannot provide analytics information about the external service
behaviour (such as processing time or error rates), making it difficult to assess the
performance of the target server.
Impact
● Analytics information on the interaction with the external service (error codes,
response time, target performance, etc.) is unavailable.
● Any specific logic required before or after invoking the service callout is included
as part of the overall proxy logic, making it harder to understand and reuse.
Best Practice
If an API proxy interacts with only a single external service, the API proxy should follow the
basic design pattern, where the backend service is defined as the target endpoint of the
API proxy. A proxy with no routing rules to a target endpoint should not invoke a backend
service using the ServiceCallout policy.
The following proxy configuration implements the same behaviour as the example above,
but follows best practices:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ProxyEndpoint name="default">
  <Description/>
  <FaultRules/>
  <PreFlow name="PreFlow">
    <Request/>
    <Response/>
  </PreFlow>
  <PostFlow name="PostFlow">
    <Request/>
    <Response/>
  </PostFlow>
  <Flows/>
  <HTTPProxyConnection>
    <BasePath>/simple-proxy-with-route-to-backend</BasePath>
    <Properties/>
    <VirtualHost>secure</VirtualHost>
  </HTTPProxyConnection>
  <RouteRule name="default">
    <TargetEndpoint>default</TargetEndpoint>
  </RouteRule>
</ProxyEndpoint>
Use Service callouts to support mashup scenarios, where you want to invoke external
services before or after invoking the target endpoint. Service callouts are not meant to
replace target endpoint invocation.
Further reading
● Understanding APIs and API proxies
● How to configure Route Rules
● Null Routes
● Service Callout policy
2. Performance Antipatterns
One of the unique and useful features of Apigee Edge is the ability to wrap a NodeJS
application in an API proxy. This allows developers to create event-driven server-side
applications using Edge.
Antipattern
Deployment of API Proxies is the process of making them available to serve API requests.
Each of the deployed API Proxies is loaded into Message Processor’s runtime memory to
be able to serve the API requests for the specific API proxy. Therefore, the runtime
memory usage increases with the increase in the number of deployed API Proxies. Leaving
any unused API Proxies deployed can cause unnecessary use of runtime memory.
In the case of NodeJS API Proxies, there is a further implication.
The platform launches a “Node app” for every deployed NodeJS API proxy. A Node app is
akin to a standalone node server instance on the Message Processor JVM process.
In effect, for every deployed NodeJS API proxy, Edge launches a separate node server to process requests for the corresponding proxy. If the same NodeJS API proxy is deployed
in multiple environments, then a corresponding node app is launched for each
environment. In situations where there are a lot of deployed but unused NodeJS API
Proxies, multiple Node apps are launched. Unused NodeJS proxies translate to idle Node
apps which consume memory and affect start up times of the application process.
(Table: for used and unused proxies, the number of proxies, the number of environments in which each is deployed, and the number of nodeapps launched.)
In the illustration above, 36 unused nodeapps are launched, which uses up system
memory and has an adverse effect on start up times of the process.
Impact
● High Memory usage and cascading effect on application’s ability to process further
requests.
● Likely Performance impact on the API Proxies that are actually serving traffic.
Best Practice
Undeploy any unused API Proxies.
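Undeployment can be done from the Edge UI or via the management API. The sketch below only constructs the management API URL for undeploying a specific proxy revision; the organization, environment, proxy name, and revision are placeholders:

```javascript
// Build the management API URL used to undeploy a specific revision of an
// API proxy (an HTTP DELETE to this URL, with valid credentials, undeploys it).
function undeployUrl(org, env, proxy, revision) {
  return 'https://api.enterprise.apigee.com/v1/o/' + org +
         '/e/' + env + '/apis/' + proxy +
         '/revisions/' + revision + '/deployments';
}

var url = undeployUrl('org_name', 'env_name', 'unused-proxy', '1');
```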
Further reading
● Overview of Node.js on Apigee Edge
● Getting started with Node.js on Apigee Edge
● Debugging and troubleshooting Node.js proxies
3. Generic Antipatterns
Antipattern
The management APIs are useful for administrative tasks and should not be used for performing any runtime logic in API proxy flows. This is because:
● Using management APIs to access information about the entities such as
KeyValueMaps, OAuth Access Tokens or for any other purpose from API Proxies
leads to dependency on Management Servers.
● Management Servers are not a part of the Edge runtime components and therefore may not be highly available.
● Management Servers may not be provisioned within the same network or data center and may therefore introduce network latencies at runtime.
● The entries in the management servers are cached for a longer period of time, so we may not be able to see the latest data immediately in the API Proxies if we perform writes and reads within a short period of time.
● Increases network hops at runtime.
In the code sample below, a management API call is made via custom JavaScript code to retrieve information from the KeyValueMap.
Code Sample : Accessing the KeyValueMap entries from within the JavaScript code
using Management API
var response = httpClient.send('https://api.enterprise.apigee.com/v1/o/org_name/e/env_name/keyvaluemaps/kvm_name');
If the management server is unavailable, then the JavaScript code invoking the
management API call fails. This subsequently causes the API request to fail.
Impact
● Introduces additional dependency on Management Servers during runtime. Any
failure in Management servers will affect the API calls.
● User credentials for management APIs need to be stored either locally or in some
secure store such as Encrypted KVM.
Best practice
There are more effective ways of retrieving information from entities such as
KeyValueMaps, API Products, DeveloperApps, Developers, Consumer Keys, etc. at runtime.
Here are a few examples:
● Use KeyValueMapOperations policy to access information from KeyValueMaps.
Here’s a sample code that shows how to retrieve information from the
KeyValueMap:
<KeyValueMapOperations mapIdentifier="urlMap" async="false" continueOnError="false" enabled="true" name="GetURLKVM">
  <DisplayName>GetURLKVM</DisplayName>
  <ExpiryTimeInSecs>86400</ExpiryTimeInSecs>
  <Scope>environment</Scope>
  <Get assignTo="urlHost1" index="2">
    <Key>
      <Parameter>urlHost_1</Parameter>
    </Key>
  </Get>
</KeyValueMapOperations>
● To access information about API Products, Developer Apps, Developers, Consumer
Keys, etc. in the API proxy, you can do either of the following:
○ If your API proxy flow has a VerifyAPIKey policy, then you can access the
information using the flow variables populated as part of this policy. Here is
a sample code that shows how to retrieve the name and created_by
information of a Developer App using JavaScript:
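The sketch below illustrates the idea. The flow variable names assume a VerifyAPIKey policy named "Verify-API-Key", and the app.created_by variable name is an assumption; check the VerifyAPIKey policy reference for the exact variables populated in your configuration. A minimal stub of the Edge context object is included so the sketch runs standalone:

```javascript
// Stub of the Edge "context" object; inside an actual JavaScript policy,
// context is provided by the runtime and this stub is not needed.
var store = {
  // Hypothetical values, as if populated by a VerifyAPIKey policy
  // named "Verify-API-Key":
  'verifyapikey.Verify-API-Key.app.name': 'weather-app',
  'verifyapikey.Verify-API-Key.app.created_by': 'developer@example.com'
};
var context = {
  getVariable: function (name) { return store[name]; },
  setVariable: function (name, value) { store[name] = value; }
};

// Read Developer App details from the VerifyAPIKey flow variables.
var appName = context.getVariable('verifyapikey.Verify-API-Key.app.name');
var createdBy = context.getVariable('verifyapikey.Verify-API-Key.app.created_by');
```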
Further reading
● Key Value Map Operations policy
● VerifyAPIKey policy
● Access Entity policy
Antipattern
Invoking one API proxy from another, either using HTTPTargetConnection in the target endpoint or custom JavaScript code, results in an additional network hop.
In the code sample below, we invoke Proxy 2 from Proxy 1 using HTTPTargetConnection.
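A minimal sketch of such a TargetEndpoint, using the same proxy URL as the JavaScript sample that follows, could look like:

```xml
<TargetEndpoint name="default">
  <HTTPTargetConnection>
    <URL>http://myorg-test.apigee.net/proxy2</URL>
  </HTTPTargetConnection>
</TargetEndpoint>
```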
The next code sample invokes Proxy 2 from Proxy 1 using JavaScript:
var response = httpClient.send('http://myorg-test.apigee.net/proxy2');
Code Flow
To understand why this has an inherent disadvantage, we need to understand the route a
request takes as illustrated by the diagram below:
Figure 1: Code Flow
As depicted in the diagram, a request traverses multiple distributed components, including
the Router and the Message Processor.
In the code samples above, invoking Proxy 2 from Proxy 1 means that the request has to be routed through the traditional route, i.e., Router > MP, at runtime. This would be akin to invoking an API from a client, thereby making multiple network hops that add to the latency. These hops are unnecessary considering that the Proxy 1 request has already ‘reached’ the MP.
Impact
Invoking one API proxy from another API proxy incurs unnecessary network hops, that is
the request has to be passed on from one Message Processor to another Message
Processor.
Best Practice
1. Use the proxy chaining feature for invoking one API proxy from another. Proxy
chaining is more efficient as it uses local connection to reference the target
endpoint (another API proxy).
The code sample shows proxy chaining using LocalTargetConnection:
<LocalTargetConnection>
  <APIProxy>proxy2</APIProxy>
  <ProxyEndpoint>default</ProxyEndpoint>
</LocalTargetConnection>
The invoked API proxy gets executed within the same Message Processor; as a
result, it avoids the network hop as shown in the following figure:
Further reading
● Proxy Chaining
3.3. Manage Edge Resources without using Source Control Management
Apigee Edge provides many different types of resources and each of them serve a
different purpose. There are certain resources that can be configured (i.e., created,
updated, and/or deleted) only through the Edge UI, management APIs, or tools that use
management APIs, and by users with the prerequisite roles and permissions. For
example, only org admins belonging to a specific organization can configure these
resources. That means, these resources cannot be configured by end users through
developer portals, nor by any other means. These resources include:
● API proxies
● Shared flows
● API products
● Caches
● KVMs
● Keystores and truststores
● Virtual hosts
● Target servers
● Resource files
While these resources do have restricted access, if any modifications are made to them
even by the authorized users, then the historic data simply gets overwritten with the new
data. This is due to the fact that these resources are stored in Apigee Edge only as per
their current state. The main exceptions to this rule are API proxies and shared flows.
API Proxies and Shared Flows under Revision Control
API proxies and shared flows are managed -- in other words, created, updated and
deployed -- through revisions. Revisions are sequentially numbered, which enables you to
add new changes and save it as a new revision or revert a change by deploying a previous
revision of the API proxy/Shared Flow. At any point in time, there can be only one revision
of an API proxy/Shared Flow deployed in an environment unless the revisions have a
different base path.
Although the API proxies and shared flows are managed through revisions, if any
modifications are made to an existing revision, there is no way to roll back since the old
changes are simply overwritten.
Audits and History
Apigee Edge provides the Audits and API, Product, and organization history features that can be helpful in troubleshooting scenarios. These features enable you to view
information like who performed specific operations (create, read, update, delete, deploy,
and undeploy) and when the operations were performed on the Edge resources. However,
if any update or delete operations are performed on any of the Edge resources, the audits
cannot provide you the older data.
Antipattern
Managing the Edge resources (listed above) directly through the Edge UI or management APIs without using a source control system
There's a misconception that Apigee Edge will be able to restore resources to their
previous state following modifications or deletes. However, Edge Cloud does not provide
restoration of resources to their previous state. Therefore, it is the user’s responsibility to
ensure that all the data related to Edge resources is managed through source control
management, so that old data can be restored back quickly in case of accidental deletion
or situations where any change needs to be rolled back. This is particularly important for
production environments where this data is required for runtime traffic.
Let’s explain this with the help of a few examples and the kind of impact that can be
caused if the data is not managed through a source control system and is
modified/deleted knowingly or unknowingly:
Example 1: Deletion or modification of API proxy
When an API proxy is deleted, or a change is deployed on an existing revision, the previous
code won’t be recoverable. If the API proxy contains Java, JavaScript, Node.js, or Python
code that is not managed in a source control management (SCM) system outside Apigee,
a lot of development work and effort could be lost.
Example 2: Determination of API proxies using specific virtual hosts
A certificate on a virtual host is expiring and that virtual host needs updating. Identifying
which API proxies use that virtual host for testing purposes may be difficult if there are
many API proxies. If the API proxies are managed in an SCM system outside Apigee, then
it would be easy to search the repository.
Example 3: Deletion of keystore/truststore
If a keystore/truststore that is used by a virtual host or target server configuration is
deleted, it will not be possible to restore it back unless the configuration details of the
keystore/truststore, including certificates and/or private keys, are stored in source control.
Impact
● If any of the Edge resources are deleted, then it's not possible to recover the
resource and its contents from Apigee Edge.
● API requests may fail with unexpected errors leading to outage until the resource is
restored back to its previous state.
● It is difficult to search for inter-dependencies between API proxies and other
resources in Apigee Edge.
Best Practice
1. Use any standard SCM coupled with a continuous integration and continuous
deployment (CICD) pipeline for managing API proxies and shared flows.
2. Use any standard SCM for managing the other Edge resources, including API products, caches, KVMs, target servers, virtual hosts, and keystores.
○ If there are any existing Edge resources, then use management APIs to get
the configuration details for them as a JSON/XML payload and store them
in source control management.
○ Manage any new updates to these resources in source control
management.
○ If there’s a need to create new Edge resources or update existing Edge
resources, then use the appropriate JSON/XML payload stored in source
control management and update the configuration in Edge using
management APIs.
* Encrypted KVMs cannot be exported in plain text from the API. It is the user’s
responsibility to keep a record of what values are put into encrypted KVMs.
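The sketch below only constructs management API URLs of the kind used to export such configuration as a JSON payload; the organization, environment, and resource names are placeholders, and, per the note above, encrypted KVM values are not returned in plain text:

```javascript
// Build management API URLs for exporting environment-scoped Edge
// resources (KVMs, target servers, ...) so the payload can be stored in SCM.
function exportUrl(org, env, resourceType, resourceName) {
  return 'https://api.enterprise.apigee.com/v1/o/' + org +
         '/e/' + env + '/' + resourceType + '/' + resourceName;
}

var kvmUrl = exportUrl('org_name', 'env_name', 'keyvaluemaps', 'kvm_name');
var tsUrl = exportUrl('org_name', 'env_name', 'targetservers', 'target1');
```

An HTTP GET to these URLs with valid credentials returns the resource configuration, which can then be committed to source control.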
Further reading
● Source Control Management
● Source Control for API proxy Development
● Guide to implementing CI on Apigee Edge
● Maven Deploy Plugin for API Proxies
● Maven Config Plugin to manage Edge Resources
● Apigee Management API (for automating backups)
3.4. Define multiple virtual hosts with same host alias and port number
In Apigee Edge, a Router handles all incoming API traffic. That means all HTTP and HTTPS requests to an Edge API proxy are first handled by an Edge Router. Therefore, an API proxy request must be directed to the IP address and open port on a Router.
A virtual host lets you host multiple domain names on a single server or group of servers.
For Edge, the servers correspond to Edge Routers. By defining virtual hosts on a Router,
you can handle requests to multiple domains.
A virtual host on Edge defines a protocol (HTTP or HTTPS), along with a Router port and a
host alias. The host alias is typically a DNS domain name that maps to a Router's IP
address.
For example, the following image shows a Router with two virtual host definitions:
In this example, there are two virtual host definitions. One handles HTTPS requests on the domain domainName1, the other handles HTTP requests on domainName2.
On a request to an API proxy, the Router compares the Host header and port number of the incoming request to the list of host aliases defined by all virtual hosts to determine which virtual host handles the request.
Antipattern
Defining multiple virtual hosts with the same host alias and port number in the
same/different environments of an organization or across organizations will lead to
confusion at the time of routing API requests and may cause unexpected
errors/behaviour.
Let’s use an example to explain the implications of having multiple virtual hosts with the same host alias.
Consider there are two virtual hosts, sandbox and secure, defined with the same host alias, i.e., api.company.abc.com, in an environment:
With the above setup, there could be two scenarios:
<BasePath>/demo</BasePath>
<VirtualHost>sandbox</VirtualHost>
<VirtualHost>secure</VirtualHost>
</HTTPProxyConnection>
...
</ProxyEndpoint>
In this scenario, when the client applications make calls to a specific API proxy using the host alias api.company.abc.com, they will get a valid response based on the proxy logic.
However, this causes incorrect data to be stored in Analytics as the API requests are
routed to both the virtual hosts, while the actual requests were aimed to be sent to only
one virtual host.
This can also affect the logging information and any other data that is based on the virtual
hosts.
Impact
● 404 Errors as the API requests may get routed to a virtual host to which the API
proxy may not be configured to accept the requests.
● Incorrect Analytics data since the API requests get routed to all the virtual hosts
having the same host alias while the requests were made only for a specific virtual
host.
Best Practice
1. Don’t define multiple virtual hosts with same host alias and port number in the
same environment, or different environments of an organization.
2. If there’s a need to define multiple virtual hosts, then use different host aliases in each of the virtual hosts as shown below:
If there’s a need to define multiple virtual hosts with the same host alias, then it can be achieved by having different port numbers for each of the virtual hosts. For example, the sandbox virtual host has the host alias api.company.abc.com with port 443, while the secure virtual host has the same host alias api.company.abc.com but with a different port, 8443.
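A sketch of two such virtual host definitions, using the aliases and ports from the example above; see the virtual hosts documentation for the full configuration schema:

```xml
<VirtualHost name="sandbox">
  <HostAliases>
    <HostAlias>api.company.abc.com</HostAlias>
  </HostAliases>
  <Port>443</Port>
</VirtualHost>

<VirtualHost name="secure">
  <HostAliases>
    <HostAlias>api.company.abc.com</HostAlias>
  </HostAliases>
  <Port>8443</Port>
</VirtualHost>
```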
Further reading
● Virtual hosts
3.5. Load Balance with a single Target Server with MaxFailures set to a
non-zero value
The TargetEndpoint configuration defines the way Apigee Edge connects to a backend
service or API. It sends the requests and receives the responses to/from the backend
service. The backend service can be a HTTP/HTTPS server, NodeJS, or Hosted Target.
The backend service in the TargetEndpoint can be invoked in one of the following ways:
● By specifying the HTTP/HTTPS URL of the backend service directly
● By using a TargetServer configuration
Likewise, the Service Callout policy can be used to make a call to any external service from
the API proxy flow. This policy supports defining HTTP/HTTPS target URLs either directly
in the policy itself or using a TargetServer configuration.
TargetServer
TargetServer configuration decouples the concrete endpoint URLs from TargetEndpoint configurations or Service Callout policies. A TargetServer is referenced by name instead of a URL in the TargetEndpoint. The TargetServer configuration has the hostname of the backend service, the port number, and other details.
Here is a sample TargetServer configuration:
<TargetServer name="target1">
  <Host>www.mybackendservice.com</Host>
  <Port>80</Port>
  <IsEnabled>true</IsEnabled>
</TargetServer>
The TargetServer enables you to have different configurations for each environment. A
TargetEndpoint/Service Callout policy can be configured with one or more named
TargetServers using a LoadBalancer. The built-in support for load balancing enhances the
availability of the APIs and failover among configured backend server instances.
<TargetEndpoint name="default">
  <HTTPTargetConnection>
    <LoadBalancer>
      <Server name="target1"/>
      <Server name="target2"/>
    </LoadBalancer>
  </HTTPTargetConnection>
</TargetEndpoint>
MaxFailures
The MaxFailures configuration specifies the maximum number of request failures to a target server, after which the target server is marked as down and removed from rotation for all subsequent requests.
An example configuration with MaxFailures specified:
<TargetEndpoint name="default">
  <HTTPTargetConnection>
    <LoadBalancer>
      <Server name="target1"/>
      <Server name="target2"/>
      <MaxFailures>5</MaxFailures>
    </LoadBalancer>
  </HTTPTargetConnection>
</TargetEndpoint>
In the above example, if five consecutive requests to target1 fail, then target1 will be removed from rotation and all subsequent requests will be sent only to target2.
Antipattern
Having a single TargetServer in a LoadBalancer configuration of the TargetEndpoint or Service Callout policy with MaxFailures set to a non-zero value is not recommended, as it can have adverse implications.
Consider the following sample configuration that has a single TargetServer named “target1” with MaxFailures set to 5 (a non-zero value):
<TargetEndpoint name="default">
  <HTTPTargetConnection>
    <LoadBalancer>
      <Algorithm>RoundRobin</Algorithm>
      <Server name="target1"/>
      <MaxFailures>5</MaxFailures>
    </LoadBalancer>
  </HTTPTargetConnection>
</TargetEndpoint>
If the requests to the TargetServer "target1" fail five times (the number specified in MaxFailures), the TargetServer is removed from rotation. Since there are no other TargetServers to fail over to, all subsequent requests to the API proxy having this configuration will fail with a 503 Service Unavailable error.
Even if the TargetServer "target1" gets back to its normal state and is capable of sending successful responses, requests to the API proxy will continue to return 503 errors. This is because Edge does not automatically put the TargetServer back in rotation even after the target is up and running again. To address this issue, the API proxy must be redeployed for Edge to put the TargetServer back into rotation.
If the same configuration is used in the Service Callout policy, then the API requests will get a 500 error after the requests to the TargetServer "target1" fail five times.
Impact
Using a single TargetServer in a LoadBalancer configuration of a TargetEndpoint or Service Callout policy with MaxFailures set to a non-zero value causes:
● API Requests to fail with 503/500 Errors continuously (after the requests fail for
MaxFailures number of times) until the API proxy is redeployed.
● Longer outage as it is tricky and can take more time to diagnose the cause of this
issue (without prior knowledge about this antipattern).
Best Practice
1. Have more than one TargetServer in the LoadBalancer configuration for higher availability.
2. Always define a HealthMonitor when MaxFailures is set to a non-zero value. A target server will be removed from rotation when the number of failures reaches the number specified in MaxFailures. Having a HealthMonitor ensures that the TargetServer is put back into rotation as soon as the target server becomes available again, meaning there is no need to redeploy the proxy.
To ensure that the health check is performed on the same port number that Edge uses to connect to the target servers, Apigee recommends that you omit the <Port> child element under <TCPMonitor> unless it is different from the TargetServer port. By default, <Port> is the same as the TargetServer port.
Sample configuration with HealthMonitor:
<TargetEndpoint name="default">
  <HTTPTargetConnection>
    <LoadBalancer>
      <Algorithm>RoundRobin</Algorithm>
      <Server name="target1"/>
      <Server name="target2"/>
      <MaxFailures>5</MaxFailures>
    </LoadBalancer>
    <Path>/test</Path>
    <HealthMonitor>
      <IsEnabled>true</IsEnabled>
      <IntervalInSec>5</IntervalInSec>
      <TCPMonitor>
        <ConnectTimeoutInSec>10</ConnectTimeoutInSec>
      </TCPMonitor>
    </HealthMonitor>
  </HTTPTargetConnection>
</TargetEndpoint>
3. If there is a constraint that only one TargetServer can be used, and a HealthMonitor is not used, then don't specify MaxFailures in the LoadBalancer configuration.
○ The default value of MaxFailures is 0. This means that Edge always tries to connect to the target for each request and never removes the target server from the rotation.
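For instance, a single-TargetServer configuration without a HealthMonitor could look like the following sketch; since <MaxFailures> is omitted, it defaults to 0 and the target is never removed from rotation:

```xml
<TargetEndpoint name="default">
  <HTTPTargetConnection>
    <LoadBalancer>
      <Algorithm>RoundRobin</Algorithm>
      <Server name="target1"/>
      <!-- No MaxFailures element: the default of 0 means Edge always
           tries the target and never removes it from rotation -->
    </LoadBalancer>
  </HTTPTargetConnection>
</TargetEndpoint>
```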
Further reading
● Load Balancing across Backend Servers
● How to use Target Servers in your API Proxies
3.6. Access the Request/Response payload when streaming is enabled
Antipattern
Accessing the request/response payload with streaming enabled causes Edge to go back
to the default buffering mode.
The illustration above shows that we are trying to extract variables from the request payload and convert the JSON response payload to XML using the JSONToXML policy. This will disable streaming in Edge.
Impact
● Streaming will be disabled, which can lead to increased latencies in processing the data.
● An increase in heap memory usage, or OutOfMemory errors, may be observed on Message Processors due to the use of in-memory buffers, especially with large request/response payloads.
Best Practice
Don’t access the request/response payload when streaming is enabled.
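For reference, streaming is enabled per endpoint through the request.streaming.enabled and response.streaming.enabled properties, as in this minimal sketch; when they are set, no policy in the proxy should parse or transform the payload:

```xml
<TargetEndpoint name="default">
  <HTTPTargetConnection>
    <URL>http://mocktarget.apigee.net</URL>
    <Properties>
      <!-- Pass payloads through Edge without buffering them in memory -->
      <Property name="request.streaming.enabled">true</Property>
      <Property name="response.streaming.enabled">true</Property>
    </Properties>
  </HTTPTargetConnection>
</TargetEndpoint>
```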
Further reading
● Streaming requests and responses
● How does Apigee Edge Streaming work?
● How to handle streaming data together with normal request/response payload in a
single API proxy
● Best practices for API proxy design and development
Antipattern
An API proxy can have one or more proxy endpoints. Defining multiple ProxyEndpoints is a simple mechanism for implementing multiple APIs in a single proxy. This lets you reuse policies and/or business logic before and after the invocation of a TargetEndpoint.
On the other hand, defining multiple ProxyEndpoints in a single API proxy means conceptually combining many unrelated APIs into a single artefact. This makes API proxies harder to read, understand, debug, and maintain, defeating the main philosophy of API proxies: making it easy for developers to create and maintain APIs.
Impact
Multiple ProxyEndpoints in an API proxy can:
● Make it hard for developers to understand and maintain the API proxy.
● Obfuscate analytics. By default, analytics data is aggregated at the proxy level.
There is no breakdown of metrics by proxy endpoint unless you create custom
reports.
● Make it difficult to troubleshoot problems with API Proxies.
Best Practice
When you are implementing a new API proxy or redesigning an existing API proxy, use the
following best practices:
1. Implement one API proxy with a single ProxyEndpoint.
2. If there are multiple APIs that share a common target server and/or require the same logic before or after invoking the target server, consider using shared flows to implement such logic in different API proxies.
3. If there are multiple APIs that share a common starting basepath, but differ in the
suffix, use conditional flows in a single ProxyEndpoint.
4. If there exists an API proxy with multiple ProxyEndpoints and if there are no issues
with it, then there is no need to take any action.
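As an illustration of conditional flows in a single ProxyEndpoint, two operations that share a basepath but differ in the path suffix can be routed as follows (a sketch; the basepath, flow names, and conditions are hypothetical):

```xml
<ProxyEndpoint name="default">
  <HTTPProxyConnection>
    <BasePath>/v1/customers</BasePath>
  </HTTPProxyConnection>
  <Flows>
    <!-- GET /v1/customers -->
    <Flow name="ListCustomers">
      <Condition>(proxy.pathsuffix MatchesPath "/") and (request.verb = "GET")</Condition>
    </Flow>
    <!-- GET /v1/customers/{id} -->
    <Flow name="GetCustomer">
      <Condition>(proxy.pathsuffix MatchesPath "/*") and (request.verb = "GET")</Condition>
    </Flow>
  </Flows>
  <RouteRule name="default">
    <TargetEndpoint>default</TargetEndpoint>
  </RouteRule>
</ProxyEndpoint>
```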
Using one ProxyEndpoint per API proxy leads to:
1. Simpler, easier to maintain proxies
2. Better information in Analytics; proxy performance, target response time, etc. will
be reported separately instead of rolled up for all ProxyEndpoints
3. Faster troubleshooting and issue resolution
Further reading
● API proxy Configuration Reference
● Reusable Shared Flows
4. Backend Antipatterns
The performance of API requests routed via Edge depends on both Edge and the backend systems. In this chapter, we focus on the impact on API requests of badly performing backend systems.
Antipattern
Let us consider the case of a problematic backend. These are the possibilities:
● Inadequately sized backend
● Slow backend
Slow backend
The problem with an improperly tuned backend is that it is very slow to respond to any requests coming to it, leading to increased latencies, premature timeouts, and a compromised customer experience.
The Edge platform offers a few tunable options to circumvent and manage a slow backend, but these options have limitations.
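One such tunable option is the connection and I/O timeouts on the TargetEndpoint, which let Edge fail fast instead of waiting indefinitely on a slow backend (a sketch; the timeout values are illustrative, not recommendations):

```xml
<TargetEndpoint name="default">
  <HTTPTargetConnection>
    <URL>http://mocktarget.apigee.net</URL>
    <Properties>
      <!-- Give up establishing a TCP connection after 3 seconds -->
      <Property name="connect.timeout.millis">3000</Property>
      <!-- Give up waiting for backend data after 5 seconds -->
      <Property name="io.timeout.millis">5000</Property>
    </Properties>
  </HTTPTargetConnection>
</TargetEndpoint>
```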
Impact
● In the case of an inadequately sized backend, an increase in traffic can lead to failed requests.
● In the case of a slow backend, the latency of requests will increase.
Best Practice
1. Use caching to store the responses to improve the API response times and reduce
the load on the backend server.
2. Resolve the underlying problem in the slow backend servers.
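For the caching recommendation, a ResponseCache policy can serve repeated requests from cache instead of hitting the slow backend on every call (a minimal sketch; the policy name, cache key, and expiry are placeholders):

```xml
<ResponseCache name="CacheBackendResponses">
  <CacheKey>
    <!-- Cache one entry per request URI -->
    <KeyFragment ref="request.uri"/>
  </CacheKey>
  <ExpirySettings>
    <!-- Keep cached responses for 10 minutes -->
    <TimeoutInSec>600</TimeoutInSec>
  </ExpirySettings>
</ResponseCache>
```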
Further reading
● Edge Caching Internals
Persistent Connections
HTTP persistent connection, also called HTTP keep-alive or HTTP connection reuse, is a concept that allows a single TCP connection to send and receive multiple HTTP requests/responses, instead of opening a new connection for every request/response pair.
Apigee Edge uses persistent connections for communicating with the backend services by
default. A connection is kept alive for 60 seconds by default. That is, if a connection is
kept idle in the connection pool for more than 60 seconds, then the connection will be
closed.
The keep alive timeout period is configurable through a property named
keepalive.timeout.millis ,
which can be specified in the TargetEndpoint
configuration of an API proxy. For example, the keep alive time period can be set to 30
seconds for a specific backend service in the TargetEndpoint.
<TargetEndpoint name="default">
  <HTTPTargetConnection>
    <URL>http://mocktarget.apigee.net</URL>
    <Properties>
      <Property name="keepalive.timeout.millis">30000</Property>
    </Properties>
  </HTTPTargetConnection>
</TargetEndpoint>
In the example above, keepalive.timeout.millis controls the keep alive behavior for a specific backend service in an API proxy. There is also a property that controls keep alive behavior for all backend services in all proxies: HTTPTransport.keepalive.timeout.millis is configurable on the Message Processor component. This property also has a default value of 60 seconds. Modifying this property affects the keep alive connection behavior between Apigee Edge and all backend services in all API proxies.
Antipattern
Disabling persistent (keep alive) connections by setting the property keepalive.timeout.millis to 0 in the TargetEndpoint configuration of a specific API proxy, or by setting HTTPTransport.keepalive.timeout.millis to 0 on Message Processors, is not recommended, as it will impact performance.
In the example below, the TargetEndpoint configuration disables persistent (keep alive) connections for a specific backend service by setting keepalive.timeout.millis to 0:
<TargetEndpoint name="default">
  <HTTPTargetConnection>
    <URL>http://mocktarget.apigee.net</URL>
    <Properties>
      <Property name="keepalive.timeout.millis">0</Property>
    </Properties>
  </HTTPTargetConnection>
</TargetEndpoint>
If keep alive connections are disabled for one or more backend services, then Edge will need to open a new connection every time a new request is made to the specific backend service(s). In addition, an SSL handshake must be performed for every new request if the backend is HTTPS, adding to the overall latency of the API requests.
Impact
● Increases the overall response time of API requests, as Apigee Edge needs to open a new connection and perform an SSL handshake for every new request.
● Connections may get exhausted under high traffic conditions, as it takes some time for connections to be released back to the system.
Best Practice
1. Backend services should honor and handle HTTP persistent connections in accordance with the HTTP/1.1 standard.
Further reading
● HTTP persistent connections
● Keep-Alive
● Endpoint Properties Reference
5.1. Add Custom Information to Apigee owned Schema in Postgres Database
Analytics Services is a very powerful built-in feature provided by Apigee Edge. It collects and analyzes a broad spectrum of data that flows across APIs. The analytics data captured can provide very useful insights. For instance: How is the API traffic volume trending over a period of time? Which is the most used API? Which APIs have high error rates?
Regular analysis of this data and these insights can drive appropriate actions, such as future capacity planning of APIs based on current usage, business and future investment decisions, and many more. Some of the information captured by Analytics Services includes:
1. Information about an API - Request URI, Client IP address, Response Status Codes,
etc.
2. API proxy Performance - Success/Failure rate, Request and Response Processing
Time, etc.
3. Target Server Performance - Success/Failure rate, Processing Time
4. Error Information - Number of Errors, Fault Code, Failing Policy, Number of Apigee
and Target Server caused errors
5. Other Information - Number of requests made by Developers, Developer Apps, etc
The schema named analytics is used by Edge for storing all the analytics data for each organization and environment. If monetization is installed, there will also be an rkms schema. Other schemata are meant for Postgres internals.
The analytics schema will keep changing, as Apigee Edge dynamically adds new fact tables to it at runtime. The Postgres server component aggregates the fact data into aggregate tables, which get loaded and displayed in the Edge UI.
Antipattern
Adding custom columns, tables, and/or views to any of the Apigee owned schemata in
Postgres Database on Private Cloud environments directly using SQL queries is not
advisable, as it can have adverse implications.
Let’s take an example to explain this in detail.
Consider that a custom table named account has been created under the analytics schema, as shown below:
After a while, let's say there's a need to upgrade Apigee Edge from a lower version to a higher version. Upgrading Private Cloud Apigee Edge involves upgrading Postgres, amongst many other components. If any custom columns, tables, or views have been added to the Postgres database, then the Postgres upgrade fails with errors referencing the custom objects, as they were not created by Apigee Edge. Thus, the Apigee Edge upgrade also fails and cannot be completed.
Similarly, errors can occur during Apigee Edge maintenance activities in which backup and restore of Edge components, including the Postgres database, are performed.
Impact
● The Apigee Edge upgrade cannot be completed, because the Postgres component upgrade fails with errors referencing custom objects not created by Apigee Edge.
Best Practice
1. Don't add any custom information in the form of columns, tables, views, functions, or procedures directly to any of the Apigee owned schemata, such as analytics.
2. If there's a need to support custom information, it can be added as columns (fields) in the analytics schema using a Statistics Collector policy.
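For example, a custom value can be captured through a Statistics Collector policy instead of a custom Postgres column (a sketch; the policy name, statistic name, and the flow variable it reads are hypothetical):

```xml
<StatisticsCollector name="CollectAccountInfo">
  <Statistics>
    <!-- Records a custom flow variable as an analytics statistic;
         "unknown" is the default when the variable is unset -->
    <Statistic name="accountid" ref="flow.variable.accountid" type="string">unknown</Statistic>
  </Statistics>
</StatisticsCollector>
```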
Further reading