DDI Grid Best Practices 8.2v2.3

1
2
All *0 appliances have an end-of-sale date of 02/14/2018 and an end-of-life date of 02/15/2021.
All performance numbers should be interpreted as maximum capacity; this means that maxing out on one will prevent you from reaching the other.
3
The grid master model type depends on
• member count
• level of activity of the members
• number of admins
• object count
• API
Note: DNSSEC requires the GM to serve DNS; this is the only exception to the rule against running services on the GM.
Take GM CPU load into account when running DNSSEC.
When in doubt, request ARB review!
Note:
A member is either a single appliance or an HA pair of appliances.
Running services on the 1425 and larger is possible. It is marked as NO in this table because a design that requires a 1425 or larger GM should not be running services on it.
4
When the GM is not part of an HA pair, or there is no GMC, a GM failure requires:
1. waiting for the RMA
2. restoring from backup
An HA pair should never be split across datacenters. Despite any precautions that can be taken, split-brain conditions can have a huge impact on a grid.
For DR purposes, the GMC should not be in the same physical location as the GM.
All hardware performance details for appliances can be found in:
https://docs.google.com/spreadsheets/d/1fvj5w9LNmu49toOn-kHewiXGokwrjadcveAgmYsS2a4
5
Note: IPv4 and IPv6 should not update the same forward-mapping zone, as this will cause collisions in the TXT records used by DHCP.
Object count for dual stack:
• 3 leases: 2 IPv4 (one on each DHCP failover node) + 1 IPv6
• 2 PTR records (one for each IP)
• 2 TXT records for DDNS
• 1 A record
• 1 AAAA record
• 1 IPAM object
A rule of thumb is to size for 15 times the number of IPs in use, which gives you 66% capacity on implementation.
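The dual-stack count and the 15x rule of thumb above can be checked with a few lines of arithmetic. This is a minimal sketch: the per-IP counts are copied from the list above, while the function names are illustrative and not part of any Infoblox tool.

```python
# Per-IP object counts copied from the dual-stack list above.
DUAL_STACK_OBJECTS = {
    "leases": 3,        # 2 IPv4 (one per DHCP failover node) + 1 IPv6
    "ptr_records": 2,   # one for each IP
    "txt_records": 2,   # DDNS
    "a_record": 1,
    "aaaa_record": 1,
    "ipam_object": 1,
}

# Rule of thumb from the notes: plan for 15 objects per IP in use.
RULE_OF_THUMB_FACTOR = 15


def objects_per_ip() -> int:
    """Sum of the objects the list above attributes to one dual-stack IP."""
    return sum(DUAL_STACK_OBJECTS.values())


def planned_capacity(ips_in_use: int) -> int:
    """Object capacity to plan for under the 15x rule of thumb."""
    return RULE_OF_THUMB_FACTOR * ips_in_use


def utilization(ips_in_use: int) -> float:
    """Fraction of the planned capacity consumed on implementation."""
    return objects_per_ip() * ips_in_use / planned_capacity(ips_in_use)
```

The listed objects sum to 10 per IP, and 10 out of 15 is about 66%, which matches the capacity figure quoted above.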
6
IPv4 up to 10 objects
IPv4 AND IPv6 up to 18 objects
When considering DNS records, the number of DNSSEC-specific objects required for a signed zone is equal to 4 times the number of existing records.
DNSSEC signing should be selective, and the object impact should be based on the zones planned to be signed.
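The 4x rule can be applied per zone to estimate the object impact of selective signing. The sketch below assumes the DNSSEC-specific objects come on top of the existing records (one reading of the note above); the zone names and record counts are made up for illustration.

```python
# From the notes: DNSSEC-specific objects = 4 x existing records in a signed zone.
DNSSEC_OBJECTS_PER_RECORD = 4


def dnssec_objects(existing_records: int) -> int:
    """DNSSEC-specific objects required to sign a zone of this size."""
    return DNSSEC_OBJECTS_PER_RECORD * existing_records


def total_objects(zones: dict[str, tuple[int, bool]]) -> int:
    """Total objects across zones given as name -> (record count, signed?).

    Assumption: DNSSEC objects are added on top of the existing records.
    """
    total = 0
    for records, signed in zones.values():
        total += records
        if signed:
            total += dnssec_objects(records)
    return total


# Hypothetical plan: sign only the external zone.
plan = {
    "example.com": (1_000, True),
    "corp.example.com": (50_000, False),
}
```

With this hypothetical plan, signing only the small external zone adds 4,000 objects, whereas signing the internal zone as well would add a further 200,000.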
7
Note: Multi-master is not available for signed zones.
A record set here means the group of records for the same entry.
Double signing, while supported, should not be used; pre-publishing should always be preferred because of the impact on the number of objects being created.
Note that DDNS in combination with DNSSEC can cause a very high load on the GM, and in most cases the next model of GM should be selected. Each update to the zone triggers a re-signing of the zone, which has to be processed by the GM.
8
9
For a typical 10,000-person organization, we expect 3000 DNS queries per second (qps).
3000 qps results in 78 GB of logs per day.
1000 qps results in 25.6 GB of logs per day.
Syslog and reporting cannot filter this amount of data.
Data Connector CAN filter this data.
Data Connector is preferred from a load and data perspective.
A single Data Connector can handle up to 47k qps. Multiple Data Connectors per grid can handle larger qps counts.
Data Connector requires an ecosystem license to send data to a non-Infoblox Splunk instance.
Only response logs are required to capture successful queries. (If you want to capture timeouts or indicators of compromise, then both query and response logs are needed.)
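The logging figures above lend themselves to a quick estimate. The sketch below assumes log volume scales linearly with query rate (anchored on the 78 GB per day at 3000 qps figure) and that Data Connectors are sized purely on the 47k qps limit; both are simplifications, and the function names are illustrative.

```python
import math

# Figures taken from the notes above.
REFERENCE_QPS = 3000.0
REFERENCE_GB_PER_DAY = 78.0
DATA_CONNECTOR_MAX_QPS = 47_000


def daily_log_gb(qps: float) -> float:
    """Estimated log volume per day, assuming linear scaling with qps."""
    return qps * REFERENCE_GB_PER_DAY / REFERENCE_QPS


def data_connectors_needed(qps: int) -> int:
    """Minimum number of Data Connectors if each handles up to 47k qps."""
    return math.ceil(qps / DATA_CONNECTOR_MAX_QPS)
```

For example, `data_connectors_needed(100_000)` returns 3, since two connectors would cover only 94k qps.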
10
On lease logging:
Since free reporting is available and the custom reports give you full control over DHCP lease data, this feature should be deferred. Reporting allows you to integrate lease information with discovered data as well as user data.
11
DTC is an embedded part of the DNS serving protocol; this requires each member to have a license.
If a member does not have a license, it will fall back to serving non-DTC responses, which can have unintended consequences.
Sizing example calculation:
For 20000 QPS of which 25% of queries are load balanced, the calculation becomes:
1.2 * 20000 * 0.25 + 20000 * (1 - 0.25) = 21000 effective QPS => TE-1410
A DTC license is required for hidden primaries of a load-balanced zone.
The sizing calculator on ARB-1 provides more information for other scenarios and for combinations with other variables.
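The sizing example above can be expressed as a small function. This is a sketch only: the 1.2 weighting for load-balanced queries is taken from the example calculation, and the mapping from effective QPS to an appliance model is left to the sizing calculator.

```python
# Weighting applied to load-balanced (DTC) queries in the example above.
DTC_WEIGHT = 1.2


def effective_qps(total_qps: float, dtc_fraction: float) -> float:
    """Effective QPS when a fraction of queries is answered through DTC."""
    if not 0.0 <= dtc_fraction <= 1.0:
        raise ValueError("dtc_fraction must be between 0 and 1")
    dtc_load = DTC_WEIGHT * total_qps * dtc_fraction
    plain_load = total_qps * (1.0 - dtc_fraction)
    return dtc_load + plain_load
```

Calling `effective_qps(20000, 0.25)` reproduces the 21000 effective QPS from the example.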
12
In general there is no performance improvement or degradation due to Multimaster; we are leveraging the same code in all places.
There is an upper limit for NIOS: it maxes out at 800 DNS database transactions per second, measured as the global rate of updates across the grid. To address this, lease times should be increased, and DDNS can be disabled on selected networks.
There is no longer a limitation on the number of members in a MM setup, and there is no overhead for running MM.
13
Note: a fix for the subscription license behavior will be in NIOS 8.3
14
Refer to the sizing calculator for details of the performance impact.
Order of feeds should be:
1. Manual whitelist
2. Manual blacklist
3. FireEye feed (if applicable)
4. Subscription feeds
15
RPZ feeds consume memory. To maintain the rated QPS for the appliance, use the table above; the values are tied to the number of records in the feed.
An 820 will maintain 100% of its rated QPS for RPZ zones with up to 500,000 records.
The sizing with details of the feeds can be found here:
https://docs.google.com/spreadsheets/d/1lRaHGo04wFniBFB4tgr5SvT4MPKYGrkGOBnQdHHbf4k
16
These values are rules of thumb. The member sizing calculator trumps any estimations in this deck and gives precise calculations.
17
When considering object count, make sure to include MS objects for sizing purposes.
For an AD-integrated forest, each AD domain in the forest needs to be separately configured for synchronization:
By default, AD domains are set up to replicate in the Domain container only. The exception is the Forest Root zone _msdcs, which replicates in the Forest container. If customers change the default replication container setting for zones, then fewer syncs may be required, BUT some zones added in the future may be missed.
18
When considering object count, make sure to include MS objects for sizing purposes.
19
20
Caution:
• Number of objects is more critical to member performance than server count.
• Avoid syncing to a Grid Master.
• TE-820 should not be used if it is also the Grid Master.
Grid maximum servers means the number of servers that can be managed by a grid with that model as the Grid Master.
Note:
• If there are many zones (>2000) per managing server, consider a higher capacity server.
• The sync interval can be increased, which gives more flexibility with large datasets.
21
In-house testing cannot cover all the possible hardware that a customer might use to deploy vNIOS.
Note: Mind resource contention (CPU/mem/IO) from other virtual machines. To combat this, the next size of appliance can be provisioned to ensure it meets performance expectations.
Assume identical performance between physical and virtual appliances.
Check the release notes for which models are supported on which hypervisors.
HA pairs on virtual platforms are advised to guarantee service during upgrades.
22
VMware HA is not supported and will cause service outages during failovers.
Assume identical performance between physical and virtual appliances.
Check the release notes for which models are supported on which hypervisors.
23
The rule that a Hyper-V appliance cannot be a Grid Master has been rescinded.
24
25
26
27
DHCP is supported on Azure.
Currently there are known issues with it.
28
Note that while DHCP is possible in Azure, there are known problems with it.
29
ADP requires the feature license as well as the subscription license.
When tuning, PS time is advised:
Use “monitor” mode when first deploying to minimize operational impact.
Use “rate limiting”, not “blocking”, for the rate algorithm.
While not an ADP feature, smart cache should be enabled on deployment.
Note: for HA, a feature license is required on each member; for the ADP subscription, a single subscription covers the HA pair.
30
31
32
Note: some considerations apply when deploying virtual reporting members:
https://answers.splunk.com/answers/298/can-i-run-splunk-in-a-vm-are-there-any-issues-or-tricks-i-should-be-aware-of.html
https://www.splunk.com/web_assets/pdfs/secure/Splunk_and_VMware_VMs_Tech_Brief.pdf
33
Use the sizing spreadsheet for help with sizing of reporting.
Note: when adding clustering, only new data is protected.
34
Licensing requires the Threat Insight license AND the subscription license.
35
The detection and window size is the amount of traffic the engine is capable of looking at. This is determined by the memory and CPU of the appliance; with TI-iTC, however, it is not limited by these constraints.
Performance note, latency:
• from packet on-prem, to detection, to entry in the RPZ on-prem: up to 2 minutes
• between DC and TI-iTC: acceptable up to 1000 ms
36
Certain orchestration tools do not work with CP appliances; if unsure, submit to ARB.
37
API projects for customers with large datasets (>500k objects) require careful consideration and should be vetted by ARB, including the planned API calls.
API performance can be improved by using optimized queries.
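The notes do not define "optimized queries". One common reading, sketched below, is to request only the fields you need and to page through large result sets rather than fetching everything in one call. The host name and the WAPI version in the path are assumptions for illustration; `_return_fields`, `_max_results`, `_paging` and `_return_as_object` are standard WAPI query options. The helper only builds the URL and performs no network call.

```python
from urllib.parse import urlencode

# Assumption: WAPI version for the target NIOS release; adjust as needed.
WAPI_VERSION = "v2.7"


def build_query_url(host: str, object_type: str, fields: list[str],
                    filters: dict[str, str], page_size: int = 1000) -> str:
    """Build a WAPI GET URL that limits returned fields and enables paging."""
    params = dict(filters)
    params["_return_fields"] = ",".join(fields)  # only the fields needed
    params["_max_results"] = str(page_size)      # page size, not a full dump
    params["_paging"] = "1"                      # server returns next_page_id
    params["_return_as_object"] = "1"            # required for paging
    query = urlencode(params, safe=",")
    return f"https://{host}/wapi/{WAPI_VERSION}/{object_type}?{query}"


# Hypothetical host and zone, for illustration only.
url = build_query_url(
    "gm.example.com", "record:a",
    fields=["name", "ipv4addr"],
    filters={"zone": "example.com"},
)
```

Subsequent pages would be requested by passing the `next_page_id` value from each response as `_page_id` on the following call.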
38
For data collection over SNMP, RTT should be under 500 ms.
You cannot have NI if you are using an 820 GM.
39
40
41
42
