DDI Grid Best Practices 8.2v2.3


1

2
All *0 appliances have an end-of-sale date of 02/14/2018 and an end-of-life date of 02/15/2021.

All performance numbers should be interpreted as maximum capacity; this means
that maxing out on one will prevent you from reaching the other.

3
The grid master model type depends on
• member count
• level of activity of the members
• number of admins
• object count
• API

Note: DNSSEC requires the GM to serve DNS; this is the only exception to the rule
against running services on the GM.
Take GM CPU load into account when running DNSSEC.

When in doubt, request ARB review!

Note:
A member is either a single appliance or an HA pair of appliances.
Running services on the 1425 and larger is possible; it is marked as NO in
this table because a design that requires a 1425 or larger GM should not be
running services on it.

4
When the GM is not part of an HA pair, or there is no GMC, a GM failure requires:
1. waiting for the RMA
2. restoring from backup

HA should never be split across datacenters. Despite any precautions that can be
taken, split-brain conditions can have a huge impact on a grid.

For DR purposes the GMC should not be in the same physical location as the GM

All hardware performance details for appliances can be found in:
https://docs.google.com/spreadsheets/d/1fvj5w9LNmu49toOn-kHewiXGokwrjadcveAgmYsS2a4

5
Note: IPv4 and IPv6 should not update the same forward-mapping zone, as this will
cause collisions in the TXT records used by DHCP.

Object count for dual stack:
• 3 leases: 2 IPv4 (one on each DHCPFO node) + 1 IPv6
• 2 PTR records (one for each IP)
• 2 TXT records for DDNS
• 1 A record
• 1 AAAA record
• 1 IPAM object

A rule of thumb is 15 times the number of IPs in use, which gives you 66% capacity
on implementation.
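The dual-stack object list and the rule of thumb above can be checked with a short calculation. This is a minimal sketch: the names are illustrative, and reading "66% capacity" as the ratio of the 10 listed objects to the 15 objects budgeted per IP is an assumption, not something the deck states.

```python
# Objects created for one dual-stack client, taken from the list above.
DUAL_STACK_OBJECTS = {
    "leases": 3,       # 2 IPv4 (one per DHCP failover node) + 1 IPv6
    "ptr_records": 2,  # one for each IP
    "txt_records": 2,  # DDNS
    "a_record": 1,
    "aaaa_record": 1,
    "ipam_object": 1,
}

RULE_OF_THUMB_FACTOR = 15  # objects budgeted per IP in use


def budgeted_objects(ips_in_use: int) -> int:
    """Object budget according to the 15x rule of thumb."""
    return RULE_OF_THUMB_FACTOR * ips_in_use


def expected_utilization() -> float:
    """Assumed reading of '66% capacity': listed objects / budgeted objects."""
    return sum(DUAL_STACK_OBJECTS.values()) / RULE_OF_THUMB_FACTOR


if __name__ == "__main__":
    print(budgeted_objects(10_000))          # object budget for 10,000 IPs
    print(f"{expected_utilization():.0%}")   # share of the budget actually used
```

The member sizing calculator remains the authority; this only illustrates where the rule-of-thumb numbers plausibly come from.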

6
IPv4 up to 10 objects

IPv4 AND IPv6 up to 18 objects

When considering DNS records, the number of DNSSEC-specific objects required for a
signed zone is equal to 4 * the number of existing records.
DNSSEC signing should be selective, and the object impact should be calculated
from the zones that are planned to be signed.
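The object impact of selective signing can be estimated per zone. The sketch below applies the 4x multiplier from the note above only to zones flagged for signing; the zone names and record counts are hypothetical.

```python
DNSSEC_OBJECTS_PER_RECORD = 4  # multiplier from the note above


def dnssec_extra_objects(existing_records: int) -> int:
    """DNSSEC-specific objects added when a zone is signed."""
    return DNSSEC_OBJECTS_PER_RECORD * existing_records


def total_objects(zones: dict[str, tuple[int, bool]]) -> int:
    """Sum existing records plus DNSSEC objects for the signed zones only.

    zones maps a zone name to (existing_records, will_be_signed).
    """
    total = 0
    for records, signed in zones.values():
        total += records
        if signed:
            total += dnssec_extra_objects(records)
    return total


if __name__ == "__main__":
    # Hypothetical plan: sign only the external zone.
    plan = {
        "example.com": (2_000, True),
        "corp.example.com": (50_000, False),
    }
    print(total_objects(plan))
```

Signing only the smaller zone in this hypothetical plan adds 8,000 objects, whereas signing both would add 208,000.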

7
Note: Multi master is not available for signed zones

By record set we indicate the group of records for the same entry.

Double signing, while supported, should not be used and pre-publishing should
always be preferred due to the impact on objects being created

Note that DDNS in combination with DNSSEC can cause a very high load on the GM,
and in most cases the next model of GM should be selected. Each update to the zone
triggers a zone re-sign, which has to be processed by the GM.

8
9
For a typical 10,000-person organization, we expect 3000 queries/second (qps) of
DNS traffic.
3000 qps results in 78 GB of logs per day.
1000 qps results in 25.6 GB of logs per day.
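The log volume figures above imply roughly 300 bytes per logged query. The sketch below derives that value from the 3000 qps reference point and scales it linearly to other query rates; it assumes a constant query rate over the whole day and decimal gigabytes (10^9 bytes), neither of which the deck states.

```python
SECONDS_PER_DAY = 86_400
REFERENCE_QPS = 3000          # reference point from the notes above
REFERENCE_GB_PER_DAY = 78.0   # log volume at the reference point


def bytes_per_logged_query() -> float:
    """Average log size per query implied by the reference point."""
    return REFERENCE_GB_PER_DAY * 1e9 / (REFERENCE_QPS * SECONDS_PER_DAY)


def log_gb_per_day(qps: float) -> float:
    """Linear estimate of daily log volume in GB for a sustained query rate."""
    return qps * SECONDS_PER_DAY * bytes_per_logged_query() / 1e9


if __name__ == "__main__":
    print(round(bytes_per_logged_query()))  # bytes per logged query
    print(round(log_gb_per_day(1000), 1))   # GB/day at 1000 qps
```

Linear scaling gives about 26 GB per day at 1000 qps, close to the figure quoted above.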

Syslog and reporting cannot filter this amount of data.
Data Connector CAN filter this data.
Data Connector is preferred from a load and data perspective.

A single Data Connector can handle up to 47k qps. Multiple Data Connectors per grid
can be used to handle larger qps counts.
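A first estimate of the number of Data Connectors follows from the 47k qps limit. This is a minimal sketch that assumes query load can be spread evenly across connectors and leaves no headroom; the function name is illustrative.

```python
DATA_CONNECTOR_MAX_QPS = 47_000  # per-connector limit from the note above


def data_connectors_needed(total_qps: int) -> int:
    """Smallest number of connectors whose combined limit covers total_qps."""
    if total_qps <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_qps // DATA_CONNECTOR_MAX_QPS)


if __name__ == "__main__":
    for qps in (3_000, 47_000, 100_000):
        print(qps, data_connectors_needed(qps))
```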

The Data Connector (DC) requires an ecosystem license to send data to a non-Infoblox Splunk instance.

Only response logs are required to capture successful queries. (If you want to capture
timeouts or indicators of compromise then query and response are needed)

10
On lease logging:
Since free reporting is available and the custom reports give you full control over
DHCP lease data, this feature should be deferred. Reporting allows you to integrate
lease information with discovered data as well as user data.

11
DTC is an embedded part of the DNS serving protocol, so each member is required to
have a license.
If a member does not have a license, it will fall back to serving non-DTC responses,
which can have unintended consequences.

Sizing example calculation:
For 20000 QPS, of which 25% of queries are load balanced, the calculation becomes:
1.2 * 20000 * 0.25 + 20000 * (1 - 0.25) = 21000 effective QPS => TE-1410
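The sizing calculation above can be written as a small function. This sketch reads the 1.2 factor as a 20% cost increase on load-balanced queries, which is an interpretation of the formula rather than a documented constant; mapping the result to an appliance model is left to the sizing calculator.

```python
DTC_COST_FACTOR = 1.2  # multiplier applied to load-balanced queries in the example


def effective_qps(total_qps: float, load_balanced_fraction: float) -> float:
    """Effective QPS when a fraction of the queries is answered through DTC."""
    if not 0.0 <= load_balanced_fraction <= 1.0:
        raise ValueError("load_balanced_fraction must be between 0 and 1")
    dtc_queries = total_qps * load_balanced_fraction
    plain_queries = total_qps * (1.0 - load_balanced_fraction)
    return DTC_COST_FACTOR * dtc_queries + plain_queries


if __name__ == "__main__":
    # Reproduces the worked example: 20000 QPS with 25% load balanced.
    print(round(effective_qps(20_000, 0.25)))
```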

A DTC license is required for hidden primaries for a load balanced zone.

The sizing calculator on ARB-1 provides more information for other scenarios and
combinations with other variables.

12
In general there is no performance improvement or degradation due to Multimaster;
we are leveraging the same code in all places.

There is an upper limit for NIOS: it maxes out at 800 DNS database transactions per
second, measured as a global rate of updates across the grid. To address this, lease
times should be increased and DDNS can be disabled on selected networks.

There is no longer a limit on the number of members in a MM setup, and there is no
overhead for running MM.

13
Note: a fix for the subscription license behavior will be in NIOS 8.3

14
Refer to the sizing calculator for details of the performance impact.

Order of feeds should be:
1. Manual whitelist
2. Manual blacklist
3. FireEye feed (if applicable)
4. Subscription feeds
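The feed order above can be expressed as a priority table and applied to a list of configured feeds. This is an illustrative sketch only: the category keys and feed names are hypothetical, and it does not call any NIOS API.

```python
# Lower number means the feed is placed earlier, following the list above.
FEED_PRIORITY = {
    "manual_whitelist": 0,
    "manual_blacklist": 1,
    "fireeye": 2,
    "subscription": 3,
}


def order_feeds(feeds: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (feed_name, category) pairs into the recommended order.

    sorted() is stable, so feeds in the same category keep their
    original relative order.
    """
    return sorted(feeds, key=lambda feed: FEED_PRIORITY[feed[1]])


if __name__ == "__main__":
    configured = [
        ("vendor-feed-a", "subscription"),
        ("local-block", "manual_blacklist"),
        ("local-allow", "manual_whitelist"),
        ("vendor-feed-b", "subscription"),
    ]
    for name, category in order_feeds(configured):
        print(category, name)
```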

15
RPZ feeds consume memory. To maintain the rated QPS for the appliance, use the
table above; the values are tied to the number of records in the feed.
An 820 will maintain 100% of its rated QPS for RPZ zones with up to 500,000 records.

The sizing with details of the feeds can be found here:
https://docs.google.com/spreadsheets/d/1lRaHGo04wFniBFB4tgr5SvT4MPKYGrkGOBnQdHHbf4k

16
These values are rules of thumb; the member sizing calculator trumps any
estimations in this deck and gives precise calculations.

17
When considering object count make sure to include MS objects for sizing purposes

For an AD-integrated forest, each AD domain in the forest needs to be separately
configured for synchronization:
By default, AD domains are set up to replicate in the Domain container only. The
exception is the forest root zone _msdcs, which replicates in the Forest container. If
customers change the default replication container setting for zones, then fewer
syncs may be required BUT some zones added in the future may be missed.

18
When considering object count make sure to include MS objects for sizing purposes

19
20
Caution:
• Number of objects is more critical to member performance than server count.
• Avoid syncing to a Grid Master.
• TE-820 should not be used if it is also the Grid Master.

Grid maximum servers means number of servers that can be managed by a grid with
that model as the grid master.

Note:
• If there are many zones (>2000) per managing server, consider a higher capacity
server.
• Sync interval can be increased which gives more flexibility with large datasets

21
In-house testing cannot cover all the possible hardware that a customer might use to
deploy vNIOS.

Note: Mind resource contention (CPU/memory/IO) from other virtual machines. To
combat this, the next size of appliance can be provisioned to ensure it meets
performance expectations.

Assume identical performance between physical and virtual appliance.

Check release notes for which models are supported on which hypervisors

HA pairs on virtual platforms are advised to guarantee service during upgrades

22
VMware HA is not supported and will cause service outages during failovers.

Assume identical performance between physical and virtual appliance.

Check release notes for which models are supported on which hypervisors

23
The rule that a Hyper-V appliance cannot be a Grid Master has been rescinded.

24
25
26
27
DHCP is supported on Azure
Currently there are known issues with it

28
Note that while DHCP is possible in Azure there are known problems with it.

29
ADP requires the feature license as well as the subscription license.

When tuning, PS time is advised:
• Use “monitor” mode when first deploying to minimize operational impact
• Use “rate limiting”, not “blocking”, for the rate algorithm

While not an ADP feature, smart cache should be enabled on deployment.

Note: For HA, a feature license is required on each member; for the ADP
subscription, a single subscription per HA pair is sufficient.

30
31
32
Note: some considerations apply when deploying virtual reporting members:
https://answers.splunk.com/answers/298/can-i-run-splunk-in-a-vm-are-there-any-issues-or-tricks-i-should-be-aware-of.html
https://www.splunk.com/web_assets/pdfs/secure/Splunk_and_VMware_VMs_Tech_Brief.pdf

33
Use the sizing spreadsheet for help with sizing reporting.

Note: when adding clustering, only new data is protected.

34
Licensing requires a Threat Insight license AND a subscription license.

35
The detection and window size is the amount of traffic the engine is capable of
looking at. It is determined by the memory and CPU of the appliance; with TI-iTC,
however, it is not limited by these constraints.

Performance note (latency):
• From packet on-prem, to detection, to entry in RPZ on-prem: up to 2 minutes
• Between DC and TI-iTC: OK up to 1000 ms

36
Certain orchestration tools do not work with CP appliances; if unsure, submit to ARB.

37
API projects for customers with large datasets (>500k objects) require careful
consideration and should be vetted by ARB, including the planned API calls.
API performance can be improved by using optimized queries.
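The deck does not say what an optimized query looks like. One common approach, assumed here, is to request only the fields that are needed and to page through large result sets using the WAPI options `_return_fields`, `_max_results`, `_paging` and `_return_as_object`. The sketch below only builds the request URL for the first page; the grid master hostname, the WAPI version in the path, and the function name are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical grid master address and WAPI version; adjust to the deployment.
BASE_URL = "https://gm.example.com/wapi/v2.7"


def build_first_page_url(object_type: str,
                         return_fields: list[str],
                         page_size: int = 1000) -> str:
    """Build a WAPI GET URL that limits returned fields and enables paging."""
    params = {
        "_return_fields": ",".join(return_fields),  # only the fields needed
        "_max_results": page_size,                  # page size
        "_paging": 1,                               # enable paging
        "_return_as_object": 1,                     # required when paging
    }
    # Keep commas literal so the field list stays readable in the URL.
    return f"{BASE_URL}/{object_type}?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    print(build_first_page_url("record:a", ["name", "ipv4addr"]))
```

Narrow field lists and bounded page sizes keep each call small, which matters most for the large datasets mentioned above.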

38
For data collection over SNMP, RTT should be under 500 ms.
You cannot have NI if you are using an 820 as GM.

39
40
41
42