Monday, June 23, 2014

Troubleshooting: Understanding How Time Affects When Changes are Effective

The best way to summarize how time works in an AD environment that leverages Centrify agents for Unix, Linux and Mac OS X is to look at this formula:

Effective Changes = Σ(Provisioning, AD Replication, Cache Flush Interval)

Changes are additions, deletions or modifications of AD objects (LDAP).  This excludes real-time Kerberos transactions like authentication or password changes.

Provisioning


In Centrify for Unix/Linux, provisioning is the action of assigning an existing AD principal (user or group) a UNIX identity (login, UID, GID, Home, GECOS, Shell).  Users also need a role to be able to log into systems.  Both actions can happen manually or automatically (via an Identity Management solution that leverages the Centrify APIs, via the Zone Provisioning Agent (ZPA) utility, or programmatically via Centrify PowerShell or Centrify adedit).
For example, in my environment I use ZPA and I assign roles to AD groups to simplify management.  I've also nested my role-granting group into the provisioning group.  This means that my modified equation is:

Effective Changes @ Contoso = (AD Replication of Group Membership + ZPA Polling Interval + AD Replication of ZPA Provisioning + Cache Flush Interval)

Notice how my provisioning design has an impact on time.  I have potentially two actions that require AD replication and an agent that I've set up to poll every 15 minutes.


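This means that, in an intra-site scenario, a provisioning action can take as long as 15 minutes and 30 seconds (15:30) from a provisioning + AD perspective: 15 seconds for the group membership to replicate, up to 15 minutes for ZPA to poll, and 15 seconds for the ZPA change to replicate.  For a user, this assumes that both the identity and the role were granted at the same time.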

AD Replication

If you're relatively new to Active Directory and aren't sure how AD replication works, read these links and come back:

Basic Concepts: http://technet.microsoft.com/en-us/library/cc731537%28v=ws.10%29.aspx
How it works:  http://technet.microsoft.com/en-us/library/cc772726(v=ws.10).aspx 

In case you did not read the links above, the basic problem is that in a replicated database, changes take time to propagate.  AD Sites (groups of well-connected subnets) have internal (intra-site, shorter) and external (inter-site, longer) replication periods.

Intra-site replication is somewhat predictable.  A DC will notify its nearest replication partner of a change within 15 seconds, and this will cascade within the site.  Older versions (like Windows 2000) used a 5-minute interval.

Unfortunately, in larger global environments AD inter-site replication times vary.  It all depends on how the AD team has tuned the environment based on the network topology.  This is why, to be an effective Centrify administrator, you need to be in constant communication with your AD team.  Replication affects availability for users and definitely affects your SLAs.  At a high level, the best solution is for the AD team to follow Microsoft's recommendations for AD replication in large environments and to maintain current Subnet, Site and Domain Controller information.  This is very important.


That being said, there are things you can do to make changes take effect faster.
For example, if there is a new add, move or change and the target is a key server in a specific location, you can log into the server and find out which domain controller it is currently talking to with the adinfo command (adinfo --server).  If you're provisioning via ADUC, adedit or Access Manager, make sure you're talking to the same DC.  At that point you have basically eliminated AD replication time from the equation, and you can issue an adflush once the changes are made.


If you're using the Centrify cmdlets for Windows PowerShell, you can run echo %logonserver% in a Windows command prompt to find out which domain controller you're currently talking to.

Centrify Agent Cache

We've talked about the cache in previous posts; however, all you need to know is that, to improve performance and provide high availability, the Centrify agent for Unix, Linux and Mac does not query AD continuously for changes; by default it refreshes its cache every hour.

Putting it all together

Looking back at my example, this means that in an intra-site scenario with two domain controllers, like my lab Contoso environment, the time for a user change to become effective can be as long as 75 minutes and 30 seconds (75:30): up to 15 seconds for replication of the provisioning/role-granting action, up to 15 minutes for ZPA to poll, up to 15 additional seconds for the ZPA change to propagate, and up to 60 minutes for a Centrified system to update its cache.
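
To make the arithmetic easy to re-run with your own numbers, here is a minimal Python sketch that adds up the worst-case delays from my example.  The dictionary keys and function names are just illustrative labels I chose for this post (they are not Centrify or AD settings), and the values assume an intra-site scenario with a 15-minute ZPA polling interval and the default one-hour cache flush interval.

```python
from datetime import timedelta

# Worst-case delays from the Contoso example in this post (intra-site).
# The labels are illustrative only; substitute your own environment's values.
DELAYS = {
    "AD replication of group membership": timedelta(seconds=15),
    "ZPA polling interval": timedelta(minutes=15),
    "AD replication of ZPA provisioning": timedelta(seconds=15),
    "agent cache flush interval": timedelta(seconds=3600),
}


def worst_case(delays):
    """Sum every delay to get the longest time before a change is effective."""
    return sum(delays.values(), timedelta())


def fmt(delta):
    """Format a timedelta as minutes:seconds."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes}:{seconds:02d}"


# Provisioning + AD portion only (everything except the agent cache).
provisioning_only = {
    name: delay
    for name, delay in DELAYS.items()
    if name != "agent cache flush interval"
}

print("Provisioning + AD:", fmt(worst_case(provisioning_only)))  # 15:30
print("End to end:       ", fmt(worst_case(DELAYS)))             # 75:30
```

If your AD team tells you that inter-site replication adds, say, another 15 minutes per hop, you can add those entries to the dictionary and see how your worst case (and your SLA) changes.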

How to perform manual adds/moves/changes in an effective manner

  1. Determine the key system, and run adinfo on it to determine the domain controller the agent is talking to.  (adinfo --server)
  2. With that information, connect your ADUC or Access Manager console to the target DC.  (On AM, use the "Connect to remote forest" option; on ADUC, use the "Change Domain Controller" option.)
  3. Perform your changes in AD (adds/moves/changes).
  4. Run adflush on the target system.  (If the change is a local override, you need to restart the agent.)
  5. Verify the changes with adquery, dzinfo, etc.

Note:  Flushing the cache (by interval or manually with adflush) is an expensive operation.  I recommend that you keep the default cache flush interval of 3600 seconds (one hour) and try to establish a proper Service Level Agreement for these operations.
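
The agent-side part of the steps above (steps 1, 4 and 5) can be sketched as a small Python helper.  This is only an illustration under a few assumptions: the function names are hypothetical, the user name jdoe is made up, and the exact arguments I pass to adquery and dzinfo are an assumption, so check them against the documentation for your agent version.  By default the helper only prints the commands (a dry run), so nothing is flushed until you ask for it.

```python
import shlex
import subprocess


def commands_before_change():
    """Step 1: find out which domain controller the agent is talking to."""
    return [["adinfo", "--server"]]


def commands_after_change(user):
    """Steps 4 and 5: flush the agent cache, then verify the change."""
    return [
        ["adflush"],
        # Assumption: the arguments below may differ in your agent version.
        ["adquery", "user", user],
        ["dzinfo", user],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print("$ " + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Before touching AD: note the DC, then point ADUC/Access Manager at it.
    run_plan(commands_before_change())
    # After making the change on that same DC (steps 2 and 3):
    run_plan(commands_after_change("jdoe"))
```

The plan is split in two on purpose: steps 2 and 3 happen on the Windows side, against the DC reported in step 1, before the flush and verification make sense.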
