Sunday, May 18, 2014

Labs: Testing the Availability Controls of Centrify for UNIX/Linux/Mac



In this Lab:
  1. We'll use the dns.block parameter to exercise the domain controller fail-over capability.
  2. We'll simulate a network failure on UBU1 and use the offline cache.
  3. We'll attempt to kill the adclient process as a regular user.
  4. We'll corrupt the /etc/nsswitch.conf and krb5.conf files.
  5. We'll simulate an abnormal termination of the client process to invoke the watchdog process.
  6. We'll describe the domain controller telemetry process.

Friday, May 16, 2014

Security Corner: Centrify UNIX Agent's Availability controls

Availability is the security principle that information should be available when needed.  Authentication mechanisms must be highly available because they are the door to the information that may be needed to make a business decision.

The Centrify agent for UNIX has a great advantage because it enables those platforms to integrate with Active Directory.  Active Directory was created with high availability in mind and, when properly implemented(*), provides:

  • Replication:  Changes in the AD database are replicated to other domain controllers; the state in which all DCs hold the same data is called convergence.  Replication applies to LDAP objects and files.
  • Multi-master:  Unlike Windows NT 4.0 domains, which relied on a Primary Domain Controller (PDC) for write operations, in AD all DCs are writable.  The exceptions are the operations reserved for the Flexible Single Master Operations (FSMO) roles.
  • Sites and Services and DNS SRV records:  AD leverages DNS to provide the closest-to-client services;  any services that rely on these capabilities will be able to access the best connected service based on network location.
(*) Sadly, in the field (especially in lab environments) we see a lot of single-DC environments and improperly configured AD Sites and Services.  Seeing the infamous "Default-First-Site-Name" displayed decreases the credibility of the environment's maintainer.
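
To make the SRV-record point concrete, here is a minimal sketch of the record names a site-aware client could look up.  It only builds the names (it performs no DNS query), the function name is made up for illustration, and the "site first, then domain-wide" ordering is an assumption about a sensible lookup order rather than a description of what the Centrify agent does internally.

```python
def dc_locator_srv_names(domain, site=None):
    """Return LDAP SRV record names for locating domain controllers.

    The site-specific name comes first so that DCs in the client's own
    AD site are tried before the domain-wide list (assumed ordering).
    """
    names = []
    if site:
        # Site-specific record registered by DCs that cover this site.
        names.append(f"_ldap._tcp.{site}._sites.{domain}")
    # Domain-wide record listing every DC in the domain.
    names.append(f"_ldap._tcp.{domain}")
    return names


for name in dc_locator_srv_names("corp.contoso.com", "CorpHQ"):
    print(name)
```

With a single DC and everything left in "Default-First-Site-Name", both lookups return the same server, so the site topology adds no resilience.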

One of the biggest concerns for any UNIX/Linux systems administrator is being unable to do their job because the authentication scheme is not available;  as a matter of fact, that is why many of them are biased toward using shared accounts (like root): those accounts are reliably available.  Unfortunately this perpetuates the poor security practice of sharing those sensitive accounts.  The benefit with Centrify is that availability risks are mitigated automatically, without the need to re-target LDAP services or reconfigure krb5.conf files.

How does Centrify address the Availability question?

Any true risk is mitigated by preventative, detective and corrective controls.  The controls deployed by Centrify are:
  1. AD Sites and Services compatibility:  this means that the agent will pick an alternative domain controller based on the AD site topology.  (Corrective/Preventative)
  2. Performance Optimizations:  the agent performs its own telemetry calculations to determine whether it is talking to the optimal domain controller. (Preventative)
  3. No AD available:  In case of a network-level failure (inability to connect to any DC) the agent provides the offline credential cache. (Corrective)
  4. Abnormal termination:  the cdcwatch process is a watchdog that will spawn a new agent process in case of an abnormal termination.  (Corrective)
  5. System file corruption:  Changes to the name service switch (NSS) (nsswitch.conf), Kerberos (krb5.conf) or pluggable authentication module (PAM) config files are monitored and rolled back to a proper operational state if needed. (Corrective/Preventative)
  6. Process Protections:  All agent-related processes (and the watchdog) are owned by root. (Preventative)
  7. Logs and Core Dumps:  Centrify integrates with the syslog facility and provides its own core dumps in case of an abnormal termination. (Detective)
As you can see, Centrify's agent implements a broad set of controls to ensure high availability.
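
To illustrate the watchdog idea from item 4, here is a generic sketch of the pattern: a supervisor that restarts a child process only when it exits abnormally.  This is not how cdcwatch is implemented (the article does not describe its internals); the function name, the restart limit, and the stand-in child commands are all assumptions for demonstration.

```python
import subprocess
import sys


def supervise(command, max_restarts=3):
    """Run command and restart it only after an abnormal exit.

    Returns the number of restarts performed.  A zero exit status is
    treated as a normal, intentional stop and is never restarted.
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(command)
        if exit_code == 0:
            return restarts  # normal termination: leave it stopped
        if restarts >= max_restarts:
            return restarts  # give up instead of looping forever
        restarts += 1  # abnormal termination: spawn a new process


# Stand-in child processes (hypothetical): one always crashes, one exits cleanly.
crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
clean = [sys.executable, "-c", "pass"]

print("restarts after crashes:", supervise(crashing, max_restarts=2))
print("restarts after clean stop:", supervise(clean))
```

The distinction between the two exit paths matters for the next section: a watchdog of this kind does nothing when the process is stopped normally.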

When will a privileged system or local account be needed?

In two instances:
a) Normal termination of the Centrify agent process:  The agent has been stopped, therefore there's no communication with AD or the cache.  The authentication stack will continue on to the next source.  For example, in your /etc/nsswitch.conf you may see a line for users (or groups) like this:

passwd:        centrifydc   files

If the adclient process is not available, lookups fall through to the local files, so any local account can still be granted access.  Local accounts work the same way under normal operations.
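
A quick way to confirm that fall-through is in place is to check the source order for a database.  The sketch below parses nsswitch.conf-style text held in a string; the sample content and both function names are made up for illustration, and bracketed action items such as [NOTFOUND=return] are assumed to contain no spaces.

```python
def nss_sources(nsswitch_text, database="passwd"):
    """Return the ordered lookup sources for one nsswitch.conf database."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        name, sep, rest = line.partition(":")
        if sep and name.strip() == database:
            # Skip action items like [NOTFOUND=return]; keep source names.
            return [tok for tok in rest.split() if not tok.startswith("[")]
    return []


def has_local_fallback(sources):
    """True when 'files' is consulted after 'centrifydc'."""
    return ("centrifydc" in sources and "files" in sources
            and sources.index("files") > sources.index("centrifydc"))


# Hypothetical excerpt of /etc/nsswitch.conf.
SAMPLE_NSSWITCH = """\
# users and groups
passwd:        centrifydc   files
group:         centrifydc   files
hosts:         files dns
"""

print(nss_sources(SAMPLE_NSSWITCH, "passwd"))
print(has_local_fallback(nss_sources(SAMPLE_NSSWITCH, "passwd")))
```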

b) Single-user mode:  If the system abends and falls into that mode, the only account that can access the system is root.

Command Line Tool tips

The best command to troubleshoot the agent is adinfo.  Use it with the --test option to perform connectivity tests.  Sample output:

george@suse1:~> adinfo --test

Domain Diagnostics
  Domain: corp.contoso.com
  Subnet site: CorpHQ
    DNS query for: _ldap._tcp.corp.contoso.com
    Found SRV records:
      dc1.corp.contoso.com:389
  Testing Active Directory connectivity:
    Domain Controller: dc1.corp.contoso.com
      ldap:      389/tcp - good
      ldap:      389/udp - good
      smb:       445/tcp - good
      kdc:        88/tcp - good
      kpasswd:   464/tcp - good
      ntp:       123/udp - good
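
If you want to check this output from a monitoring script, a small parser is enough.  The sketch below works on text shaped like the sample above; the function name is hypothetical, and because the sample shows only "good" results, it assumes that any other status word means the probe failed.

```python
import re

# The sample output from this post, kept as a string for demonstration.
SAMPLE = """\
Domain Diagnostics
  Domain: corp.contoso.com
  Subnet site: CorpHQ
    DNS query for: _ldap._tcp.corp.contoso.com
    Found SRV records:
      dc1.corp.contoso.com:389
  Testing Active Directory connectivity:
    Domain Controller: dc1.corp.contoso.com
      ldap:      389/tcp - good
      ldap:      389/udp - good
      smb:       445/tcp - good
      kdc:        88/tcp - good
      kpasswd:   464/tcp - good
      ntp:       123/udp - good
"""

# Matches lines such as "ldap:      389/tcp - good".
PROBE = re.compile(r"^\s*(\w+):\s+(\d+)/(tcp|udp)\s+-\s+(.+?)\s*$")


def failed_probes(adinfo_output):
    """Return (dc, service, port, protocol, status) for each non-good probe."""
    current_dc = None
    failures = []
    for line in adinfo_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Domain Controller:"):
            current_dc = stripped.split(":", 1)[1].strip()
            continue
        match = PROBE.match(line)
        if match and match.group(4) != "good":
            service, port, proto, status = match.groups()
            failures.append((current_dc, service, int(port), proto, status))
    return failures


print(failed_probes(SAMPLE))
```

In practice the text would come from running adinfo --test and capturing its standard output rather than from a hard-coded string.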

Tips about conducting Disaster Recovery tests

Disaster recovery with Centrify for Servers can piggyback on the AD infrastructure and the work performed for Windows domain members.  UNIX/Linux systems just become another "customer of AD".  This means:
  • If you're performing a total (from scratch) AD recovery, AD DCs and DNS go first, then the Zone data has to be rebuilt.  The UNIX/Linux system rebuilds happen in parallel; once they are ready, load the agent and join the zone.  At that point, instead of using the root account, you can switch to dzdo.
  • If you're performing a partial recovery or a restore of existing systems, restore the AD infrastructure first.  You can work in parallel on the UNIX/Linux systems (credentials will be cached up to the moment of the backup or snapshot) and access will be offline; once AD is online the agent will go into connected mode.
  • If you're testing with a disaster recovery site that goes online during tests, your strategy may vary.  In some outfits it's not desirable for the agent to fail over to the DR site, so the dns.block parameter (or the Blacklist DNS DC hostnames GPO) is used to keep those DCs blocked during production.  However, during a disaster, unblocking the DR site DCs should be scripted (or automated) so they become eligible for fail-over.  Another variation of this test is to block the production DCs and only allow the DR DC (that is a crude/forced DR test).
  • In larger environments, it's quite common that not all trusted sysadmins have logged on to all target systems; that is why Centrify has the ability to pre-validate (or pre-cache) user credentials.  Prevalidation will be the subject of another posting.
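
The "script the unblocking" step for DR site DCs could look something like the sketch below.  It edits configuration text in memory only: it does not write /etc/centrifydc/centrifydc.conf or tell the agent to re-read it, both of which a real runbook would need.  The function name and the drdc1 host name are hypothetical, and dropping the dns.block line when nothing remains blocked is an assumption.

```python
def unblock_dcs(conf_text, dcs_to_unblock):
    """Return conf_text with the given DCs removed from the dns.block line."""
    unblock = {dc.lower() for dc in dcs_to_unblock}
    out = []
    for line in conf_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "dns.block":
            kept = [dc.strip() for dc in value.split(",")
                    if dc.strip() and dc.strip().lower() not in unblock]
            if kept:
                out.append("dns.block: " + ",".join(kept))
            # When no DC remains blocked, the line is dropped (assumption).
        else:
            out.append(line)  # leave every other line untouched
    return "\n".join(out) + "\n"


# Hypothetical production config: the DR site DC is blocked.
production = (
    "# hypothetical excerpt\n"
    "dns.block: drdc1.corp.contoso.com,exp1.corp.contoso.com\n"
)

print(unblock_dcs(production, ["drdc1.corp.contoso.com"]))
```

The forced DR test mentioned above is the mirror image: add the production DCs to the list instead of removing the DR ones.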

Description of the dns.block parameter

This configuration parameter specifies which DCs should be filtered out of the pool of existing DCs for the domain.  This is useful when a DC is behind a firewall, when a DC has been decommissioned but a stale object remains, or for DR tests.

In the /etc/centrifydc/centrifydc.conf file, specify the directive followed by the comma-separated FQDNs of the DCs in question:

dns.block: dc1.corp.contoso.com,exp1.corp.contoso.com

If you prefer to use GPOs, the path is:  Computer Configuration > Policies > Centrify Settings > DirectControl Settings > Network and Cache Settings > Blacklist DNS DC hostnames group policy.
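
To picture the effect of the parameter, here is a minimal sketch of the filtering it describes: DCs named in dns.block are removed from the list of discovered DCs.  This models the behavior as described in this post, not the agent's actual code; the function names and the dc2 host are hypothetical, and case-insensitive host name matching is an assumption.

```python
def blocked_dcs(conf_text):
    """Return the set of lower-cased FQDNs listed on the dns.block line."""
    for line in conf_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "dns.block":
            return {dc.strip().lower() for dc in value.split(",") if dc.strip()}
    return set()


def eligible_dcs(discovered, conf_text):
    """Filter blocked DCs out of the discovered pool, preserving order."""
    blocked = blocked_dcs(conf_text)
    return [dc for dc in discovered if dc.lower() not in blocked]


conf = "dns.block: dc1.corp.contoso.com,exp1.corp.contoso.com\n"
discovered = [
    "dc1.corp.contoso.com",
    "dc2.corp.contoso.com",  # hypothetical second DC
    "exp1.corp.contoso.com",
]

print(eligible_dcs(discovered, conf))
```

Note that if every discovered DC ends up blocked, the pool is empty; from the agent's point of view that would be the same as having no AD available.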

Labs