Some Things I Didn’t Know About Aruba ClearPass

So I just finished attending my second year of Aruba’s Atmosphere (a.k.a. “Airheads”) conference, and this time around my learning focus was on ClearPass.  In case you didn’t know, ClearPass is basically a glorified RADIUS authentication server. But it can also do SO MUCH MORE.

I have experience with ClearPass handling TACACS+ authentications for all our Cisco gear, and we use it for downloadable ACLs for our ASA firewalls. It’s pretty much the bee’s knees. I even had the opportunity to share a couple of stories this week with Ten Talks, fashioned after Keith Parsons’s use of TTs at the annual Wireless LAN Professionals Conference (WLPC).

Thing #1 I didn’t know: There’s “hidden documentation” on the APIs built into the product. That’s right, you can go to https://clearpassIP/api-docs and see all sorts of lovely documentation on the APIs available in that particular version of ClearPass.  They first started with APIs for Guest, the element of CP used to handle guest registration and one-time time-limited access credentials and workflows. Apparently they’re also opening up APIs for the TIPS functionality of CP starting in April in v6.6. So basically anything you’d normally see or configure at https://ClearpassIP/tips will be available via a RESTful API. See below for some sample screenshots.

[Screenshots: ClearPass API documentation pages]
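
For anyone who wants to poke at the API from a script, here is a minimal Python sketch of what a request might look like. It only builds the request and does not send it. The hostname, the `guest` resource path, the placeholder token, and the use of a Bearer token header are all assumptions on my part, so check them against the /api-docs page on your own ClearPass version before relying on any of it.

```python
import urllib.request


def build_api_request(host, resource, token):
    """Build (but do not send) a GET request for a ClearPass API resource.

    The /api/<resource> layout and Bearer-token header are assumptions;
    confirm both against https://<host>/api-docs for your version.
    """
    url = f"https://{host}/api/{resource.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


# Hypothetical host, resource and token, for illustration only.
req = build_api_request("clearpass.example.com", "guest", "PLACEHOLDER_TOKEN")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen()` would actually send it; I left that out so the sketch runs without a ClearPass server.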

Thing #2 I didn’t know: There’s apparently a “graphite” graphing utility available at https://clearpassIP/graphite that allows you to see how much data is being transferred between members of a ClearPass cluster. There’s a reference to it in Danny Jump’s Tech Note on Clustering Design Guidelines. Unfortunately, when I tested it on my system I got “Error 403 Forbidden.”

Thing #3 I didn’t know: Default ClearPass settings are NOT the same as recommended ClearPass settings. The Clustering Design Guidelines document mentioned above has some recommendations that I need to review and see if we need to make changes in our environment.

Thing #4 I didn’t know: Every ClearPass Policy Manager (CPPM) that uses Active Directory (AD) or other LDAP authentication should be joined to the domain via a domain controller that is LOCAL TO THE CPPM. This might explain why I experience timeouts with TACACS+ authentication every morning—because it’s having to traverse the WAN to complete the AD auth.

Thing #5 I didn’t know: Airwave can be added as an “Endpoint Context Server” to ClearPass. This enables a link to “Open in Airwave” when viewing a particular authentication in Access Tracker (see below). Clicking the link will open Airwave and, if the device is currently connected to an Airwave-managed device, will show you health status, connection path, etc. Pretty cool stuff.

[Screenshot: “Open in Airwave” link in Access Tracker]

I took in a lot of info this week. Now I hope I can act on it back at the office before I start getting back into the grind!


Aruba ClearPass Virtual Lab Install

I recently spent a few hours installing a cluster of 3 Aruba ClearPass Policy Manager virtual appliances and, for future reference, decided to document the escapade here. If you can get something out of it too, all the better!

When I complete the configuration setup I’ll be posting more…stay tuned!

Getting Started

Download the OVF virtual appliance files from Aruba’s support site, and work with the virtualization team to get the new appliance(s) deployed to the proper location in your vSphere environment. The screenshots below are from vSphere 5.5.

Once the virtual appliances are deployed on the correct VLANs/port groups, log in to vCenter using the vSphere client and open the Virtual Machine Properties. When my VMs were deployed there was only one hard disk, but ClearPass requires two. Add a second hard disk if it isn’t there already. Here I selected 100GB thin provisioned, but I believe the Aruba documentation may say to use Thick Provision Lazy Zeroed (I’m guessing for better performance later on).

After you’ve applied any necessary changes, open a console session in the vSphere client and power up the VM for the first time.

As it boots you’ll see a bunch of startup information fly by.

This is one of the only times you need to intervene in the install process. Hit the letter Y (or y) to verify you want to destroy all data on the second disk.

The installation process then begins to set up partitions.

I ended up seeing some errors along the way but as this is for a lab I’m not losing any sleep over it. Yet.

Loading plugins takes a while. If you don’t already have something to drink, lock your screen and walk away for a bit.

Hooray! All plugins loaded! Services starting up:

At long last, the CLI login screen!

Log in with the ClearPass default CLI credentials “appadmin” and “eTIPS123”. Then we get to the configuration wizard. Extra points for you if you noticed that our VM apparently vMotioned since the last step.

We don’t use a separate Data Port in our setup, so I just hit ENTER to leave that field blank.

Next comes time and date configuration. You can use an NTP source or just set it manually. I used NTP.

We don’t use FIPS mode.

Configuration summary shows all the selections made during the wizard. Hit Y to continue.

The settings get applied, then services are restarted and you get the CLI login back:

That’s it for now…stay tuned for a continuation of this post to include more detailed setup.

Any pointers for me in setting up Virtual Clearpass for production? Please share with the rest of the folks! Questions? Hit me up in the comments or on Twitter (@swackhap).

Cisco Live Monday Lessons Learned

I attended a great session today on Cisco’s Overlay Transport Virtualization (OTV) supported on Nexus 7k and ASR 1k platforms (BRKDCT-2049 – click here if you have a CiscoLive365 account). OTV is an L2 datacenter interconnect (DCI) technology proprietary to Cisco that is meant to help solve certain problems of traditional L2 VPNs including pseudo-wire maintenance and to better support multi-homing. In my enterprise role, it’s important to understand how we might be able to use this kind of tech for upcoming projects and be able to present supportable ideas to my partners in IT as well as the business we support.
 
Also on my schedule was Virtual Device Context (VDC) Design and Implementation Considerations with Nexus 7000 (BRKDCT-2121) by Ron Fuller (@ccie5851). I’ve had the good fortune of meeting with Ron in the past and continue to interact on Twitter, and he’s especially helpful in answering questions (sometimes almost in real-time). The material was in great detail and is important for me since I helped install and continue to support a Nexus 7k routed core. A key takeaway is that VDCs on the Nexus 7k are industry certified under FIPS 140-2, Common Criteria Evaluation and Validation Scheme Cert #10349. NSS Labs also has certified it as PCI compliant. The bottom line is that many customers can now collapse their Internet Edge, DMZ, and Core switching requirements into a single pair of N7Ks. There’s also support for FCoE to help converge storage and IP traffic in the datacenter.
 
Thanks to the power of Twitter (once again), I arranged a real-life meet-up with Phillip James (@security_freak) and Jake Snyder (@jsnyder81) to discuss 802.1x and NAC. Kellen Christensen (@ChrisTekIT) joined the discussion to learn from Phillip and Jake what it takes to implement 802.1x. It sounds like it’s much easier to do with wireless than with wired! The statistic “95% of wired 802.1x implementations fail” was thrown out, which certainly grabbed my attention. My key takeaways from this conversation, some based on my own (feeble) knowledge:
  1. Go slow. Start with Monitor Mode, then Low Impact Mode, then eventually work your way to High Security Mode.
  2. Be realistic and up-front with all critical players (desktop support, printer support, help desk, key users, management, etc). Partner with them and help them understand that this “may hurt a little” (my words).
  3. Cisco’s NAC appliance was replaced by Cisco Identity Services Engine (ISE), which supports RADIUS (basic as well as advanced functions defined in multiple RFCs). Cisco Secure ACS Server v5 is the current product that supports TACACS+. ISE doesn’t currently support TACACS+.
  4. Aruba ClearPass supports RADIUS and TACACS+ as well as similar functions compared to ISE (security policy, endpoint identification/profiling). 
  5. I need to research what exact features are supported on the 3750/3750E/3750X access switches we’re looking to deploy this on as well as what exact features and RFCs are supported by ISE and ClearPass.
Another highlight of my day was meeting more Tweeps IRL (in real life) such as Matthew Norwood (@matthewnorwood). And many thanks to Amy Lewis (@commsninja) and her Cisco Datacenter team for hosting Waffle Club (shh…the first rule about Waffle Club is don’t talk about Waffle Club). Lots of great discussions there and I look forward to many more!
 

How I Stopped Worrying And Learned To Love IFTTT Automation

I’m a big fan of Evernote to help with everything from blog ideas to date ideas and more, and I particularly like sending e-mails to my personal unique Evernote e-mail address. There are some notes that have a lot in common, so I was looking for some way to automatically filter and file based on title. I found out that you can just use @folder #tag1 #tag2 at the end of the e-mail subject line to automatically have Evernote put the note into the folder named “folder” and tag it with the specified tags.
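
If you send these e-mails from a script, the subject line is easy to build. Here’s a minimal Python sketch of the @folder #tag convention described above; the function name and the sample title, folder, and tags are made up for illustration.

```python
def evernote_subject(title, folder=None, tags=()):
    """Append an @folder and #tags to a note title, Evernote e-mail style."""
    parts = [title]
    if folder:
        parts.append(f"@{folder}")
    # Each tag gets its own #prefix, separated by spaces.
    parts.extend(f"#{tag}" for tag in tags)
    return " ".join(parts)


# Hypothetical note, folder and tags.
print(evernote_subject("IFTTT post ideas", "Blog", ["ifttt", "automation"]))
```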

As I continued to look around for ideas how to streamline my workflow, I was reminded of a service I had heard of called “If This Then That” or “IFTTT.”  Now that I’ve spent an hour playing with the recipes I’m in LOVE! What an amazing service!  Facebook, Twitter, Blogger, WeMo, Instagram, Foursquare, and Google Reader are just some of the MANY “channels” you can log into from IFTTT. Then you create “recipes” by specifying cause and effect, or “this” and “that.”  

As an example, I set up the “phone” channel with my cell phone, and had it call me to read a typed message. It was a clunky computer voice, but it called me right on schedule and got me the message. I may start using that to help me wake up to work out in the morning. The catch is that it can only be scheduled at 15 minute intervals (:00, :15, :30, :45). Still, pretty impressive, right?

Don’t take my word for it. Go check it out for yourself! Set up your own free account at https://ifttt.com!

Oh, and for the heck of it, I’m going to have IFTTT tweet a message announcing this new blog post.  Let’s see if this works…

Unifying Wired and Wireless Edge with Aruba Tunneled Nodes

Anyone familiar with modern lightweight access points (APs) knows and understands the basics: Client connects to AP, AP tunnels traffic back to controller, and administrators can specify all sorts of useful policies in the controller.  Aruba Networks has taken this concept of the wireless edge and extended it to the wired edge of the network with their Tunneled Nodes and Mobility Access Switches. The company I work for has very old closet switches and, since we’re pretty heavily invested in Aruba wireless, I’m intrigued by the concept of unifying wired and wireless edges.

With a sample switch acquired from my account team, I spent a couple hours with my SE getting the basic introduction to Aruba’s Ethernet switches.  The goal of the session was to get the switch set up as a “wired AP” connected to a local controller, and when a laptop would connect to a particular port, the switch would then build a GRE tunnel to the local controller where the laptop’s traffic would get dumped out onto the specified VLAN.  Unfortunately, we weren’t able to complete the setup, so my SE and I agreed to engage the TAC for further assistance. 
My experience with the TAC was less than stellar this time around, but I believe it was mostly due to how new this technology is and that many TAC engineers haven’t had time to learn it inside and out yet.  Eventually I was able to reach an engineer that could identify a fix, and it turned out to be fairly simple. In addition, a high-level support supervisor called me personally to apologize and really listened to my recommendations for how to improve service.
Before the big reveal, here are the technical details of the setup.
We used a test laptop connected to port 2 of the Aruba switch, which was uplinked to a Cisco switch at my desk via an access-port on vlan 221.  That Cisco switch was connected through a trunked 802.1q LAN to the local controller. See the diagram for a topology overview.
When we first set things up, the tunneled-node (a.k.a. the laptop in this case) showed a state of “in-progress” (see output of “show tunneled-node state” command) and would never get to the “complete” state.
In problem state:
(ArubaS3500) #show tunneled-node state

Tunneled Node State
-------------------
IP           MAC               Port    state       vlan tunnel inactive-time
--           ---               ----    -----       ---- ------ -------------
10.20.20.125 00:1a:1e:10:fb:c0 GE0/0/1 in-progress 0221 4094   0000
Here are the most important parts of the configurations of the switch and controller below.
Switch:
ip-profile
   default-gateway 10.22.16.1
   controller-ip vlan 221

vlan "221"

interface-profile switching-profile "vlan221"
   access-vlan 221

interface-profile tunneled-node-profile "tunnel-local-controller"
   controller-ip 10.20.20.125
   backup-controller-ip 10.20.20.123

interface gigabitethernet "0/0/1"
   switching-profile "vlan221"

interface gigabitethernet "0/0/2"
   tunneled-node-profile "tunnel-local-controller"
   switching-profile "vlan221"

interface vlan "221"
   ip address 10.22.17.200 netmask 255.255.240.0
Local Controller:
vlan 220 "Backbone"
vlan 221 wired aaa-profile "s3500aaa"

interface vlan 220
        ip address 10.20.20.125 255.255.255.0

tunneled-node-address 10.20.20.125

aaa profile "s3500aaa"
   initial-role "authenticated"

aaa authentication wired
   profile "s3500aaa"
The core problem ended up being the “tunneled-node-address” command on the controller.  We had set it as the IP address of the controller itself, but the TAC identified this as the problem and changed it to all-zeros, like this:
tunneled-node-address 0.0.0.0
Finally, the tunneled-node came up in the “complete” state (see output below) and I was able to get a DHCP address on the laptop and connect to the rest of the network.
When problem was fixed:
(ArubaS3500) #show tunneled-node state

Tunneled Node State
-------------------
IP           MAC               Port    state    vlan tunnel inactive-time
--           ---               ----    -----    ---- ------ -------------
10.20.20.125 00:1a:1e:10:fb:c0 GE0/0/2 complete 0221 4094   0000
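
If you wanted to watch for this condition from a script rather than eyeballing the CLI, a check along these lines might work. This is only a sketch in Python: it assumes the seven-column layout shown above, it parses text you have already collected from the switch (how you collect it is up to you), and the function names are my own.

```python
SAMPLE = """\
(ArubaS3500) #show tunneled-node state

Tunneled Node State
-------------------
IP           MAC               Port    state       vlan tunnel inactive-time
--           ---               ----    -----       ---- ------ -------------
10.20.20.125 00:1a:1e:10:fb:c0 GE0/0/1 in-progress 0221 4094   0000
"""

FIELDS = ("ip", "mac", "port", "state", "vlan", "tunnel", "inactive_time")


def parse_tunneled_nodes(output):
    """Return one dict per data row of 'show tunneled-node state' output."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Data rows have seven columns and a MAC address in the second one;
        # this skips the prompt, title, header and separator lines.
        if len(tokens) == len(FIELDS) and tokens[1].count(":") == 5:
            rows.append(dict(zip(FIELDS, tokens)))
    return rows


def stuck_nodes(output):
    """Return the rows whose state is anything other than 'complete'."""
    return [row for row in parse_tunneled_nodes(output) if row["state"] != "complete"]


for row in stuck_nodes(SAMPLE):
    print(f"{row['port']} is {row['state']} (controller {row['ip']})")
```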
Hit me up on Twitter (@swackhap) or leave your feedback below.

Automating Exchange Bandwidth Limits with SolarWinds Orion NCM

As I’ve written about previously, one of the many tools I work with is SolarWinds Orion Network Configuration Manager (NCM). It’s a great tool to capture device configs on a daily basis, and for scheduling off-hours changes or regularly scheduled processes that may happen weekly, daily, or even multiple times per day.  
Recently our messaging team started replicating Microsoft Exchange data stores from our primary datacenter in the US to another location in the Far East (FE). In this case, there’s only a 4.5 Mbps circuit connecting the locations, and the replication traffic started interfering with production traffic in the FE. Even with QoS on the link and Riverbed Steelheads optimizing the traffic like nobody’s business, we still needed to do something. The decision was made to cap the Exchange Replication (henceforth referred to as EXREPL) traffic.
Using the Steelhead’s Advanced QoS configuration we set the upper bandwidth (BW) % to 33% of the 4.5Mbps link (see below).
But we only needed to keep this limit in effect during the local daytime, and at other times we can let more EXREPL traffic through. Despite the beautiful Web GUI that Riverbed uses, there’s also an excellent CLI. The question became “What commands can I use to modify the upper BW% for the EXREPL QoS class?” After a bit of reading through the CLI Guide, I found the proper format:
  • qos classification class modify class-name "EXREPL" upper-limit-pct 33
  • qos classification class modify class-name "EXREPL" upper-limit-pct 90

Rather than sit at the keyboard and execute these commands twice per day, I set up SolarWinds Orion NCM jobs on a schedule to run the following:
M,T,W,Th,F at 7am CT
config t
qos classification class modify class-name "EXREPL" upper-limit-pct 33
end
write mem
Su,M,T,W,Th at 6pm CT
config t
qos classification class modify class-name "EXREPL" upper-limit-pct 90
end
write mem
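
To sanity-check a schedule like this, it can help to model it. The Python sketch below is not part of NCM or the Steelhead; it just encodes the two job schedules above (times in Central Time) and works out which limit was applied most recently at a given moment, plus what that percentage means on a 4.5 Mbps link. The function names are my own.

```python
from datetime import datetime

LINK_MBPS = 4.5

# (weekday, hour, percent) with Monday = 0 ... Sunday = 6, Central Time.
SCHEDULE = [(day, 7, 33) for day in range(0, 5)] + [
    (day, 18, 90) for day in (6, 0, 1, 2, 3)
]


def minutes_into_week(weekday, hour, minute=0):
    """Minutes elapsed since Monday 00:00."""
    return weekday * 1440 + hour * 60 + minute


def active_limit_pct(when):
    """Return the upper-limit-pct set by the most recent job before 'when'."""
    now = minutes_into_week(when.weekday(), when.hour, when.minute)
    events = sorted((minutes_into_week(d, h), pct) for d, h, pct in SCHEDULE)
    past = [pct for start, pct in events if start <= now]
    # Before Monday 07:00 the last job to run was the previous Sunday's.
    return past[-1] if past else events[-1][1]


def cap_mbps(pct, link_mbps=LINK_MBPS):
    """Convert a percentage of the link into Mbps."""
    return link_mbps * pct / 100


wednesday_noon = datetime(2012, 6, 13, 12, 0)
pct = active_limit_pct(wednesday_noon)
print(pct, round(cap_mbps(pct), 3))
```

So 33% of the link works out to roughly 1.5 Mbps and 90% to about 4 Mbps. Note that with the jobs exactly as listed there is no Friday 6pm job, so this model reports 33% from Friday morning until Sunday evening.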
Each scheduled job also fires off e-mail alerts to the network and messaging teams to keep everyone in the loop. For small teams like mine, this tool is invaluable in its flexibility. Now twice a day, like clockwork, NCM happily does its job and lets us know if it succeeded or had problems. Another crisis averted!

What kind of simple (or complex) automation do you use? Hit me up on Twitter (@swackhap) or post a comment below.

Network Disruption Causes vCenter DB Corruption

First off, I am NOT a VMware expert by any stretch of the imagination.  I AM however learning a lot working with some smart folks in virtualized servers and desktops.  
A network engineer (who shall remain nameless) was making some changes to the network infrastructure last night and unfortunately experienced an outage. Due to an ongoing network migration from Cat6500 to Nexus 7k/5k/2k, all ESX hosts are now connected to Nexus FEX, but iSCSI storage is still on the old Cat6500. The outage basically cut connectivity between the Nexus-connected hosts and the iSCSI storage.
As users started trying to log in to their desktops in the morning, we started getting reports of problems. Our VDI vCenter showed 4 of our 20+ hosts disconnected or not responding. We ended up power-cycling those, one at a time, and once they came up we were able to reconnect them to vCenter.
The next big problem was that the profile server, which runs as a VM in the VDI infrastructure, was hung while attempting to migrate. We rebooted vCenter which orphaned the profile server, but we found we were unable to browse the particular LUN where that VM’s datastore existed to add it back into vCenter. At that point, we engaged VMware support and spent several hours on WebEx troubleshooting storage connectivity problems (tail -f /var/log/vmkernel and some other stuff). By the time I left in the early afternoon we had identified half a dozen hosts that seemed to be having iSCSI problems based on what VMware Support was seeing in the logs, and we rebooted those hosts one at a time to minimize end-user impact.
I had to leave before the fun was all over, but found out afterwards that apparently a couple of the hosts got duplicates of the datastore IDs on them when they recovered from the outage overnight. Once that happened, the database was somehow corrupted with the wrong datastore information. It was apparently cleared by removing the two particular hosts from vCenter and adding them back in, thus giving them new datastore information.
Like I said, I’m not a VMware expert but I’m learning more each day. You ever experience something like this? Who else is doing VDI? Leave your comments below or find me on Twitter (@swackhap).

Cisco Live Tips and Tricks

Hard to believe it’s been over a year since my last post here.  As I’ve learned in life though, sometimes you have to forgive yourself for your failings (in this case, not blogging for a while) and then you can continue to improve on yourself.

I recently attended Cisco Live 2012 in San Diego. After attending 9 times (thereabouts), I figured I’d share some ideas/thoughts/tips.

First off, have a 10-foot extension cord when traveling and when attending sessions.  Many breakout sessions and labs are in rooms that have power strips available, but some do not. If your extension cord has a 3-prong plug, have a 3-prong to 2-prong adapter with you just in case you need to plug into an old outlet.

The World of Solutions (WoS) is the area where Cisco and their partners set up booths with all sorts of goodies.  The first night it may be okay to wander a bit, but at some point you need to HAVE A PLAN. Look over the list of exhibitors. Think about your goals for the conference. Are there particular problems at work that you’re trying to solve?  The WoS is THE PLACE to find the solution.  Print a map of the booths and circle the ones you want to visit. Then cross them off after you’ve been there.  Stay focused!

Some of my favorite places in the World of Solutions:

  • Walk-In Hands-On Labs – Great place to spend a few minutes learning new skills and practicing configurations on a plethora of systems.
  • Cisco Booth – Incredible opportunity to learn about almost every product/system/solution that they sell.
  • Social Media Hub – For the first time this year, the folks behind all the social networking for the event, such as the @CiscoLive Twitter account, set up shop to show off the top Tweeters and give people a place to lounge for a bit.
  • Technical Solutions Clinic – Basically an engineer’s Heaven-on-Earth, there are several dozen whiteboards surrounded by some of Cisco’s smartest Technical Marketing Engineers and TAC folks. What problem did you have at work you’ve been trying to fix? They’ll solve it for you.
The Cisco Live mobile app makes navigating the conference a snap. View your schedule of sessions, browse WoS exhibitor listings and conference maps, and complete evaluations of sessions you’ve attended, right on your phone or tablet.  The evaluations are incredibly important and Cisco takes them very seriously.
I’m very excited to have attended Cisco Live once again, and hope to continue doing so. I consider a week at Cisco Live equivalent to about 3 weeks’ worth of training.
If you have any questions, comment below or hit me up on Twitter (@swackhap). Cheers!

Aruba Mobility Bootcamp Experience and Random Cisco Wireless Comparisons

This post is about wireless technology. However, I’m not a wireless expert. I’ve worked with Cisco Wireless LAN Controllers (WLCs) for a few years and have been quite happy with them.  That said, I’ve seen Aruba’s prices and they’re very competitive. I have the opportunity to work with both Aruba and Cisco now in my current position.

I attended the Aruba Mobility Bootcamp (MBC), a week-long class including PowerPoint instruction as well as hands-on labs with Aruba model 3200 controllers, AP125s, and RAP2s. The class was very well taught by an experienced Aruba instructor (Ken Elwell). The material was well designed and Ken did a great job boiling down some of the more complicated slides, saying things like “This is an overly complex slide that really is just trying to tell you X.”

Topics covered included the following, and there was a hands-on lab for each one. I’ve included my own interpretation for most of them.
  • Architecture
  • Initial Controller Setup
  • AP Provisioning – APs come online using Aruba Discovery Protocol, which uses things like DHCP option 43 and/or looking for DNS “aruba-master” record; APs come up in default AP group, then are provisioned to desired group, assigned a useful name, and rebooted for changes to take effect
  • Authentication – MAC-based, Captive Portal, 802.1x with different EAP types
  • Firewall Policies – Aruba controller can be licensed with additional stateful firewall with policies that can be applied to individual devices and users (They also mention it’s ICSA certified)
  • Roles – Every device and user has a role associated with it; there are different methods how these roles can be derived, such as through MAC address, 802.1x authentication, Captive Portal login credentials, as well as the actual SSID the user is associated with
  • RF Plan – Decent application available on the controller as well as standalone for Windows that allows import of floor plans and automatic placement of APs on map; can then print a bill of materials for order placement (I’m sure that’s a favorite feature of Aruba SEs 🙂 )
  • Adaptive Radio Management (ARM) – automatic detection of channel-based WiFi interference and automatic channel and power-level changes to maximize coverage
  • Captive Portal Operations – web-based authentication for guest networks
  • Remote Access Points (RAPs) – useful for SOHO, can tunnel all traffic and/or do split-tunnel for employee SSID; can also provide additional SSID for non-employee Internet access for personal/family use
  • Remote AP Installation with ZeroTouch Deployment – administrator adds a RAP’s MAC address to a “white list”, then user takes RAP home, plugs it in, enters basic info allowing it to “phone home” to the controller and get its config policies
  • Virtual Intranet Access (VIA) – remote-access client for PCs running Windows 32-bit; future support for Win 64-bit and Mac
  • Wired Access Control – apply security policies used for wireless users to wired ports on APs; particularly useful for SOHO running a RAP with additional Ethernet ports
  • Site-To-Site VPN – Compatible with other Aruba controllers as well as Netscreen, Sonicwall, Microsoft, and Cisco
  • Master Redundancy – VRRP active/standby redundancy
  • Master and Local Operation – APs can be associated to a controller on-prem (“Local”) and fail over to Master (back at datacenter) in case of Local controller failure
  • Local Redundancy – VRRP active/standby, N+1 failover where one controller backs up multiple controllers as VRRP standby for those other controllers, active/active redundancy where each controller in a pair is active VRRP for different VRRP groups
  • Mobility – Keep same IP even while roaming between different controllers, useful for dense deployments on large campuses
  • Mesh – Outdoor or indoor
  • Wireless Intrusion Protection (WIP)
One of the most critical things I learned this week is the level of abstraction involved with configuring Aruba Mobility Controllers.  In order to configure something as simple as a set of access points with multiple SSIDs (e.g., employee and guest), you actually create two different “Virtual APs” or VAPs. Then you associate those two VAPs with an “AP Group”. Then you provision particular APs to that group.  It’s a little challenging to get used to after working with Cisco for so long, but it’s a very powerful way of configuring the controller. The concept of object-oriented programming comes to mind.
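
Since object-oriented programming came to mind, here’s how I picture the layers, sketched in Python. To be clear, this is just a mental model and not Aruba configuration syntax; the class names, SSIDs, group name, and AP name are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualAP:
    """One SSID and its settings (a 'VAP')."""
    name: str
    ssid: str


@dataclass
class APGroup:
    """A named collection of Virtual APs."""
    name: str
    virtual_aps: list = field(default_factory=list)


@dataclass
class AccessPoint:
    """A physical AP, which inherits everything from its group."""
    name: str
    group: APGroup

    def ssids(self):
        return [vap.ssid for vap in self.group.virtual_aps]


# Hypothetical names throughout.
employee = VirtualAP("employee-vap", "CorpWiFi")
guest = VirtualAP("guest-vap", "GuestWiFi")
branch = APGroup("branch-office", [employee, guest])
ap = AccessPoint("lobby-ap1", branch)
print(ap.ssids())
```

The payoff of the indirection is that adding a third SSID means touching one AP Group instead of every AP provisioned to it.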

Keeping in mind that I am NOT A WIRELESS EXPERT, here are some of my thoughts on Aruba vs Cisco:

Random Comparisons between Aruba and Cisco (Swack’s $0.02):
Aruba ARM vs Cisco CleanAir – Aruba’s current ARM technology appears to be limited to seeing channel-based interference, whereas Cisco CleanAir incorporates a special chip designed to see the entire RF environment including interference not caused by 802.11 sources (think microwave ovens, analog jammers, radar, etc.).  CleanAir is more expensive, but is much more advanced. Depends how critical your wireless environment is and how much you’re willing to pay for the added functionality.

Aruba RAP vs Cisco OfficeExtend – Aruba’s RAP2 provides 802.11b/g and retails for $99. Cisco OfficeExtend uses 1140 or 1130AG APs which I think are more than $99 (correct me if I’m wrong). These costs don’t take into account the licensing you’ll need on the controllers.

Aruba Policy Enforcement Firewall (PEF) vs Cisco SSID-based ACLs – Stateful firewall policies based on user and/or device vs. non-stateful ACLs.

Aruba RF Plan software vs Cisco WCS Planning Tool – Aruba’s RF Plan software is available on their controllers as well as through a Windows-based executable. We got it for free from our Aruba SE. Cisco WCS is not cheap, and I’m not aware of another source for the planning software.

Swack’s Take:
I learned a ton this week that I can apply at my current job. Also, thanks to some folks I interact with on Twitter, I was able to learn more about Cisco’s wireless solutions.  In the end, it’s up to the individual engineer at a particular company to decide what is best for their environment.

Please comment below or hit me up on Twitter (@swackhap) with your comments/questions/snarky remarks about the competition.

Scripting On-Demand Network Changes with Solarwinds Orion NCM

Getting called at 2am is never fun, even if you are the Network On-Call person. Any chance I can prevent a call like that, I’ll take it! In this case, there’s a “failover pair” of servers, one in each data center (DC). Each server has a locally unique admin/replication IP address on one interface that is always active and a second interface that shares the same IP address as the server in the other DC. Whichever server is active enables the highly-available (HA) interface while the other server’s HA interface is disabled. We can then make network changes to routers and switches to “switch” the server from one DC to the other. And instead of my having to manually make those changes at 2am, we can script the changes with a configuration management tool. Our tool of choice is Solarwinds Orion Network Configuration Manager (NCM).

In this particular use of NCM, there are 5 individual NCM jobs, one for each device that must be touched. The changes include enabling/disabling switch ports and adding/removing route advertisements in EIGRP and BGP.  Assume the names of the 5 jobs are AutoJob1a, AutoJob2a, …, AutoJob5a. In addition, there are 5 jobs for the reverse direction named AutoJob1b, AutoJob2b, …, AutoJob5b.  Each of these jobs has an NCM Job ID associated with it seen under the “Job ID” column when viewing Scheduled Jobs from the NCM GUI.

At this point, we’ve saved ourselves from having to individually log in to each of the devices to make the required changes. But we can take it a step further by combining all the jobs and launching them from a Windows Batch (.bat) file. On the NCM server we created the file d:\RemoteJobs\AutoJob-A.bat which contains these 5 lines, one per NCM job:

"D:\Program Files\SolarWinds\Configuration Management\configmgmtjob.exe" "D:\Program Files\SolarWinds\Configuration Management\Jobs\Job-318696.ConfigMgmtJob"
"D:\Program Files\SolarWinds\Configuration Management\configmgmtjob.exe" "D:\Program Files\SolarWinds\Configuration Management\Jobs\Job-631858.ConfigMgmtJob"
"D:\Program Files\SolarWinds\Configuration Management\configmgmtjob.exe" "D:\Program Files\SolarWinds\Configuration Management\Jobs\Job-713828.ConfigMgmtJob"
"D:\Program Files\SolarWinds\Configuration Management\configmgmtjob.exe" "D:\Program Files\SolarWinds\Configuration Management\Jobs\Job-272305.ConfigMgmtJob"
"D:\Program Files\SolarWinds\Configuration Management\configmgmtjob.exe" "D:\Program Files\SolarWinds\Configuration Management\Jobs\Job-777458.ConfigMgmtJob"

Note that the Job ID for each job shows up in the name of the .ConfigMgmtJob file that is called in each line of the .bat file.
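
Because every line follows the same pattern, the .bat contents could also be generated from a list of Job IDs instead of being typed by hand. Here’s a minimal Python sketch of that idea; it assumes the install path and file naming shown above, and the function names are my own, not anything SolarWinds provides.

```python
# Install path as it appears in the batch file above.
NCM_DIR = r"D:\Program Files\SolarWinds\Configuration Management"


def batch_line(job_id):
    """Return one .bat line that launches the NCM job with this Job ID."""
    exe = rf"{NCM_DIR}\configmgmtjob.exe"
    job = rf"{NCM_DIR}\Jobs\Job-{job_id}.ConfigMgmtJob"
    # Both paths contain spaces, so each one is wrapped in double quotes.
    return f'"{exe}" "{job}"'


def batch_file(job_ids):
    """Return the full .bat contents, one line per job, in the order given."""
    return "\r\n".join(batch_line(job_id) for job_id in job_ids) + "\r\n"


# The five Job IDs from AutoJob-A.bat.
print(batch_file([318696, 631858, 713828, 272305, 777458]), end="")
```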
At this point, any monkey with a login to the NCM server could just double-click on the .bat file to kick off those five NCM jobs.  But there’s a better way, at least in our environment: Tidal Scheduler.  With a Tidal agent on the NCM server, Tidal can be configured to launch d:\RemoteJobs\AutoJob-A.bat or the reverse d:\RemoteJobs\AutoJob-B.bat on-demand by the Operator-On-Duty.  This allows the event to be properly audited and standardizes the action required by the Operator.  
In addition, we can configure each NCM job such that it generates an e-mail notification when it completes, so when all 5 have completed we get 5 e-mails that show exactly what commands were entered and the corresponding output from the router/switch that was modified. The e-mail can be sent to the Network team as well as the Operations team so they have a better understanding of success than simply a “completed job” message from Tidal.
In the end, instead of getting a wake-up call at 2am, the server admin team can now simply call the Operator-On-Duty and ask them to run NCM Job “AutoJob-A” or “AutoJob-B”. They then use a simple traceroute to determine if the network “thinks” the server is in DC A or DC B.
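
That traceroute check could be scripted too. The sketch below, in Python, takes the router hop addresses from a traceroute (excluding the final hop, since the shared HA address exists in both DCs) and reports which DC the path ends up in. The prefixes and hop addresses are made up; substitute whatever address ranges actually identify each of your data centers.

```python
import ipaddress

# Hypothetical address ranges for each data center's routers.
DC_PREFIXES = {
    "DC A": ipaddress.ip_network("10.1.0.0/16"),
    "DC B": ipaddress.ip_network("10.2.0.0/16"),
}


def datacenter_for(hops, prefixes=DC_PREFIXES):
    """Return the DC whose prefix contains the last matching hop, else None."""
    # Walk the path backwards so the hop nearest the server decides.
    for hop in reversed(hops):
        address = ipaddress.ip_address(hop)
        for name, network in prefixes.items():
            if address in network:
                return name
    return None


# Hypothetical router hops copied from a traceroute toward the HA address.
print(datacenter_for(["192.168.0.1", "10.1.5.1", "10.2.9.1"]))
```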
Ahh, now I can go back to sleep. 
Zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz.