Some Things I Didn’t Know About Aruba ClearPass

So I just finished attending my second year of Aruba’s Atmosphere (a.k.a. “Airheads”) conference, and this time around my learning focus was on ClearPass.  In case you didn’t know, ClearPass is basically a glorified RADIUS authentication server. But it can also do SO MUCH MORE.

I have experience with ClearPass handling TACACS+ authentications for all our Cisco gear, and we use it for downloadable ACLs for our ASA firewalls. It’s pretty much the bee’s knees. I even had the opportunity to share a couple of stories this week with Ten Talks, fashioned after Keith Parsons’s use of Ten Talks at the annual Wireless LAN Professionals Conference (WLPC).

Thing #1 I didn’t know: There’s “hidden documentation” on the APIs built into the product. That’s right, you can go to https://clearpassIP/api-docs and see all sorts of lovely documentation on the APIs available in that particular version of ClearPass. They first started with APIs for Guest, the element of CP used to handle guest registration and one-time, time-limited access credentials and workflows. Apparently they’re also opening up APIs for the TIPS functionality of CP in v6.6, starting in April. So basically anything you’d normally see or configure at https://ClearpassIP/tips will be available via a RESTful API. See below for some sample screenshots.

[Screenshots: ClearPass API documentation pages at /api-docs]
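If you’d rather poke at the API from a script than from a browser, here’s a minimal sketch of building the OAuth2 token request the REST API uses. The host name and client credentials below are made up, and the /api/oauth path is based on what the api-docs pages describe, so verify against your own /api-docs output first.

```python
import json
import urllib.request

def token_request(host: str, client_id: str, client_secret: str):
    """Build (but don't send) the OAuth2 client-credentials token
    request for https://<host>/api/oauth."""
    url = "https://%s/api/oauth" % host
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Hypothetical host and credentials -- substitute your own.
req = token_request("clearpass.example.com", "my-api-client", "my-secret")
print(req.full_url)  # https://clearpass.example.com/api/oauth
```

From there you’d send the request (e.g., with `urllib.request.urlopen`) and pass the returned access token as a Bearer header on subsequent calls.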

Thing #2 I didn’t know: There’s apparently a “graphite” graphing utility available at https://clearpassIP/graphite that allows you to see how much data is being transferred between members of a ClearPass cluster. There’s a reference to it in Danny Jump’s Tech Note on Clustering Design Guidelines. Unfortunately, when I tested it on my system I got “Error 403 Forbidden.”

Thing #3 I didn’t know: Default ClearPass settings are NOT the same as recommended ClearPass settings. The Clustering Design Guidelines document mentioned above has some recommendations that I need to review to see whether we should make changes in our environment.

Thing #4 I didn’t know: Every ClearPass Policy Manager (CPPM) that uses Active Directory (AD) or other LDAP authentication should be joined to the domain via a domain controller that is LOCAL TO THE CPPM. This might explain why I experience timeouts with TACACS+ authentication every morning—because it’s having to traverse the WAN to complete the AD auth.

Thing #5 I didn’t know: Airwave can be added as an “Endpoint Context Server” to ClearPass. This enables a link to “Open in Airwave” when viewing a particular authentication in Access Tracker (see below). Clicking the link will open Airwave and, if the device is currently connected to an Airwave-managed device, will show you health status, connection path, etc. Pretty cool stuff.

[Screenshot: Access Tracker entry showing the “Open in Airwave” link]

I took in a lot of info this week. Now I hope I can act on it back at the office before I start getting back into the grind!


Aruba ClearPass Virtual Lab Install

I recently spent a few hours installing a cluster of 3 Aruba ClearPass Policy Manager virtual appliances and, for future reference, decided to document the escapade here. If you can get something out of it too, all the better!

When I complete the configuration setup I’ll be posting more…stay tuned!

Getting Started

Download the OVF virtual appliance files from Aruba’s support site, and work with the virtualization team to get the new appliance(s) deployed to the proper location in your vSphere environment. The screenshots below are from vSphere 5.5.

Once the virtual appliances are deployed on the correct VLANs/port groups, log in to vCenter using the vSphere client and open the Virtual Machine Properties. When my VMs were deployed there was only one hard disk, but ClearPass requires two, so add a second hard disk if it isn’t there already. Here I selected 100GB thin provisioned, but I believe the Aruba documentation may say to use Thick Provision Lazy Zeroed (I’m guessing for better performance later on).

After you’ve applied any necessary changes, open a console session in the vSphere client and power up the VM for the first time.

As it boots you’ll see a bunch of startup information fly by.

This is one of the only times you need to intervene in the install process. Hit the letter Y (or y) to verify you want to destroy all data on the second disk.

The installation process then begins to set up partitions.

I ended up seeing some errors along the way but as this is for a lab I’m not losing any sleep over it. Yet.

Loading plugins takes a while. If you don’t already have something to drink, lock your screen and walk away for a bit.

Hooray! All plugins loaded! Services starting up:

At long last, the CLI login screen!

Log in with the ClearPass default CLI credentials “appadmin” and “eTIPS123”. Then we get to the configuration wizard. Extra points for you if you noticed that our VM apparently vMotioned since the last step.

We don’t use a separate Data Port in our setup, so I just hit ENTER to leave that field blank.

Next comes time and date configuration. You can use an NTP source or just set it manually. I used NTP.

We don’t use FIPS mode.

Configuration summary shows all the selections made during the wizard. Hit Y to continue.

The settings get applied, then services are restarted and you get the CLI login back:

That’s it for now…stay tuned for a continuation of this post to include more detailed setup.

Any pointers for me in setting up virtual ClearPass for production? Please share with the rest of the folks! Questions? Hit me up in the comments or on Twitter (@swackhap).

Cisco Live Monday Lessons Learned

I attended a great session today on Cisco’s Overlay Transport Virtualization (OTV), supported on the Nexus 7K and ASR 1K platforms (BRKDCT-2049 – click here if you have a CiscoLive365 account). OTV is an L2 datacenter interconnect (DCI) technology proprietary to Cisco that is meant to solve certain problems of traditional L2 VPNs, including pseudo-wire maintenance, and to better support multi-homing. In my enterprise role, it’s important to understand how we might use this kind of tech for upcoming projects and to be able to present supportable ideas to my partners in IT as well as the business we support.
 
Also on my schedule was Virtual Device Context (VDC) Design and Implementation Considerations with Nexus 7000 (BRKDCT-2121) by Ron Fuller (@ccie5851). I’ve had the good fortune of meeting Ron in the past and we continue to interact on Twitter; he’s especially helpful in answering questions (sometimes almost in real time). The material went into great detail and is important for me since I helped install, and continue to support, a Nexus 7K routed core. A key takeaway is that VDCs on the Nexus 7K are industry certified under FIPS 140-2 and the Common Criteria Evaluation and Validation Scheme (Cert #10349). NSS Labs has also certified it as PCI compliant. The bottom line is that many customers can now collapse their Internet Edge, DMZ, and core switching requirements into a single pair of N7Ks. There’s also support for FCoE to help converge storage and IP traffic in the datacenter.
 
Thanks to the power of Twitter (once again), I arranged a real-life meet-up with Phillip James (@security_freak) and Jake Snyder (@jsnyder81) to discuss 802.1x and NAC. Kellen Christensen (@ChrisTekIT) joined the discussion to learn from Phillip and Jake what it takes to implement 802.1x. It sounds like it’s much easier to do with wireless than with wired! The statistic “95% of wired 802.1x implementations fail” was thrown out, which certainly grabbed my attention. My key takeaways from this conversation, some based on my own (feeble) knowledge:
  1. Go slow. Start with Monitor Mode, then Low Impact Mode, then eventually work your way to High Security Mode.
  2. Be realistic and up-front with all critical players (desktop support, printer support, help desk, key users, management, etc). Partner with them and help them understand that this “may hurt a little” (my words).
  3. Cisco’s NAC appliance was replaced by Cisco Identity Services Engine (ISE), which supports RADIUS (basic as well as advanced functions defined in multiple RFCs). Cisco Secure ACS v5 is the current product that supports TACACS+; ISE doesn’t currently support TACACS+.
  4. Aruba ClearPass supports RADIUS and TACACS+ as well as similar functions compared to ISE (security policy, endpoint identification/profiling). 
  5. I need to research what exact features are supported on the 3750/3750E/3750X access switches we’re looking to deploy this on as well as what exact features and RFCs are supported by ISE and ClearPass.
Another highlight of my day was meeting more Tweeps IRL (in real life) such as Matthew Norwood (@matthewnorwood). And many thanks to Amy Lewis (@commsninja) and her Cisco Datacenter team for hosting Waffle Club (ssh…the first rule about Waffle Club, is don’t talk about Waffle Club). Lots of great discussions there and I look forward to many more!
 

How I Stopped Worrying And Learned To Love IFTTT Automation

I’m a big fan of Evernote to help with everything from blog ideas to date ideas and more, and I particularly like sending e-mails to my personal unique Evernote e-mail address. Some notes have a lot in common, so I was looking for a way to automatically filter and file them based on the title. It turns out you can just add @folder #tag1 #tag2 at the end of the e-mail subject line to have Evernote put the note into the notebook named “folder” and tag it with the specified tags.
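If you generate these e-mails from a script, the subject-line trick is easy to automate. A quick sketch (the notebook and tag names are just examples):

```python
def evernote_subject(title, notebook=None, tags=()):
    """Build an e-mail subject that files the note on arrival:
    '@notebook' picks the destination notebook, '#tag' applies tags."""
    parts = [title]
    if notebook:
        parts.append("@" + notebook)
    parts.extend("#" + t for t in tags)
    return " ".join(parts)

print(evernote_subject("Date night ideas", "Ideas", ["dates", "weekend"]))
# Date night ideas @Ideas #dates #weekend
```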

As I continued to look around for ideas on how to streamline my workflow, I was reminded of a service I’d heard of called “If This Then That,” or “IFTTT.” Now that I’ve spent an hour playing with the recipes, I’m in LOVE! What an amazing service! Facebook, Twitter, Blogger, WeMo, Instagram, Foursquare, and Google Reader are just some of the MANY “channels” you can log into from IFTTT. Then you create “recipes” by specifying cause and effect, or “this” and “that.”

As an example, I set up the “phone” channel with my cell phone and had it call me to read a typed message. It was a clunky computer voice, but it called me right on schedule and got me the message. I may start using that to help me wake up to work out in the morning. The catch is that it can only be scheduled at 15-minute intervals (:00, :15, :30, :45). Still, pretty impressive, right?

Don’t take my word for it. Go check it out for yourself! Set up your own free account at https://ifttt.com!

Oh, and for the heck of it, I’m going to have IFTTT tweet a message announcing this new blog post.  Let’s see if this works…

Unifying Wired and Wireless Edge with Aruba Tunneled Nodes

Anyone familiar with modern lightweight access points (APs) knows and understands the basics: Client connects to AP, AP tunnels traffic back to controller, and administrators can specify all sorts of useful policies in the controller.  Aruba Networks has taken this concept of the wireless edge and extended it to the wired edge of the network with their Tunneled Nodes and Mobility Access Switches. The company I work for has very old closet switches and, since we’re pretty heavily invested in Aruba wireless, I’m intrigued by the concept of unifying wired and wireless edges.

With a sample switch acquired from my account team, I spent a couple hours with my SE getting the basic introduction to Aruba’s Ethernet switches.  The goal of the session was to get the switch set up as a “wired AP” connected to a local controller, and when a laptop would connect to a particular port, the switch would then build a GRE tunnel to the local controller where the laptop’s traffic would get dumped out onto the specified VLAN.  Unfortunately, we weren’t able to complete the setup, so my SE and I agreed to engage the TAC for further assistance. 
My experience with the TAC was less than stellar this time around, but I believe it was mostly due to how new this technology is and that many TAC engineers haven’t had time to learn it inside and out yet.  Eventually I was able to reach an engineer that could identify a fix, and it turned out to be fairly simple. In addition, a high-level support supervisor called me personally to apologize and really listened to my recommendations for how to improve service.
Before the big reveal, here are the technical details of the setup.
We used a test laptop connected to port 2 of the Aruba switch, which was uplinked to a Cisco switch at my desk via an access port on VLAN 221. That Cisco switch was connected to the local controller through an 802.1Q trunk. See the diagram for a topology overview.
When we first set things up, the tunneled-node (a.k.a. the laptop in this case) showed a state of “in-progress” (see output of “show tunneled-node state” command) and would never get to the “complete” state.
In problem state:
(ArubaS3500) #show tunneled-node state

Tunneled Node State
-------------------
IP           MAC               Port    state       vlan tunnel inactive-time
--           ---               ----    -----       ---- ------ -------------
10.20.20.125 00:1a:1e:10:fb:c0 GE0/0/1 in-progress 0221 4094   0000
Here are the most important parts of the switch and controller configurations.
Switch:
ip-profile
   default-gateway 10.22.16.1
   controller-ip vlan 221

vlan "221"

interface-profile switching-profile "vlan221"
   access-vlan 221

interface-profile tunneled-node-profile "tunnel-local-controller"
   controller-ip 10.20.20.125
   backup-controller-ip 10.20.20.123

interface gigabitethernet "0/0/1"
   switching-profile "vlan221"

interface gigabitethernet "0/0/2"
   tunneled-node-profile "tunnel-local-controller"
   switching-profile "vlan221"

interface vlan "221"
   ip address 10.22.17.200 netmask 255.255.240.0
Local Controller:
vlan 220 "Backbone"
vlan 221 wired aaa-profile "s3500aaa"

interface vlan 220
        ip address 10.20.20.125 255.255.255.0

tunneled-node-address 10.20.20.125

aaa profile "s3500aaa"
   initial-role "authenticated"

aaa authentication wired
   profile "s3500aaa"
The core problem ended up being the “tunneled-node-address” command on the controller.  We had set it as the IP address of the controller itself, but the TAC identified this as the problem and changed it to all-zeros, like this:
tunneled-node-address 0.0.0.0
Finally, the tunneled-node came up in the “complete” state (see output below) and I was able to get a DHCP address on the laptop and connect to the rest of the network.
When problem was fixed:
(ArubaS3500) #show tunneled-node state

Tunneled Node State
-------------------
IP           MAC               Port    state    vlan tunnel inactive-time
--           ---               ----    -----    ---- ------ -------------
10.20.20.125 00:1a:1e:10:fb:c0 GE0/0/2 complete 0221 4094   0000
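With more than a couple of tunneled-node ports, eyeballing that table gets old. Here’s a rough Python sketch that parses the “show tunneled-node state” output (column layout assumed from the samples in this post) and flags any node that isn’t in the “complete” state:

```python
def parse_tunneled_node_state(output):
    """Parse 'show tunneled-node state' output into a list of dicts.
    Skips banner/header/separator lines; data rows start with an IP."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have 7 columns and a dotted-quad in the first one.
        if len(fields) == 7 and fields[0].count(".") == 3:
            nodes.append({
                "ip": fields[0], "mac": fields[1], "port": fields[2],
                "state": fields[3], "vlan": fields[4],
            })
    return nodes

sample = """\
Tunneled Node State
-------------------
IP           MAC               Port    state    vlan tunnel inactive-time
--           ---               ----    -----    ---- ------ -------------
10.20.20.125 00:1a:1e:10:fb:c0 GE0/0/2 complete 0221 4094   0000"""

stuck = [n for n in parse_tunneled_node_state(sample) if n["state"] != "complete"]
print(stuck)  # [] -- nothing stuck in "in-progress"
```

You could feed this from an SSH session to the switch and alert whenever `stuck` is non-empty.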
Hit me up on Twitter (@swackhap) or leave your feedback below.

Automating Exchange Bandwidth Limits with SolarWinds Orion NCM

As I’ve written about previously, one of the many tools I work with is SolarWinds Orion Network Configuration Manager (NCM). It’s a great tool to capture device configs on a daily basis, and for scheduling off-hours changes or regularly scheduled processes that may happen weekly, daily, or even multiple times per day.  
Recently our messaging team started replicating Microsoft Exchange data stores from our primary datacenter in the US to another location in the Far East (FE). In this case, there’s only a 4.5 Mbps circuit connecting the locations, and the replication traffic started interfering with production traffic in the FE. Even with QoS on the link and Riverbed Steelheads optimizing the traffic like nobody’s business, we still needed to do something. The decision was made to cap the Exchange replication (henceforth referred to as EXREPL) traffic.
Using the Steelhead’s Advanced QoS configuration we set the upper bandwidth (BW) % to 33% of the 4.5Mbps link (see below).
But we only needed to keep this limit in effect during the local daytime; at other times we can let more EXREPL traffic through. Despite the beautiful web GUI that Riverbed uses, there’s also an excellent CLI. The question became “What commands can I use to modify the upper BW% for the EXREPL QoS class?” After a bit of reading through the CLI guide, I found the proper format:
  • qos classification class modify class-name "EXREPL" upper-limit-pct 33
  • qos classification class modify class-name "EXREPL" upper-limit-pct 90

Rather than sit at the keyboard and execute these commands twice per day, I set up SolarWinds Orion NCM jobs on schedule to run the following:
M,T,W,Th,F at 7am CT
config t
qos classification class modify class-name "EXREPL" upper-limit-pct 33
end
write mem
Su,M,T,W,Th at 6pm CT
config t
qos classification class modify class-name "EXREPL" upper-limit-pct 90
end
write mem
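Those two job schedules boil down to a simple rule: cap EXREPL at 33% during weekday business hours (Central time) and relax it to 90% the rest of the week. A small sketch of that policy, in case you’d rather drive it from a script instead of fixed NCM jobs (the hour boundaries mirror the schedule above):

```python
def exrepl_upper_limit_pct(weekday, hour):
    """Return the EXREPL QoS upper-limit percentage that should be
    in effect. weekday: 0=Monday..6=Sunday; hour: 0-23, US Central.
    33% weekdays 07:00-17:59, 90% otherwise."""
    if weekday < 5 and 7 <= hour < 18:
        return 33
    return 90

print(exrepl_upper_limit_pct(0, 9))   # 33 -- Monday mid-morning, capped
print(exrepl_upper_limit_pct(0, 20))  # 90 -- Monday evening, relaxed
```

A script like this could compute the percentage and push the matching `qos classification class modify` command over SSH, rather than relying on two static schedules.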
Each scheduled job also fires off e-mail alerts to the network and messaging teams to keep everyone in the loop. For small teams like mine, this tool is invaluable in its flexibility. Now twice a day, like clockwork, NCM happily does its job and lets us know if it succeeded or had problems. Another crisis averted!

What kind of simple (or complex) automation do you use? Hit me up on Twitter (@swackhap) or post a comment below.

Network Disruption Causes vCenter DB Corruption

First off, I am NOT a VMware expert by any stretch of the imagination.  I AM however learning a lot working with some smart folks in virtualized servers and desktops.  
A network engineer (who shall remain nameless) was making some changes to the network infrastructure last night and unfortunately experienced an outage. Due to an ongoing network migration from Catalyst 6500 to Nexus 7K/5K/2K, all ESX hosts are now connected to Nexus FEXs, but the iSCSI storage is still on the old Cat6500. The outage basically cut connectivity between the Nexus-connected hosts and the iSCSI storage.
As users started trying to log in to their desktops in the morning, we started getting reports of problems. Our VDI vCenter showed 4 of our 20+ hosts disconnected or not responding. We ended up power-cycling those, one at a time, and once they came up we were able to reconnect them to vCenter.
The next big problem was that the profile server, which runs as a VM in the VDI infrastructure, was hung while attempting to migrate. We rebooted vCenter which orphaned the profile server, but we found we were unable to browse the particular LUN where that VM’s datastore existed to add it back into vCenter. At that point, we engaged VMware support and spent several hours on WebEx troubleshooting storage connectivity problems (tail -f /var/log/vmkernel and some other stuff). By the time I left in the early afternoon we had identified half a dozen hosts that seemed to be having iSCSI problems based on what VMware Support was seeing in the logs, and we rebooted those hosts one at a time to minimize end-user impact.
I had to leave before the fun was all over, but found out afterwards that apparently a couple of the hosts got duplicates of the datastore IDs on them when they recovered from the outage overnight. Once that happened, the database was somehow corrupted with the wrong datastore information. It was apparently cleared by removing the two particular hosts from vCenter and adding them back in, thus giving them new datastore information.
Like I said, I’m not a VMware expert but I’m learning more each day. You ever experience something like this? Who else is doing VDI? Leave your comments below or find me on Twitter (@swackhap).