Life Without Caffeine

Title catch your attention? I thought so. Try to imagine it for a minute. I’ve been living it. Well, not quite, but I’ve been living on a limited caffeine intake since January 1, 2011 as part of my 100-day challenge. Before Jan 1, I would drink 6-8 Diet Cokes per day. Since then, just 1. What’s worth giving up so much caffeine for, you ask? This:

The Apple iPad. Better yet, by the time my 100-day challenge is done on April 10 (but who’s counting), there should be an iPad 2 available.

In this world of self-indulgence it’s very rewarding, albeit quite challenging, to replace instant self-gratification with self-denial. It makes the prize that much sweeter in the end.

What will you give up for 100 days, and what will your reward be? Join me in the challenge!

Splunk Field Extraction and Report for Cisco AnyConnect VPN Failures

At the peak of Snowmageddon and Icemageddon this week, our remote-access VPN resources were getting some major exercise.  Our office was even closed for a day, something that doesn’t happen often.  Our 100 simultaneous AnyConnect SSL VPN licenses on our Cisco ASA were used up by 9am three days in a row, preventing many people from getting connected.  I mentioned in a previous post our secondary process, where we have users download and install the IPSEC VPN client. But for those who know the products, that’s not as convenient as AnyConnect.


After the fact I was discussing options for increasing our remote access VPN capacity, all of which require money.  To justify the cost to the money holders, it’s always useful to have data to back you up.  So we started asking questions:
  • How many people had problems connecting to the VPN?
  • How many times were individual users failing to connect due to our license limit?

After some digging I was able to find the perfect ASA log entry:


     %ASA-4-716023: Group name User user Session could not be established: session limit of maximum_sessions reached.

In our case it looks more like this:

     %ASA-4-716023: Group  User <swackhap>  IP <24.107.10.23> Session could not be established: session limit of 100 reached.

With our Splunk log analysis tool we were able to dig deeper and pull some solid statistics to justify our request for added VPN capacity. Within Splunk, I first ran a search for the above log entry:
So in this case you can see we had 1071 occurrences of that log entry.  But how many people were affected? Splunk normally does a great job extracting fields it considers useful. But in our case we want to extract the actual userIDs, such as ea900503 and nbf shown above, and Splunk hasn’t done it for us.

To extract a new field in Splunk, simply click on the small gray box with the downward facing triangle to the left of the event, then select “Extract Fields” as shown below.

In the “Example values” box I typed the two sample userIDs and clicked Generate, but in this particular case Splunk failed to generate a regex. So, I was forced to come up with one on my own.  

After messing around with a free tool called RegExr, and after much wailing and gnashing of teeth, I was able to come up with a regular expression to extract the proper field:


     (?:Group User <)(?P<AnyConnectUser>[^>]*)
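Before pasting a regex into Splunk, it helps to sanity-check it outside the tool. Here is a quick Python check (the sample line is modeled on the ASA message shown earlier; note I use \s+ rather than a literal space so the pattern tolerates the double space the ASA emits between “Group” and “User”):

```python
import re

# Same capture group name Splunk will use for the extracted field.
pattern = re.compile(r"Group\s+User <(?P<AnyConnectUser>[^>]*)")

line = ("%ASA-4-716023: Group  User <swackhap>  IP <24.107.10.23> "
        "Session could not be established: session limit of 100 reached.")

match = pattern.search(line)
print(match.group("AnyConnectUser"))  # swackhap
```

If the pattern matches your sample lines here, it should behave the same way once saved as a field extraction in Splunk.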

In Splunk, I clicked the gray Edit button and entered my own regex, then saved the new field extraction.  Now we’re able to see “AnyConnectUser” as an interesting field on the left side of the search screen. (You may have noticed it in earlier screenshots, since I had already created the field extraction before writing this blog post.)

Clicking on the “AnyConnectUser” field shows a list of the top 10 hits, including the number of occurrences for each.  (Note that I’ve obfuscated many of the usernames for security). But at this point we still don’t know how many users had problems connecting (we just know it’s more than 100).  So we use some more Splunk magic–generate a report based on the search.

Clicking on “top values overall” brings up the report generation wizard.

After creating and saving the report, we can now get to it anytime from the main Search screen under the “Searches & Reports” drop-down menu:

Here’s the finished product:

After scrolling down we can see a table of the raw data:

We can then go to the last page of the table, scroll to the bottom, and see the total number of users that had at least one failure connecting to the VPN:

We had 194 users experience VPN connection problems due to our existing license limit.
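For anyone without Splunk handy, the same per-user failure tally can be sketched in plain Python over the raw syslog file (the extraction pattern mirrors the field extraction above; the file name and exact message format are assumptions based on the sample log entry shown earlier):

```python
import re
from collections import Counter

USER_RE = re.compile(r"Group\s+User <(?P<user>[^>]*)")

def failures_per_user(lines):
    """Count session-limit failures (%ASA-4-716023) per AnyConnect user."""
    counts = Counter()
    for line in lines:
        if "%ASA-4-716023" in line:
            match = USER_RE.search(line)
            if match:
                counts[match.group("user")] += 1
    return counts

# with open("asa.log") as f:
#     counts = failures_per_user(f)
# len(counts)             -> number of distinct affected users
# counts.most_common(10)  -> top 10, like Splunk's "top values" report
```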


Hit me up on Twitter (@swackhap) if you have questions or ideas on how to do this better.  Or leave a comment below.  

Snowmageddon vs. The Corporate Network

A major winter storm can make for some very interesting statistics. Let’s look at the primary firewall for Company XYZ, also used for remote access VPN.  We’ve got a failover pair of Cisco ASA5510s licensed for 100 simultaneous AnyConnect WebVPN connections as well as 750 IPSEC VPN connections. Our “road warriors” are set up with the IPSEC VPN on their laptops, but folks who work from home using their own personal computers usually come in using the AnyConnect WebVPN (SSL-based).

You can see from the IPSEC VPN Connections chart below that we apparently have about 80-100 “road warriors” who just keep their home computers connected all the time (based on the lowest number of connections each day).  Over the last week we’ve peaked around 160-180, except for today, which has taken us close to 200. One reason for this is shown in the next chart.

The WebVPN Connections chart below shows on most days we have up to 30 connections at our peak times. Since the sky opened up and dumped snow on us overnight, you can see that we’ve more than maxed out our connection limit for WebVPN.  For days like this, our WebVPN page has a message that says something like “If there is inclement weather today and you are having problems connecting, there may be too many other people trying to connect at the same time.  You may connect using a different method, by downloading an alternate VPN client using the appropriate link below.” Then there are links for 3 .zip files: Windows XP/2000, Windows Vista/Win7, and Macintosh.  Each zip file contains the Cisco IPSEC VPN client EXE as well as two PCF files that provide limited-access profiles for the IPSEC VPN.  
Unfortunately, there doesn’t seem to be any nice error message that says “no more connections available” to indicate a user is running into a connection limit. Is there some way to do that I don’t know about?

The chart that got all this analysis started this morning also generated an e-mail telling my team the ASA VPN appliance was running high on CPU.  (Well, the chart didn’t generate the e-mail–the network monitoring system did.)  Take a look at the following Average CPU Load and you’ll see we’re running about 80% today vs. a typical day at or below 60%.

The next chart shows the bandwidth impact all this VPN traffic has on our DS3 circuit. The green line shows uplink to the Internet and is peaking close to the 45Mbps mark today. I wonder how many of those users are RDP’d to their desktops and the screensaver has kicked in, causing high bandwidth utilization. *sigh*
In case you’re wondering, all these graphs were pulled from Solarwinds Orion Network Performance Monitor (NPM). In particular, the first two charts showing connection numbers use Orion’s Universal Device Poller (UnDP) functionality. There wasn’t any built-in way I could find to measure what I wanted, so I found ideas on Thwack.com (Solarwinds’ user community site) for using SNMP polling via UnDP to get those numbers.
So who’s winning the battle…Snowmaggedon or The Corporate Network?  You decide!  Let me know on Twitter (@swackhap) or in the comments below.

RSA SecurID Soft Token for iPhone – A Better Deployment Method

Working in a retail environment makes you think really hard about security, especially in light of what happened with TJ Maxx a few years ago.  Using credit cards in retail is a privilege that we only get to keep if we follow the Payment Card Industry Data Security Standard (PCI DSS). One of the requirements of PCI is related to two-factor authentication for remote-access to your corporate network, and one solution for this is RSA’s SecurID authentication product.

RSA SecurID supports many form factors, both hardware fobs/cards and software-based on PCs and mobile devices. This post focuses on mobile device soft tokens, particularly iPhones.

For quite some time, the process to get a soft token on an iPhone looked something like this:

  1. User downloads RSA app from App Store
  2. Administrator logs in to RSA SecurID appliance and assigns soft token to user
  3. Generate CT-KIP credentials for web download, e-mail special link to user
  4. Connect user’s iPhone to internal corporate network
  5. Have user open e-mail on the native iPhone app and tap the link
  6. iPhone communicates directly with RSA appliance
  7. Token is now present on iPhone
Step 4 is required because of the way RSA has locked down its current appliance. The only way for an iPhone to connect to the RSA appliance from outside the corporate firewall would be to somehow expose the appliance itself to the Internet, either directly or through a Microsoft ISA proxy server.  This is one of my big gripes about the appliance, but it’s a great solution for the most part.

The most recent update to RSA’s iPhone app has greatly improved the token deployment process. Now the process looks like this:

  1. User downloads RSA app from App Store (no change)
  2. Administrator logs in to RSA SecurID appliance and assigns soft token to user (no change)
  3. Issue token file (.sdtid) and save to desktop
  4. Use RSA-provided TokenConverter.exe on command line to convert .sdtid file to a long string of characters, then e-mail that to user
  5. Have user open e-mail on the native iPhone app and tap the link (no change)
  6. Token is now present on iPhone
The new method removes the requirement for the iPhone to communicate directly with the appliance, which is a huge improvement. TokenConverter.exe is available for download from RSA’s website for both Windows and Linux, and the same approach also works with Android and Windows Mobile, though I’m not sure if it works yet for Windows Phone 7. Of course, the token deployment process I’ve described above works for any iOS device (iPod Touch, iPad).
Kudos to RSA for improving the token deployment process! Comment below or look for me on Twitter (@swackhap).

Switch Flooding 101 – Troubleshooting Case Study

Remember the first time you learned the basics of bridging? Dig deep in your memory and think back to the basics. With helpful verification from my co-workers and Aaron Conaway (on Twitter as @aconaway), I verified that some “crazy” behavior I saw today on our network was, in fact, “normal,” albeit undesired.

I’ve been troubleshooting some very strange behaviors on our network lately. I suspect some (all?) of them have to do with our fairly old Cisco Catalyst 6500s with Sup2’s and Sup1a’s in our data center, as well as the dinosaur Catalyst 2948 access switches in our closets. There are times when our monitoring system throws alerts saying it can’t ping certain devices. But minutes later, things return to normal. (Don’t you just love intermittent problems?)  One tool that any good network engineer will consider when dealing with such a problem is a packet capture product such as the ever-popular Wireshark.

When I fired up Wireshark on my desktop computer, I had to filter through the muck to see what was going on. By “muck” I’m referring to the traffic I don’t care about, such as the traffic my box is generating, as well as broadcast and multicast. I slowly added more and more exceptions to my capture filter (see below) to narrow the scope of my capture.


My Wireshark Capture Filter: not host [my IP address] and not host [directed broadcast for my subnet] and not broadcast and not host 239.255.255.250 and not host 224.0.0.2 and not host 224.0.0.251 and not host 230.0.0.4 and not host 224.1.0.38 and not ether proto 0x0806 [for ARP] and not ether host 01:00:0c:cc:cc:cc [for CDP/VTP] and not host 224.0.0.252 and not host 228.7.6.9 and not host 224.0.1.60 and not host 224.0.0.1 and not stp and not host 224.0.0.13 and not host 224.0.0.22

Once I filtered out enough to see more clearly, I noticed a TON of syslog (UDP 514) traffic destined for another host on my subnet. After scratching my head and consulting with co-workers, I started looking at the MAC address tables (or CAM tables). My upstream switch didn’t have a CAM table entry for the MAC address of the syslog server. Neither did its upstream switch. In fact, even the Cat 6500 directly connected to the syslog server didn’t have a CAM table entry for it.

Checking the timeouts for the CAM table on one of the CatOS switches gave us this:
CatOS-Switch> (enable) sh cam agingtime

VLAN    1 aging time = 300 sec
VLAN    2 aging time = 300 sec
VLAN    9 aging time = 300 sec
VLAN   17 aging time = 300 sec
VLAN   18 aging time = 300 sec
VLAN   20 aging time = 300 sec
VLAN   21 aging time = 300 sec
VLAN   25 aging time = 300 sec

[snip]
Similarly, the Cat6500 running Native IOS showed this:
NativeIOS-Switch#sh mac-address-table aging-time 
Vlan    Aging Time
----    ----------
Global  300
no vlan age other than global age configured
Apparently, this syslog server is so quiet, so stealthy, that it doesn’t transmit ANY traffic for more than 5 minutes (300 sec) at a time. After 5 minutes, the CAM table entries time out, and all traffic destined for that server gets flooded to every port in the VLAN throughout our trunked network.
One way to prevent the flooding would be to put static CAM table entries in all the affected switches. Perhaps an easier solution is to configure the syslog server to generate some traffic at least once every 5 minutes.
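If the syslog server itself can’t easily be changed, even a trivial keepalive script running on it would keep the CAM entries fresh. A minimal sketch (the destination address, port, and 240-second interval are placeholders, with the interval chosen to stay inside the 300-second aging time):

```python
import socket
import time

def send_keepalive(dst_ip, dst_port=514, payload=b"<14>keepalive"):
    """Send one small UDP datagram so upstream switches relearn our MAC."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, (dst_ip, dst_port))

# while True:                      # run forever on the quiet server
#     send_keepalive("10.0.0.1")   # any reachable host on the VLAN works
#     time.sleep(240)              # well inside the 300-second CAM aging time
```

Any periodic traffic (even a cron job pinging the default gateway) accomplishes the same thing; the point is just that the server sources a frame before the aging timer expires.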
I’m not sure if the flooding is causing the other strange behaviors we’re seeing on our network, but this has been a good learning experience and reminder for me about the basics of Layer-2 networking.
Any other troubleshooting ideas you would use for a situation like this? Comment here and/or hit me up on Twitter (@swackhap).

Splunk "host" Field Enhancement For Syslog-ng

We are very fortunate where I work to have Splunk. It’s an incredibly powerful indexing tool that can “eat all your IT data” and report on it in many different ways. We mostly use it to do simple searches for troubleshooting, but we’re always building more expertise as time permits.
Splunk is set up to index syslog messages very nicely by default. It takes each syslog message and intelligently recognizes the date/time stamp, then “extracts” all the fields and names them things like “host”, “eventtype”, “event_desc”, “error_code”, “log_level”, and so on.  This post focuses on the “host” field, which is the IP address of the end device (router, switch, firewall, etc).
In our environment, we send all our syslogs to a Linux server running a free open-source tool called syslog-ng. With it, we do two things: (1) save a copy of each syslog message on the local server in a flat text file named for the source IP address where it came from, and (2) forward a copy to our Splunk indexing server using TCP port 9998.
For a while I’d noticed that Splunk listed all syslog messages with a “host” field equal to the IP of the syslog-ng server. This morning I did some research and fixed that, so now all syslog-ng-forwarded messages have their host field set to the source IP address of the original sending device (router/switch/firewall).
Here’s how I did it:
1. Created props.conf file in /san/splunk/etc/system/local with the following contents
[source::tcp:9998]
TRANSFORMS = syslog-header-stripper-ts-host syslog-host
2. Restarted Splunk with this command:
service splunk restart
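For the curious, here is a rough Python illustration of what the built-in syslog-host transform is doing conceptually: matching the relayed syslog header and pulling out the original sender so it can replace the forwarder’s address as the “host” field. (This is my own sketch, not Splunk’s actual regex.)

```python
import re

# Match a classic BSD syslog header: <PRI>, a timestamp, then the
# originating host, e.g. "<134>Feb  1 09:15:02 10.1.2.3 %ASA-4-716023: ..."
SYSLOG_HEADER = re.compile(
    r"^<\d+>"                   # priority value
    r"\w{3}\s+\d+\s[\d:]{8}\s"  # timestamp, e.g. "Feb  1 09:15:02"
    r"(?P<host>\S+)\s"          # originating host or IP
)

def extract_host(line):
    """Return the original sending device from a relayed syslog line."""
    match = SYSLOG_HEADER.match(line)
    return match.group("host") if match else None
```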
Happy Splunking!

Solarwinds Orion Network Performance Monitor Bug

I am *scary* good at finding bugs in software. Just ask the Cisco TAC. Or in today’s case, ask Solarwinds support. This is a duplicate posting that I’ve also added to Solarwinds’ Thwack.com user community site. If you use Orion NPM and send SNMP traps to another network management tool, READ AND HEED.

Thwack Post Title: NPM 10.0.0 SP1 Bug: Alert Action To Send SNMP Traps Actually BROADCASTS On Local Network

Many thanks to Mariusz from the Support team for helping me pin this down. I wanted to share with all since this might be happening under your nose!

We have Orion NPM 10.0.0 SP1 and have the “Alert me when a node goes down” alert configured with two trigger actions:

  1. Log Alert to NetPerfMon Event Log
  2. Send SNMP Trap to two hosts (Microsoft Operations Manager and Orion NCM).

A DBA told me earlier today that he noticed a server was receiving traps from our Orion poller. He noticed this in that server’s Event Viewer Application Log.

With help from Mariusz and Wireshark, we found that the Orion NPM poller was actually broadcasting SNMP traps to 255.255.255.255! It seems that the workaround is to create a different trigger action for each SNMP Trap destination.  In other words, we changed our trigger actions to this:

  1. Log Alert to NetPerfMon Event Log
  2. Send SNMP Trap to Microsoft Operations Manager
  3. Send SNMP Trap to Orion NCM

In fact, for each additional valid IP destination we added to a single trigger action, the Orion poller appeared to generate yet another duplicate broadcast for each SNMP trap.

If you use this feature of Orion, I recommend you check your settings and maybe run Wireshark on your poller to be sure you’re not spewing broadcasts out to your entire server subnet.

Mariusz is filing this as a bug, and I’m not sure which versions of Orion are affected. Feel free to add your comments to this thread.

http://thwack.com/forums/48/orion-family/9/network-performance-monitor/28193/npm-1000-sp1-bug-alert-acti/#118327

Contacts Consolidation

I don’t know about you, but I have contacts everywhere. I’ve got Exchange with Outlook at work, Google Contacts (to go along with Gmail and Google Voice), Facebook, Twitter, and LinkedIn.  There may be others, but last night I spent about 30 minutes and pulled together all my current contacts from all these sources. Here’s how I did it:

  1. Outlook: Exported all contacts as a CSV file. Cleaned it up and imported into Google Contacts.
  2. Facebook: I found a post that explained how to use a Yahoo account to import Facebook contacts. I then exported as a CSV and, again, imported into Google Contacts.
  3. LinkedIn: Under the Contacts listing, there’s an easy-to-use “Export Connections” link. Exported to CSV and, you guessed it, imported into Google Contacts.
  4. Twitter: Found a nice service called MyTweeple.com that has a handy tool to export all contacts to a CSV file. Imported into Google Contacts.
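The CSV juggling above can also be scripted. A rough sketch that merges several exports and drops duplicates by e-mail address (the “E-mail Address” column name is an assumption based on Outlook’s export; the other services label the column differently, so adjust `key` per file):

```python
import csv

def merge_contact_csvs(paths, key="E-mail Address"):
    """Merge exported contact CSVs, keeping the first row seen per e-mail."""
    seen, merged = set(), []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                email = (row.get(key) or "").strip().lower()
                if email and email not in seen:
                    seen.add(email)
                    merged.append(row)
    return merged

# merged = merge_contact_csvs(["outlook.csv", "facebook.csv", "linkedin.csv"])
```

Google Contacts’ own duplicate merging (mentioned below) handles the leftovers, so this is just a head start.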
By now you see a pattern developing.  Since I use Gmail and Google Voice so heavily, Google Contacts is a natural repository for all my contacts.  It also allowed me to import custom column fields, like “TwitterName”, so I have all my tweeps listed in my Google Contacts with their “twittername” as a Note attached to their details. 
Another great thing about Google Contacts is that it is great at finding and merging duplicate contacts. As you might guess, there are many people that I follow on multiple social networks, so merging duplicates is a must for me.
How do you keep your contacts organized?
Find me on Twitter at @swackhap.

Who Said Catholics Don’t Have A Sense Of Humor?

CATHOLIC GOLF

Catholic or not, you have to laugh at this one.

A Catholic priest and a nun were taking a rare afternoon off and enjoying a round of golf.

The priest stepped up to the first tee and took a mighty swing. He missed the ball entirely and said, “Shit, I missed.”

The good Sister told him to watch his language.

On his next swing, he missed again. “Shit, I missed.”

“Father, I’m not going to play with you if you keep swearing,” the nun said tartly.

The priest promised to do better and the round continued.

On the 4th tee, he missed again. The usual comment followed.

Sister was really mad now and said, “Father John, God is going to strike you dead if you keep swearing like that.”

On the next tee, Father John swung and missed again. “Shit, I missed.”

A terrible rumble was heard and a gigantic bolt of lightning came out of the sky and struck Sister Marie dead in her tracks.

And from the sky came a booming voice: “Shit, I missed.”

Google Is Great For More Than Just Searching

I’ve recently been discovering (or in some cases re-discovering) some of the awesome free stuff that Google has to offer. My Google Dashboard lights up like a Christmas tree now that I’m using so many of their tools. Here are a few that I’ve started (re)using lately.

Gmail – After looking at the web-based interface on and off for a while, I decided to take the leap. My primary e-mail address, which uses my own domain (swackhammer.net), automatically forwards all e-mail to my Gmail account. Advantages I love include speed, the ability to quickly search all e-mails for what I need, and integration with all my contacts.
Google Voice – I give out one number to everyone, then can customize what phone will ring and when based on who is calling me. Annoying call from recruiter or telemarketer? Just tell Google Voice to send them to voicemail. Or better yet, play a message that indicates your number is no longer in service. 🙂 And when you do get a voicemail, you can read a transcript of it via SMS or in your e-mail so you don’t even have to listen to it. (Although some people’s accents make for some very interesting transcripts.)
Google Contacts – Integration with Gmail and Google Voice–all your important contacts in one place, all easily reachable from any web browser.
Google Reader – RSS (Really Simple Syndication) feed-reader allows me to sign up for all the news and blogs I care about and read them at my leisure. I also use the NewsRack app on my iPhone which syncs with Google Reader. Any article I read on my iPhone gets marked as “read” so I won’t waste time reading it a second time if I’m using Google Reader in a web browser.
Blogger – I’ve heard many people say they like WordPress better, but until I need features that WordPress offers, this works great for me.
Best of all, these services are FREE. I know, I know–you may be one of those people that hate Google and don’t want them tracking your every move. I’m aware of my online footprint, and as a techie I fully understand that if someone really wants to find out more about me, they will anyway.
How do you use Google? What non-Google services do you love in place of these and why?