Saturday, September 19, 2009

Twitter

I should've mentioned this a while ago - Xenion status updates are posted on the @xenion_pty_ltd Twitter feed. Please feel free to subscribe and post messages/queries!

Monday, August 3, 2009

Squid docs

I've started writing up some of my notes from Squid consulting into something (mostly) fit for public consumption.

This is partly to aid myself and partly to save others from having to find and fix the same mistakes.

The fledgling documentation dump is here. I'll be adding more to it as I type up more notes and complete more work!

Wednesday, July 15, 2009

Installing Proxy Cache Servers for Fun and Profit...

One of my current contracts involves setting up a web cache farm for an ISP on the end of a whole lot of full-duplex satellite IP. They initially specced out five rather large servers (at least $10,000 each); I think they had a minor heart attack when I reduced that to one server. But then, the cost of the hardware (and Xenion's contracting/support rates!) is very minor compared to the bandwidth savings in the long run.

In any case, it has been a resounding success. I'll summarise how things look at the moment; I'll do up a proper press release sometime later next month.

There are about 15,000 users sitting behind the single proxy cache server, with around 100mbit or so of aggregate satellite IP bandwidth. The service uses a slightly modified FreeBSD-7 setup to support fully transparent HTTP interception (both client- and server-side IP address spoofing), with a Cisco 3750 providing the WCCPv2 interception.
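
For the curious, the interception config looks roughly like the below. This is a minimal sketch, not the production config: the service group numbers, VLAN names and hash flags are illustrative, and the exact squid.conf directives depend on the Lusca/TPROXY patches in use.

  ! Cisco 3750 - two WCCPv2 dynamic service groups: one for the
  ! client->server direction, one for the spoofed return traffic.
  ! Note the 3750 only does L2 redirection, and only "redirect in".
  ip wccp 80
  ip wccp 90
  interface Vlan100
   ! client-facing interface
   ip wccp 80 redirect in
  interface Vlan200
   ! upstream-facing interface
   ip wccp 90 redirect in

  # Lusca/Squid-2.7 side
  wccp2_router 192.0.2.1
  wccp2_forwarding_method 2   # L2 forwarding; the 3750 doesn't do GRE
  wccp2_return_method 2
  wccp2_service dynamic 80
  wccp2_service_info 80 protocol=tcp flags=src_ip_hash priority=240 ports=80
  wccp2_service dynamic 90
  wccp2_service_info 90 protocol=tcp flags=dst_ip_hash,ports_source priority=240 ports=80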

Tuning the FreeBSD stack (and Linux too, for those Linux people out there!) to scale effectively over satellite IP is no easy feat. It took a bit of time, but I have quite a bit of experience in this area, so the tuning was quite successful. The trick is finding the right balance between throughput, scaling and link efficiency. A little bit of first-year college mathematics helped me predict some decent settings, and they work as expected.
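
The "first-year mathematics" here is mostly the bandwidth-delay product: a geostationary hop has an RTT in the region of 550-600ms, so a single 10mbit flow needs roughly 10^7 / 8 * 0.6 = ~750 kilobytes of TCP window to stay full - multiply by tens of thousands of connections and you can't simply crank every buffer to the maximum. The FreeBSD knobs involved look something like this (values are illustrative, not the production settings):

  # /etc/sysctl.conf - illustrative values only
  kern.ipc.maxsockbuf=16777216        # permit large per-socket buffers
  net.inet.tcp.sendbuf_auto=1         # let the stack grow buffers on demand
  net.inet.tcp.recvbuf_auto=1
  net.inet.tcp.sendbuf_max=2097152    # cap growth so 40,000 sockets fit in RAM
  net.inet.tcp.recvbuf_max=2097152
  net.inet.tcp.inflight.enable=0      # inflight limiting hurts long fat networks

(The rough Linux equivalents are net.core.rmem_max/wmem_max and net.ipv4.tcp_rmem/tcp_wmem.)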

The software is Lusca-HEAD (the very latest as of this post) - this gives me all the useful Squid-2.7 features, stability and performance, plus my extras (twiddles for the satellite IP stuff, TPROXY support, etc.).

The box itself is a dual dual-core AMD Opteron 270 at 2GHz, with 16GB of RAM, an Intel Pro/1000 NIC and a 3ware 9000-series SATA controller with 12 x 500GB 7200rpm disks of some sort. The disks are all mounted individually - no RAID at all. Ten disks are for storage, one for the OS and one for logging.
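
In squid.conf terms that layout is simply one cache_dir per spindle - something like the following, with made-up paths and sizes (the principle being one aufs directory per physical disk, sized well under the disk's capacity):

  # one cache_dir per physical disk - /cache0 through /cache9
  cache_dir aufs /cache0 200000 16 256
  cache_dir aufs /cache1 200000 16 256
  # ... and so on up to /cache9
  # the access log goes to the dedicated logging disk
  access_log /logs/access.log squid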

The box pushes around 80 to 120mbit at peak with a byte hit rate between 20 and 40%. The request rate sits between 300 and 600 requests a second, sometimes peaking at 800 or more. This translates into real traffic savings (and thus a whole lot of money - satellite transponder space is expensive!) and much improved performance for clients.

It also handles between 10,000 and 20,000 concurrent connections with peaks over 40,000. Yes. 40,000 concurrent connections. I'm not making this up.

The cache size at the moment is around 2TB and 20,000,000 objects. I'm absolutely, positively not filling the disks to capacity, for a whole lot of very good reasons. (Hint - don't do it.) I'll be happier to increase the storage to 4TB and beyond once I've deployed COSS for the small objects and tidied up some of the memory usage. The Lusca process is around 4 gigabytes at present, and 75% of that is the storage index and related bits.

Just for interest's sake: out of the 20,000,000 objects, around 300,000 are larger than 256 kilobytes. The rest are small objects. It's actually quite scary how much of the cache directory is small objects.
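
That small/large split is exactly why COSS is next on the list: COSS packs small objects into a single stripe file and avoids the one-file-per-object filesystem overhead. A future per-disk layout might look something like this (sizes illustrative; note a COSS stripe's maximum size is tied to its block-size):

  # small objects (< 256KB) into a COSS stripe...
  cache_dir coss /cache0/coss 8000 max-size=262144 block-size=512
  # ...while the big objects stay on aufs
  cache_dir aufs /cache0/aufs 180000 16 256 min-size=262144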

I've included some preliminary Windows Update caching, which is providing a 100% hit rate for the update files themselves. It's actually quite scary how simple it was to implement. Shame on you, Microsoft, for -almost- but not quite getting HTTP caching "right" in Windows Update.
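
The gist of it, in Squid-2.7/Lusca terms, is a refresh_pattern that treats the versioned, effectively immutable update payload files as cacheable, plus fetching whole objects when clients ask for ranges. Something like the below - the pattern and TTLs are illustrative, not the deployed config:

  # cache Windows Update payload files aggressively
  refresh_pattern -i \.(cab|exe|msi|msp|psf)$ 10080 100% 43200 reload-into-ims
  # fetch the whole object even when the client asks for a range,
  # otherwise BITS' partial-content requests defeat the cache
  range_offset_limit -1
  maximum_object_size 400 MB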

All in all, the client in question is extremely happy with the support, installation and performance of the cache. There's a shortlist of items still to do, including Lusca improvements and reporting tools, so the client can show his boss just how effective this all is.

Monday, June 15, 2009

Why is this blog suddenly a spam blog?!

So apparently updating all of your labels to be consistent is enough to trigger the spambot logic. I apologise to anyone reading this blog and thinking it's spam - honest, it's not. Really! Honest!

Grrr!

Saturday, June 13, 2009

New replacement hosting service - hosting-5

G'day,

I've just deployed a new hosting server (hosting-5). It's on the new network setup, running CentOS 5.3 32-bit, and generally seems quite well-behaved.

I'm going to migrate everyone from the old Fedora Core 6 hosting server over to it during the next week, and then finally retire the old box.

EDIT: I've migrated a couple of customer VMs onto it (with their permission, of course!) and it has fixed their stability problems. Even Ubuntu VMs, which have traditionally been very unstable, are now stable once again. Success!

Thursday, June 4, 2009

Hosting-4 downtime

One of the VM servers, hosting-4, needed a spontaneous reboot this afternoon. It seems the Xen management software got very upset after only 360 days of uptime.

On the downside, it means those who are hosted on hosting-4 will suffer 10-20 minutes of downtime. It should have been five minutes, except that the box and its VMs have been up for so long that fsck is enforcing file system checks.

For example:

/dev/sda1 has gone 361 days without being checked, check forced.

So to make things run more smoothly, I'm manually starting each VM once the previous one has finished fscking.
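
To stop every filesystem from hitting its forced-check threshold on the same day again, the check intervals can be staggered with tune2fs - for example (device names illustrative):

  # stagger the forced-check intervals so they don't all trip at once
  tune2fs -c 0 -i 170d /dev/sda1
  tune2fs -c 0 -i 200d /dev/sdb1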

On the upside, the server is now running the latest CentOS 5.3 Xen packages which have fixed a fair few bugs.

I apologise for the downtime. I have to say, though, that a 360-day uptime is pretty good. I'll just have to make sure that future downtime is scheduled in advance.

Monday, June 1, 2009

Outage - interstate and international traffic

There's some issue with my upstream's upstream's interstate link provider. It's affecting interstate and international traffic for my upstream, my upstream's upstream, and potentially other providers.

I'll post an update when the problem is resolved.

EDIT: I really should point out that I'll post updates to Twitter: http://twitter.com/#search?q=%23xenion

EDIT: The service was restored at 1:10am. I'll keep an eye on things for a bit longer.

Friday, May 22, 2009

FreeBSD 6.3 and FreeBSD-7 Xen hosting

I've been playing around with the FreeBSD-7.x and FreeBSD-6.x Xen DomU support (thanks to Kip Macy) and documenting all of the strange bits needed to make a fully working environment.

I've managed to figure out all the right incantations to build the DomU, run it and, for the most part, keep it up and running.
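
For a rough idea of the shape of it, the DomU ends up being driven by a fairly ordinary xm config. The paths, names and boot settings below are illustrative only, and the details vary with which of the patches you're running:

  # /etc/xen/freebsd7 - sketch of a paravirtualised FreeBSD-7 DomU
  name   = "freebsd7"
  kernel = "/xen/kernels/freebsd7-xen"   # FreeBSD kernel built with Xen support
  memory = 512
  vif    = [ 'bridge=xenbr0' ]
  disk   = [ 'file:/xen/disks/freebsd7.img,hda,w' ]  # appears as xbd0 in the guest
  extra  = "vfs.root.mountfrom=ufs:/dev/xbd0"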

I may offer FreeBSD DomU support to Xenion customers, with part of the proceeds donated back to the FreeBSD Project.

Let me know if you're at all interested in this!

Solaris 10, Active Directory, Squid-2.7, NTLM. Eww.

I've been working on another Solaris 10 and Active Directory + Squid NTLM integration project. I think that I've finally coaxed out the niggling bits from all of this.

In summary (thus far):

- The latest Solaris 10 ships with a "sun free software" Samba package with Kerberos and Active Directory support already working. Good.

- It -may- still have the 8-character password limit in the "net ads join" command (for "logging" the server into Active Directory). Eww.

- The Kerberos setup is a bit crack-smoking but reasonably trivial. The tricks are making sure the realm is set up right (capitalise the realm in the Kerberos configs) and that the server queries the Active Directory DNS, or things just don't work. (Active Directory DNS is used to discover services - eg ldap, kerberos, wins, etc.)

- The default LDAP query result in Active Directory is limited to 1000 entries, so "wbinfo -u" doesn't return all the users from a large Active Directory.

- Figuring out why and when to restart winbind, and when to purge the winbind idmap/usermap tdb files, is very Eww. I need to properly understand what is going on there.

- Make sure the damned server is NTP-synced to the AD servers.

- I need to make certain that the Active Directory Kerberos is returning renewable tickets.

- The winbind separator apparently works best when it's "+". Again, not sure why; I need to document all of this. (See the config sketch after this list.)

- Having tightly controlled firewalls makes a one-day job take a week, but it has shown me all the random communication which happens. For example, Samba uses LDAP-over-UDP on this setup to do the initial net join.
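
To tie a few of those points together, here's roughly what the relevant bits of config end up looking like. The realm, workgroup, hostnames and idmap ranges are placeholders, not the client's values:

  # /etc/krb5/krb5.conf (the Solaris 10 location) - note the capitalised realm
  [libdefaults]
      default_realm = EXAMPLE.COM
      dns_lookup_kdc = true    # let Active Directory DNS locate the KDCs

  [realms]
      EXAMPLE.COM = {
          kdc = dc1.example.com
      }

  # smb.conf
  [global]
      security = ads
      realm = EXAMPLE.COM
      workgroup = EXAMPLE
      winbind separator = +
      idmap uid = 10000-20000
      idmap gid = 10000-20000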

There's more to come as I finalise this installation. I'll publish the install guides on my website.

Thursday, May 21, 2009

Downtime - web hosting services

There was a brief outage today on the web hosting services cluster. I've kicked the relevant service hard and things seem to be working again.

I'm looking forward to Sunday's upgrade and shuffle - I'll be able to do a lot more with what I have after that.

Monday, May 18, 2009

Current outage

I lost connectivity to the data centre a few minutes ago. It looks like all the WAIX participants down there are offline.

.. nope, it's back now. I wonder what the problem was. There wasn't a power loss in the data centre, so it looks like a problem with the backhaul to WAIX.

Thursday, May 14, 2009

Hosting Referral Special!

G'day everyone,

I'm running a little referral campaign for May and June 2009. Existing clients who bring in three new clients get three months free.

Don't be shy!