
Dealing with legacy infrastructure is like dealing with the sunset scenario of an old car. If timed right, maximum ROI can be extracted from the old car before investing in the latest and greatest technology of a new car. Migrating infrastructure to the cloud is one transition option, but there are still significant use cases for physical on-premises or self-hosted infrastructure.

Any organization that has grown “organically” and/or has been in business for more than a few years ultimately needs to deal with legacy infrastructure. As organizations get older, this cycle repeats and hopefully gets better every few years.

Legacy infrastructure and legacy applications are related but two different problems. In this context, infrastructure means Servers, Switches, Firewalls, Routers, Storage, Racks, Cabling, Power and PDUs.

Legacy infrastructure is –

  • EOS (End Of Sale) / EOL (End of Life) / EOSup (End of Support)
    • The manufacturer will no longer sell, repair, upgrade or support it
  • About to go EOS / EOL / EOSup
    • Manufacturer will support it but requires signing over your next child’s life earnings to them
  • Was being maintained but that team member left / was let go
  • Bought this expensive unit for project X but that never took off

Why Upgrade?
Ideally, legacy infra management starts at the time of purchase. A proper architecture, procurement and implementation strategy can significantly help maximize ROI over the years.

Mature organizations (large and small) tend to have a policy driven cycle of regular upgrades fueled by engineering decisions, asset depreciation or even compliance certification. For SLA driven hosted service providers, planning for upgrades is (or should be) a critical part of infrastructure strategy.

Full Stack Migrations
Top to bottom, a typical infrastructure stack looks like this –

– (Out of Scope) OS / Hypervisor / Application
– Servers
– Storage
– Layer 3+ Devices – Firewalls / Routers
– Layer 2 – Switches
– Cabling
– Racks
– PDUs
– Power Circuits

It is a lot of work, but doing a full stack migration is a lot cleaner. Full stack migrations keep project managers employed. Design and build a new stack (Power Circuits to OS) and do a massive coordinated application migration to the new stack. In most situations, this is not an option.

Partial upgrades at individual layers are messy but more common. This keeps systems admins busy (and employed).

Power Circuits and PDUs
As applications scale, growing resource demands drive the need for higher capacity or higher density equipment. If the building has more power available, run new circuits with different / higher power capacity (220V instead of 120V, or 3-phase) and then swap out the old PDUs. One typical roadblock with power is cooling capacity. Some cooling efficiency can be gained by adding cold aisle containment.
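
As a rough back-of-the-envelope illustration (not guidance for any particular facility), usable circuit capacity can be estimated from voltage, breaker amperage and the usual 80% continuous-load derating. The circuit sizes and per-server wattage below are made-up assumptions.

```python
# Rough usable capacity per circuit: volts x amps x 0.8 derating
# (x sqrt(3) for a 3-phase feed). All circuit specs here are illustrative.
import math

def usable_watts(volts, amps, phases=1, derate=0.8):
    """Continuous usable power of one circuit, derated to 80% of breaker rating."""
    factor = math.sqrt(3) if phases == 3 else 1.0
    return volts * amps * factor * derate

circuits = {
    "120V / 20A (legacy)": usable_watts(120, 20),
    "220V / 30A":          usable_watts(220, 30),
    "220V / 30A, 3-phase": usable_watts(220, 30, phases=3),
}

for name, watts in circuits.items():
    # 500 W is a hypothetical per-server draw, only to compare densities.
    print(f"{name}: {watts:,.0f} W usable (~{int(watts // 500)} servers @ 500 W)")
```

The jump from a 120V circuit to a higher voltage or 3-phase feed is what makes higher density racks possible in the first place.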

Power circuits + PDUs can be swapped out without downtime if old PDUs and power cabling have been installed properly. If old cabling is a mess, scheduled downtime might be a better option than blowing fuses and power supplies when cables come loose.

Most PDUs have a long life and can be reused. Apart from some management features, more flexible designs and increased densities, PDUs have not seen much innovation over the years. They don’t fetch much in the secondhand market either, so you might as well keep and reuse them.

Racks
Most older racks can be reused as-is unless new designs demand better cabling, physical security or cooling. Space permitting, rear extensions can be added to some racks to accommodate additional accessories, better cable management or bigger PDUs. When picking racks: the deeper, wider and taller, the better.

Cabling
Old network and power cables can be reused unless connectivity designs have changed. A different cable type (Cat7A vs. Cat6, PDU-style vs. 3-prong power) typically does not make much difference for average applications unless the equipment is pushing close to data or power limits. Cabling upgrades, if done right, are typically among the most expensive processes in terms of labor cost and time. But cabling done right can often survive the next several iterations of legacy infrastructure refresh. Lab environments are different; there, ripping and replacing (or re-running) old cables is the best policy.

Cabling designs deserve their own post. Something for the future.

Network Switches
Upgrading a 10 year old 1Gb copper network switch to a latest generation 1Gb copper network switch might not be very beneficial unless the backbone network has moved to different, higher speed interfaces. Network switch upgrades can be very disruptive if critical servers do not have multiple links into the access / storage layers. Core switch upgrades are even worse if cabling and other issues are not considered when provisioning them.
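
Before pulling an access switch, a quick pre-flight check of which servers are single-homed to it can save an unplanned outage. The inventory below is a made-up illustration; in practice the data would come from a CMDB or LLDP neighbor tables.

```python
# Flag servers that would go dark if one access switch is taken out of service.
# The uplink inventory is hypothetical; populate it from your CMDB or LLDP data.
server_uplinks = {
    "db-01":  ["switch-a", "switch-b"],
    "web-01": ["switch-a", "switch-b"],
    "app-07": ["switch-a"],            # single-homed
}

def at_risk(uplinks, switch_to_replace):
    """Servers whose every uplink lands on the switch being replaced."""
    return [srv for srv, links in uplinks.items()
            if set(links) <= {switch_to_replace}]

servers = at_risk(server_uplinks, "switch-a")
if servers:
    print("Re-cable or schedule downtime first:", ", ".join(servers))
```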

Network switches do have some resale value in the secondhand market. Older switches are also a very good option for lab environments, test stacks or even training / R&D. Swapping network vendors between upgrades is sometimes a religious and team morale issue but is possible. I know because I have been there, done that, survived and apparently, people still like me.

Layer 3+ Devices – Firewalls / Routers
In a stable environment with most of the flux at the application layer, edge routers often do not see much change over the years. Except for occasional software upgrades and configuration updates, edge routers sit rock solid. Edge / core router upgrades are easier if appropriate routing and redundancy protocols (HSRP?) are implemented. The typical upgrade process involves replacing the standby unit, switching traffic to the new unit and then doing the same with the active unit.
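
As a sketch of the pre-check for that process, the unit about to be replaced can be asked for its HSRP state before it is pulled. This assumes Cisco IOS gear reachable over SSH via the netmiko library; the hostname, credentials and command are placeholders to adapt for the actual platform.

```python
# Confirm the unit about to be replaced is not carrying HSRP traffic.
# Hostname and credentials are placeholders; adjust for your platform.
from netmiko import ConnectHandler

standby_router = {
    "device_type": "cisco_ios",
    "host": "edge-rtr-2.example.com",
    "username": "netops",
    "password": "secret",
}

conn = ConnectHandler(**standby_router)
hsrp = conn.send_command("show standby")
conn.disconnect()

# On IOS, a unit that is safe to pull should report "State is Standby" for its
# groups; any group still Active here means the peer is not carrying traffic yet.
if "State is Active" in hsrp:
    raise SystemExit("This unit is still Active for at least one HSRP group - abort.")
print("Standby confirmed; safe to take this unit out of service.")
```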

Firewalls are a bit more involved in application stacks and require constant security upgrades, configuration changes, etc. As traffic grows, firewalls run out of capacity and need to be replaced. Most decent firewalls do have redundancy options. If connection state replication is an option, pretty much no disruption is expected; without it, most firewall changes are very disruptive.

With advance planning and proper implementation, router and firewall upgrades can avoid downtime entirely or at least reduce it significantly.

Storage
DAS, NAS, SAN – of the three flavors, DAS is obviously the most disruptive to upgrade, as physical changes are needed.

NAS and SAN upgrades are easier if the underlying protocol remains the same. Independent of the storage vendor, NAS and SAN upgrades can be relatively painless (though not pain-free) if the application supports online data migration.

For example, VMware can do “Storage vMotion” across two storage targets: an entire virtual machine can be moved to newer storage without any downtime. Oracle RAC, on the other hand, ties in tightly with the storage backend, with shared volumes for data, voting disks and other cluster uses. Downtime is almost guaranteed in most cases unless storage layer clustering magic saves the day.
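
As an illustration of what the no-downtime path can look like, here is a minimal pyVmomi sketch that relocates one VM's disks to a new datastore. The vCenter host, credentials, VM name and datastore name are placeholders, and error handling / task polling are only hinted at.

```python
# Minimal Storage vMotion sketch with pyVmomi. All names and credentials
# below are placeholders; this is not production-ready code.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()          # lab convenience only
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="secret",
                  sslContext=context)
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    """Return the first inventory object of the given type with this name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

vm = find_by_name(vim.VirtualMachine, "legacy-app-01")
new_ds = find_by_name(vim.Datastore, "new-san-vol-01")

# Move only the storage; the VM keeps running on its current host.
task = vm.RelocateVM_Task(vim.vm.RelocateSpec(datastore=new_ds))
# Poll task.info.state until it reports success (or error) before disconnecting.
Disconnect(si)
```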

Most older storage units do not have clustering or non-disruptive data replication and transition options. Newer units have the option to add new cluster partners and migrate services over transparently. All of that sounds very nice, but it is not possible if the old unit does not have clustering configured right.

Once decommissioned, old storage can be re-provisioned as backup storage for non-critical environments or used for internal training / R&D.

Servers
If an application is running well with no server capacity issues, the servers are best kept as-is. If it ain’t broken, don’t fix it. If the hardware is out of warranty, keep a few spare units to cannibalize for parts. Old models often flood the secondhand market in batches and make for very good spare-part bins. If the team has some skill and time available, old servers make for excellent clusters. A nice Hadoop cluster with a large NFS storage pool for backups, or even a simpler storage cluster using Gluster, is easily possible.
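
For the Gluster idea, a replicated backup volume across two retired servers takes only a handful of gluster CLI calls. The sketch below wraps them in Python purely for illustration; hostnames and brick paths are assumptions, and glusterd is assumed to be installed and running on both nodes.

```python
# Turn two retired servers into a 2-way replicated GlusterFS backup volume.
# Hostnames and brick paths are illustrative; run from the first node.
import subprocess

NODES = ["old-srv-1", "old-srv-2"]
BRICK = "/data/brick1/backups"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Add the second node to the trusted pool, then create and start the volume.
run(["gluster", "peer", "probe", NODES[1]])
run(["gluster", "volume", "create", "backups", "replica", "2",
     f"{NODES[0]}:{BRICK}", f"{NODES[1]}:{BRICK}"])
run(["gluster", "volume", "start", "backups"])
# Clients can then mount it with: mount -t glusterfs old-srv-1:/backups /mnt/backups
```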

Older servers with more RAM and faster HDDs fetch more in the secondhand market.

What to do with an old pile of hardware?
Here is where company policy and culture can make a big difference. To extract maximum ROI out of old hardware, some level of inventory and resource management skill is needed. Some creative re-assignment of old hardware can provide a significant boost to newer POC initiatives with temporary and ever-changing resource needs.

There are several other ways to deal with old hardware –

  • Resell / liquidate
  • Donate it (Tax Benefit?)
  • Re-use in test / non-critical environments
  • Use as a break / fix / ops training tool
  • Give it away as an employee perk. Lots of people run a small datacenter in their garage at home.

Infrastructure management is a lot of hard work but provides necessary resources to make an organization successful. If not managed well, infrastructure tends to be a large sinkhole for hard cash.

Do not blindly design and run your infrastructure with the aspirations of copying Google or Yahoo!. Every organization is different.

YMMV.