I know this may sound like a tenuous link, but bear with me!

No doubt you’ll have heard people comparing cloud management with the pets vs. cattle analogy – the idea being that ‘pets’ represent indispensable servers that can never go down, while ‘cattle’ are your servers designed for failure.

It got me thinking…in the world of cloud resource management, there are a few things we can learn from our cattle-farming friends.

Think about it: all cattle on a farm are tagged. They’re assigned references to identify them and make the management of a herd easier. Sound familiar?

Let me go back up a level…

In order to manage multiple cloud resources, you have to know about each of them – what they do, how they behave, what relies on them and who owns them.

Once you know this, you can start to group the resources together. Once you’ve done that, you can define and assign rules to standardise your management approach and ensure you gain efficiencies.

These rules make up the fundamental policies that outline the principles by which you apply management control.

In my opinion, policy-based management is the most crucial element of cloud services. Yet I meet customers on a weekly basis who have not even attempted to implement policies to support the management of their cloud estate.

With poor management, 30% of cloud resources can sit unused, costing you money for nothing. But do you have a policy to tell you what to turn off?

Who wouldn’t set up a basic set of policies to ensure financial rules were adhered to?

Preventative and detective

The principles of policy-based management can be traced back to frameworks like COBIT (which may not be the sexiest thing that you’ll read about on Hybrid Hive, but I won’t let that stop me!)

I’ll simplify for the purposes of making my point: if you focus on detective and preventative controls, then you can’t go too far wrong. You have a self-service-powered and often cloud-naïve set of users, who need your support in order to maintain a level of control.

Preventative controls: putting in place systems to stop your users from doing the wrong things in the first place. Mandating tagging, naming conventions, automatically adding monitoring tools or automatically encrypting data are all basics which can be integrated in a provisioning workflow to prevent new systems from breaching your policies.

Detective controls: accepting that things do go wrong, and that preventative systems can be bypassed by API calls or other routes, you need a way to “detect” when something has gone wrong. You could do this with a simple policy compliance dashboard, which shows you that one of your policies has been violated and allows you to trigger a remedial action to resolve the situation. You could also apply automation to recover from some of these situations without human intervention, although sometimes human judgement will be required.
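To make the detective idea a little more concrete, here is a minimal sketch of a compliance scan: it walks an inventory of resources and lists every policy violation it finds, which is the sort of data a compliance dashboard could display. The `Resource` fields, the two example rules and the suggested remedial actions are all hypothetical; a real scan would read its inventory from your cloud provider or management platform.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical inventory record; the field names are assumptions.
    resource_id: str
    encrypted: bool
    monitored: bool


@dataclass
class Violation:
    resource_id: str
    policy: str
    remedial_action: str


# Each rule: (policy name, check that returns True when compliant, suggested action).
# Both rules are illustrative examples, not a recommended baseline.
RULES = [
    ("data-encrypted", lambda r: r.encrypted, "enable encryption"),
    ("monitoring-installed", lambda r: r.monitored, "install monitoring agent"),
]


def detect_violations(inventory):
    """Return one Violation for every resource/rule pair that is not compliant."""
    violations = []
    for resource in inventory:
        for policy, is_compliant, action in RULES:
            if not is_compliant(resource):
                violations.append(Violation(resource.resource_id, policy, action))
    return violations


if __name__ == "__main__":
    inventory = [
        Resource("vm-001", encrypted=True, monitored=True),
        Resource("vm-002", encrypted=False, monitored=True),
    ]
    for v in detect_violations(inventory):
        print(f"{v.resource_id}: violates '{v.policy}', suggested action: {v.remedial_action}")
```

Keeping the suggested action next to each rule means the same list can feed either a human-reviewed dashboard or an automated remediation step, depending on how much thought the fix requires.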

Start with the basics

In my experience you should start with the basics and document the policies that you need to implement. Some of these things are common sense and may have been in your thinking all along, but they now need to be communicated and enforced.

  • Document your policies and communicate them with your team
  • Identify the data needed to enforce the policies
  • Mandate the data so that you can collect, analyse and report on it and automate preventative controls

Data

More often than not, your mandatory data will require the use of metadata, or tags. Just like our tagged cattle at the start, you need to identify your resources. I always recommend that you start with three basic tags:

  1. Environment: is it Dev, Test or Prod?
  2. Service: what is the name of the service that this resource supports?
  3. Owner: what is the owning department of this resource? Who uses it? Who pays for it?
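As a preventative control, these three tags could be checked before a provisioning request is allowed through. The sketch below is one possible way to do that; the lower-case tag keys, the allowed environment values and the `validate_tags` function name are assumptions for illustration, not part of any particular platform.

```python
# Assumed tag keys and values; adapt them to your own naming policy.
MANDATORY_TAGS = ("environment", "service", "owner")
ALLOWED_ENVIRONMENTS = {"dev", "test", "prod"}


def validate_tags(tags):
    """Return a list of problems; an empty list means the request may proceed."""
    problems = []
    for key in MANDATORY_TAGS:
        value = tags.get(key, "").strip()
        if not value:
            problems.append(f"missing mandatory tag: {key}")
    # Only check the environment value if one was actually supplied.
    environment = tags.get("environment", "").strip().lower()
    if environment and environment not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment: {environment}")
    return problems


if __name__ == "__main__":
    request = {"environment": "Dev", "service": "web-shop"}
    print(validate_tags(request))  # reports the missing owner tag
```

Returning a list of problems rather than a simple yes/no lets the provisioning workflow tell the user exactly what to fix, which matters when those users are self-service and new to the cloud.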

Policy types

When planning this for your organisation, think about the following types of policies (although if you’ve used others, please do drop me a note in the comments section below):

  • Scheduling: which types of resources should be shut down and when? For example, development systems should be shut down outside of working hours.
  • Placement: where to place different systems. For example, development occurs on a public cloud and production with live data is on a private cloud.
  • Scaling: which applications can scale resources automatically, and is there a limit at which the scaling stops providing financial benefit? For example, an e-commerce system at peak trading periods.
  • Approval: which actions should require approval and which should not? If a benefit of moving to the cloud was “increased agility”, does every new resource require a full approval loop?
  • Operational management: what operational management controls should be mandatory for each type of system? For example, should all development resources be subject to the same backup and monitoring regimes that you apply in production?
  • Security: who should have access? Which data should be encrypted?
  • Platforms: which platforms should be used and which should not? Is it as simple as picking the cheapest platform, or are there networking, security and interoperability standards to consider?
  • Cost: should the cloud resources be paid for centrally by IT, or should they be automatically charged back to the owning business unit or department?
  • Utilisation: many organisations are over-paying for their cloud usage due to resources that are not used. Should under-utilised resources be terminated?
  • Naming: you’ve probably had server naming conventions for years, but what about a naming policy for your cloud systems, or even the tags that you identify them with?
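To show how one of these policies might translate into something enforceable, here is a small sketch of the scheduling example: development systems are shut down outside working hours. The working hours (08:00 to 18:00, Monday to Friday), the `dev` environment value and the function name are assumptions; the decision would normally be fed by the Environment tag on each resource.

```python
from datetime import datetime, time

# Assumed working hours; adjust to your organisation.
WORK_START = time(8, 0)
WORK_END = time(18, 0)
# Environments covered by the scheduling policy (assumption).
SCHEDULED_ENVIRONMENTS = {"dev"}


def should_be_running(environment, now):
    """Decide whether a resource should be running at the given local time."""
    if environment.lower() not in SCHEDULED_ENVIRONMENTS:
        return True  # e.g. production is never shut down by this policy
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    in_hours = WORK_START <= now.time() < WORK_END
    return is_weekday and in_hours


if __name__ == "__main__":
    saturday_morning = datetime(2024, 1, 6, 10, 0)
    print(should_be_running("Dev", saturday_morning))
    print(should_be_running("Prod", saturday_morning))
```

A scheduled job could call a function like this for each tagged resource and stop or start it accordingly, which turns the written policy into an automated control.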

Tools

In order for you to implement the preventative and detective controls, you’re going to need a technology answer.

Many of the best cloud management platforms have focused on policy over the last few years. The likes of RightScale, Scalr and Fujitsu Cloud Services Management have great tools to simplify this area of cloud management.

My recommendation to you is to review your current technology investment, check to see what exists that you may have already paid for and then review complementary tools to fill any gaps.

###

Hopefully I’ve convinced you of the parallels between cattle farming and cloud policy management.

If not, at least I’ve given you a useful starting point for your own implementation of policy-based management for your resources.

If you have any experiences with this approach, I’d love to hear from you. Drop me a line via the comments section below.

Words by Nick Herbert

Nick is Head of Orchestration, Hybrid IT at Fujitsu EMEIA
