Friday, 19 February 2016

Continuous Integration and Amazon Web Services monitoring

Day-to-day development


During the usual development of our clients' websites, we rely on a handful of standard tools for pretty much every job.

Firstly, we use Subversion [link] as our source control. This lets us keep an accurate track of every change we make throughout the build of a website or web application.

CruiseControl, or CCNet [link], is used to create a deployment bundle for the Build and QA environments. Each time we commit a change to our source control, CCNet kicks in, builds that solution, and deploys it to the relevant environment. If we commit a change that causes the build to fail, then CCNet will notify the Freestyle development team that something has gone wrong.

This is known in the industry as Continuous Integration (or CI).

The 'Build' environment is updated with each change that is made by any developer working on a given project. We use this to test new changes within the rest of the existing code, to make sure that nothing has broken in terms of functionality.


The 'Release' environment is a replica of a live site that gets updated with the fixes, amends and updates that have been tested on the 'Build' environment. Once testing on 'Build' has been completed, and any identified errors have been fixed, we merge that code into the 'release branch'.

Monitoring these changes and builds means loading up the CruiseControl page and checking each project for errors.

Ideally, we want an indicator of all project statuses available ‘at a glance’. This is where our wallboard, or Information Radiator, comes in. For those of you that read my previous piece on how we set up our Raspberry Pi-powered TV on the studio wall [link to previous article], this very same TV displays a web page that details the status of all our projects in one easy-to-see place.

Amazon Web Services

When we moved our live sites to the cloud, we picked Amazon Web Services, or AWS, as our preferred platform. Not only does AWS allow us to spin up a new instance with ease, and code quicker, but it also means we don't need actual, physical hardware co-located at a hosting company. Previously, if we needed to add another machine, we'd have to buy the hardware, drive to the hosting company, install it, configure it, and all the rest.

This can take some time.

With cloud-based hosting, we can simply click on the 'Create new computer instance' button, and in a few minutes, it’s ready to go. Well, it's a little more involved than that, but you get the idea – it’s much, much quicker.


Monitoring live websites

Amazon EC2 instances have some built-in monitoring that can be used through the CloudWatch service. On each virtual machine, we configure what we want to monitor, for example CPU usage, memory usage and disk space. All these metrics get stored in CloudWatch for consumption later.
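
As a rough illustration of reading those stored metrics back, the sketch below asks CloudWatch for one instance's average CPU usage over the last hour. It is written in Python against a boto3-style CloudWatch client purely for illustration; the function name, the one-hour window and the five-minute period are all assumptions, not a description of the code we actually run.

```python
from datetime import datetime, timedelta, timezone


def average_cpu(cloudwatch, instance_id, minutes=60, period_seconds=300):
    """Return the mean CPUUtilization (%) for one EC2 instance, or None.

    `cloudwatch` is assumed to be a boto3 CloudWatch client, e.g.
    boto3.client("cloudwatch"); any object with a compatible
    get_metric_statistics method will do.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=period_seconds,
        Statistics=["Average"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None  # instance stopped, or no data reported yet
    # Average the per-period averages to get one figure for the window.
    return sum(p["Average"] for p in datapoints) / len(datapoints)
```

Memory and disk-space figures are not part of the standard AWS/EC2 namespace, so a query for those would need to point at whichever custom namespace the per-machine configuration publishes to.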

Behind the scenes, there is some code that runs on a virtual machine in our in-house cloud. This is configured to poll Amazon every few minutes and give us up-to-date information on all of our live environments (currently, there are 200+ virtual machines that we manage). This data is then passed back down to a client that renders it into an easy-to-consume, visual indication of how the sites are performing.


If we notice that a server has unusually high processor usage, we can hop onto the virtual machine, and take a look under the hood.


We had an issue not so long ago where one of our developers noticed one of the graphs starting to show signs of high CPU usage, unusually so for that website. They quickly logged into the virtual machine to take a look.
Sure enough, a task that we had scheduled to run at a certain time every day was misbehaving. This was caught before it impacted the stability of the site, and before either end users or the client were aware of anything untoward. Proactive monitoring for the win!

This same code also polls our continuous integration server, giving us stats on how the build and release code are doing. So, from just one screen, we can see both internal code deployments and health-checks on live servers.

If a problem occurs, we get notified, and can quickly identify and fix the issues.


Our Wallboard


This is our current continuous integration screen:

Green means 'all good', build successful.

Red means something has been committed that has broken the build, and therefore needs looking at.

Yellow means the site is currently in the process of being built - a developer has submitted something to source control, so CCNet has taken over, and is building the site.
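
The colour rules above boil down to a small lookup. The sketch below, in Python for illustration, assumes the build server exposes a cctray-style XML status report with `activity` and `lastBuildStatus` attributes on each project; the sample report, the project names and both function names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the cctray-style format; project names are made up.
SAMPLE_REPORT = """
<Projects>
  <Project name="site-a" activity="Sleeping" lastBuildStatus="Success" />
  <Project name="site-b" activity="Building" lastBuildStatus="Success" />
  <Project name="site-c" activity="Sleeping" lastBuildStatus="Failure" />
</Projects>
"""


def tile_colour(activity, last_build_status):
    """Map a project's state to the wallboard colour described above."""
    if activity == "Building":
        return "yellow"  # a build is in progress
    if last_build_status == "Success":
        return "green"   # all good
    return "red"         # failure, exception or unknown: needs looking at


def colours_from_report(xml_text):
    """Return {project name: colour} for every project in the report."""
    root = ET.fromstring(xml_text)
    return {
        p.get("name"): tile_colour(p.get("activity"), p.get("lastBuildStatus"))
        for p in root.iter("Project")
    }


print(colours_from_report(SAMPLE_REPORT))
```

Checking for an in-progress build first means a project that is rebuilding after a failure shows yellow rather than red, which matches the description of yellow as "currently being built".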

(these images have been edited to obscure any client-sensitive information)


The AWS screens

This is where the statistics/metrics we get from Amazon's CloudWatch service get displayed.

We can see here that, out of the 232 instances we have in AWS, 178 are active. The remaining 54 have been turned off because we don't need them at present. We can simply turn those back on when we do.
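
The headline figures are a simple tally of instance states. A minimal sketch in Python, assuming each instance reports an EC2 state name such as "running" or "stopped"; the input list is made up to reproduce the numbers quoted above, and treating every inactive instance as "stopped" is an assumption.

```python
from collections import Counter


def summarise(states):
    """Summarise a list of instance state names such as 'running' or 'stopped'."""
    counts = Counter(states)
    return {
        "total": len(states),
        "active": counts["running"],
        "inactive": len(states) - counts["running"],
    }


# Made-up input reproducing the figures quoted above.
states = ["running"] * 178 + ["stopped"] * 54
print(summarise(states))  # {'total': 232, 'active': 178, 'inactive': 54}
```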

 




We get the top 5 instances where disk space might be a problem and need looking into, and also the top 3 instances where CPU usage might be a problem.

The next screen further expands the CPU graphs, to the top 9 instances. This gives us an at-a-glance indication of where sites might be under-performing, or have other issues that need looking at.
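
Picking the "top N" instances for each panel is a straightforward ranking. The sketch below shows one way to do it in Python; the instance names and readings are invented, and `top_n` is a hypothetical helper rather than anything from our codebase.

```python
import heapq


def top_n(readings, n):
    """Return the n (instance, value) pairs with the highest values."""
    return heapq.nlargest(n, readings.items(), key=lambda item: item[1])


# Made-up readings, in percent, keyed by a hypothetical instance name.
cpu_percent = {"web-1": 12.5, "web-2": 91.0, "db-1": 47.3, "web-3": 66.8}
disk_used_percent = {"web-1": 40.0, "web-2": 55.0, "db-1": 93.0, "web-3": 71.0}

worst_cpu = top_n(cpu_percent, 3)         # the three busiest instances
worst_disk = top_n(disk_used_percent, 5)  # at most five, fullest disk first
```

Asking for more entries than there are instances simply returns all of them in descending order, so the same helper could serve the top 3, top 5 and top 9 panels.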


 

The CruiseControl slide updates every 30 seconds, and the two AWS screens update every 5 minutes, to give us near real-time updates.
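
Those two refresh rates can be expressed as a small table of intervals plus a check for which screens are due. This is only a sketch in Python: the screen names and the `screens_due` helper are hypothetical, and only the 30-second and 5-minute figures come from the setup described here.

```python
# Refresh intervals in seconds, taken from the figures quoted above.
REFRESH_SECONDS = {"ci": 30, "aws_overview": 300, "aws_cpu": 300}


def screens_due(now, last_refreshed, intervals=REFRESH_SECONDS):
    """Return the screens whose data is older than their refresh interval.

    `now` and the values of `last_refreshed` are timestamps in seconds;
    a screen that has never been refreshed is always due.
    """
    due = []
    for screen, interval in intervals.items():
        last = last_refreshed.get(screen)
        if last is None or now - last >= interval:
            due.append(screen)
    return due
```

For example, `screens_due(1000, {"ci": 980, "aws_overview": 600, "aws_cpu": 900})` reports only the AWS overview as due, because its data is 400 seconds old while the other two are still inside their intervals.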
