IT Tech Agile Config
Project scope
The project is reviewing the entire CERN computer-centre management toolset
What happens from the bare metal up:
Asset management, inventory
Sysadmin tools and maintenance workflows
Service management and configuration tools
Dynamic configuration for virtual hosts
Operations monitoring
Workflow automation and continuous deployment
Why?
The current production system, built around the Quattor toolset, is successfully managing 10k servers
(CERN) Quattor + many CERN components
It's easier to hire people who have skills in a widely used tool than in your internal tools
Depending on where you look
Quattor
These are the sort of posts our departing staff will be applying for.
Integration is hard
IPv6, virtualisation, Windows Server all need a solution
We could leverage lots of open source tools
But piecemeal integration of these requires high investment due to our complex system
Years of organic growth have made the system way too hairy
It's often easier to reinvent rather than integrate
Where to look?
Large ops community out there taking the tool chain approach, whose scaling needs match ours: O(100k) servers, many apps
Become standard and join this community
Puppet and Chef are the clear leaders for the core tool
other tools in our scope try to integrate with those
Add another computer centre with 24/48 SMT cores per node, and you get 100k to 300k virtual nodes to be managed
99.6%(1) node update success-rate means 1200 manual interventions to fix it
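The arithmetic behind this figure can be checked with a short sketch. It assumes the 1200 interventions correspond to the upper estimate of 300k nodes; the slide does not say which node count was used, and the function name is hypothetical.

```python
from fractions import Fraction


def expected_manual_interventions(nodes: int, success_rate: str) -> int:
    """Expected number of nodes needing a manual fix after one update round."""
    # Fraction keeps the arithmetic exact (no floating-point rounding).
    failure_rate = 1 - Fraction(success_rate)
    return round(nodes * failure_rate)


if __name__ == "__main__":
    # Assumption: 100k and 300k are the lower and upper node estimates.
    for nodes in (100_000, 300_000):
        print(nodes, expected_manual_interventions(nodes, "0.996"))
```

Even a success rate that sounds high leaves hundreds of failures per update round at this scale.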
(1)
Many, diverse applications (clusters) managed by different teams
... and 700+ other unmanaged Linux nodes in VMs that could benefit from a simple configuration system
IT Technical Forum 27 Jan 2012
Foreman dashboard
Puppet
Client/server architecture
puppetmaster: horizontally scalable Rails application
X509 cert authenticated nodes: integrate with CERN CA
Puppet
Puppet runs on the client, applying the configuration changes
It detects the current state and only acts if there's something to do
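The detect-then-act behaviour can be illustrated with a toy model. This is a minimal sketch of the idea, not Puppet's internals; the resource names and the `converge` function are hypothetical.

```python
def converge(current: dict, desired: dict) -> list:
    """Bring `current` in line with `desired`; return the changes applied."""
    changes = []
    for key, want in desired.items():
        have = current.get(key)
        if have != want:
            # Only touch resources whose state differs from the desired one.
            current[key] = want
            changes.append((key, have, want))
    return changes


if __name__ == "__main__":
    node = {"ntp": "stopped", "httpd": "running"}     # hypothetical node state
    desired = {"ntp": "running", "httpd": "running"}  # hypothetical desired state
    print(converge(node, desired))  # first run applies one change
    print(converge(node, desired))  # second run finds nothing to do
```

Running the same configuration twice is safe: the second run is a no-op because the node already matches the desired state.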
Puppet language
Puppet uses its own Ruby-like language for the templates to assert the desired state of the nodes
With Ruby fall-back for hard stuff (we've only needed this once)
Externals
Puppet uses an external DB for much of the configuration that we currently store in textual CDB templates
Node function + hardware
Moving a host between clusters is a DB update
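A minimal sketch of this idea, using an in-memory SQLite database. The table layout, host name and cluster names are hypothetical, and the returned dictionary only loosely follows the classes-plus-parameters shape of a node classifier; it is not the actual CERN schema.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a toy host database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE hosts (name TEXT PRIMARY KEY, cluster TEXT, hardware TEXT)"
    )
    db.execute("INSERT INTO hosts VALUES ('node001', 'batch', 'dual-socket')")
    db.commit()
    return db


def move_host(db: sqlite3.Connection, name: str, new_cluster: str) -> bool:
    """Moving a host between clusters is a single UPDATE."""
    cur = db.execute(
        "UPDATE hosts SET cluster = ? WHERE name = ?", (new_cluster, name)
    )
    db.commit()
    return cur.rowcount == 1


def classify(db: sqlite3.Connection, name: str):
    """Return the node's function and hardware, or None if the host is unknown."""
    row = db.execute(
        "SELECT cluster, hardware FROM hosts WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    cluster, hardware = row
    return {"classes": [cluster], "parameters": {"hardware": hardware}}


if __name__ == "__main__":
    db = make_db()
    print(classify(db, "node001"))
    move_host(db, "node001", "webserver")
    print(classify(db, "node001"))
```

No template is edited when the host moves; the next configuration run simply picks up the new cluster from the database.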
The aim is to make it easy to use pre-canned recipes without even touching a Puppet template
e.g. stick a standard CERN SSO-enabled apache / mod_wsgi / Django server on my box with these parameters
Standard workflow
Iterate CDB on lxadm:
check out from CDB
update templates
CDB commit
(n minutes)
run and notify with nc-client
check on node(s)
check on test node
check on foreman

Iterate puppet-apply on test node:
check out from git
update templates
run puppet-apply on the test node
check on test node
git commit and push
notify with mcollective
check on foreman
Add standard CI (e.g. Jenkins, Bamboo, Cruise) and automated build (Koji) as the only route to get new packages into the CC
... then automate the testing, e.g. suitably tagged RPMs are automatically deployed to /test nodes.
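The tag-driven deployment could look roughly like the sketch below. The tag names, the node group name and the `deployment_target` function are all hypothetical; the slides do not specify how packages are tagged or routed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    tag: str  # hypothetical tag assigned by the build system


# Hypothetical mapping from build tag to the node group that
# receives the package without manual intervention.
AUTO_DEPLOY = {"testing": "test-nodes"}


def deployment_target(pkg: Package) -> Optional[str]:
    """Return the node group for automatic deployment, or None if the tag has none."""
    return AUTO_DEPLOY.get(pkg.tag)


if __name__ == "__main__":
    print(deployment_target(Package("myapp", "1.2-3", "testing")))
    print(deployment_target(Package("myapp", "1.2-3", "untagged")))
```

Packages with no recognised tag are not deployed anywhere automatically, which keeps the build system the only route into the computer centre.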
JIRA
git, SVN
Hardware database
Preliminary timelines
Year
2011 2012
What
Actions
Agree overall principles
Prepare formal project plan
Establish IaaS in CERN CC
Production Agile Infrastructure
Monitoring Implementation as per WG
Migrate lxcloud
Early adopters to Agile Infrastructure
2013
Extend IaaS to remote CC
Business Continuity
Support Experiment App re-work
Migrate CVI
General migration to Agile with SLC6 and Windows 8
Phase out Quattor/CDB/
2014
Initial steps
Decide on tools now and integrate them together to make a production setup (Q1)
We can still change, but we're starting to commit
Help with integration / coding
Help with ideas
Help with building the task list
Summary
IT has started a new project to move our infrastructure to a new toolset based around industry standard open source components
Puppet for the core configuration tool
Better integration between components
Use of more modern software processes to aid deployment
Better monitoring
Engage with the community rather than re-implement
Backup slides
It requires a messaging framework that all nodes subscribe to (to receive the notification)
Typically: ActiveMQ or RabbitMQ
Both OpenStack and our (future) monitoring system need a CC-wide messaging system as well
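The subscribe-and-notify pattern can be sketched in-process as below. This is only an illustration of the publish/subscribe idea; a real deployment would use a broker such as ActiveMQ or RabbitMQ, and the `Broker` class and topic name here are hypothetical.

```python
class Broker:
    """Toy in-process message broker: topics map to subscriber callbacks."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        """Deliver `message` to every subscriber; return how many were notified."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


if __name__ == "__main__":
    broker = Broker()
    received = []
    # Each node subscribes to the (hypothetical) notification topic.
    for node in ("node001", "node002"):
        broker.subscribe("config.changed", lambda msg, n=node: received.append((n, msg)))
    print(broker.publish("config.changed", "run now"))
    print(received)
```

The publisher does not need to know which nodes exist; every node that has subscribed to the topic receives the notification.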