Puppet is an extensible automation framework, a tool, and a language. We can do great things with it and we can do them in many different ways. Besides the technicalities of learning the basics of its DSL, one of the biggest challenges for new and not-so-new users of Puppet is how to organize code and put things together in a manageable and appropriate way.
It's hard to find comprehensive documentation on how to use public code (modules), custom manifests and custom data, where to place our logic, how to maintain and scale it, and, generally, how to manage, safely and effectively, the resources that we want in our nodes and the data that defines them.
There's not really a single answer that fits all cases; there are best practices and recommendations, but ultimately it all depends on our own needs and infrastructure, which vary according to multiple factors. The principal ones are the characteristics of the infrastructure itself, such as its size and the number of application stacks to manage. Other factors relate more to Puppet code management, such as the skills of the people working with it, the number of teams involved, the integration with other tools, and the presence of policies for changes in production.
In this chapter, we will outline the elements needed to design a Puppet architecture, reviewing the following in particular:
With Puppet, we manage our systems via the catalog that the Puppet Master compiles for each node, which is the total of the resources we have declared in our code, based on parameters and variables whose values reflect our logic and needs.
Most of the time, we also provide configuration files either as static files or via templates, populated according to the variables we have set.
We can identify the following major tasks when we have to manage what we want to configure on our nodes:
These tasks can be provided by different, partly interchangeable, components:
site.pp, the first file parsed by the Puppet Master, and possibly all the files imported from there (import nodes/*.pp would import and parse all the code defined in the files with the .pp suffix in the /etc/puppet/manifests/nodes/ directory). Here we have code in Puppet language.
An ENC (External Node Classifier), enabled with settings such as the following in puppet.conf:
    [master]
    node_terminus = exec
    external_nodes = /etc/puppet/node.rb
What's referred to by the external_nodes parameter can be any script that uses any backend; it's invoked with the client's certname as its first argument (for example, /etc/puppet/node.rb web01.example.com) and should return YAML-formatted output that defines the classes to include for that node, the parameters, and the Puppet environment to use.
Besides the well-known Puppet-specific ENCs, such as the Foreman and Puppet Dashboard (a former Puppet Labs project now maintained by community members), it's not uncommon to write custom ones that leverage existing tools and infrastructure management solutions.
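LDAP, which can store nodes' information as an alternative to an ENC. To enable it, we add settings such as the following to puppet.conf:
    [master]
    node_terminus = ldap
    ldapserver = ldap.example.com
    ldapbase = ou=Hosts,dc=example,dc=com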
Then we have to add Puppet's schema to our LDAP server. For more information and details, check: http://docs.puppetlabs.com/guides/ldap_nodes.html
Site modules, which may be called site or have the name of our company, customer, project, or anything in general. Site modules make particular sense as companions to public modules used without local modifications; in site modules, we can place local settings, files, custom logic, and resources.
The distinction between public reusable modules and site modules is purely formal; they are both Puppet modules with a standard structure. It may still make sense to keep the modules we develop internally or modify from a public source in directories (module paths) separate from the public ones we use unaltered.
Let's see how these components may fit our Puppet tasks.
Defining the classes to include in each node is what, in Puppet, we call node classification: the task that the Puppet Master accomplishes when it receives a client's request and determines the classes and parameters to use when compiling the relevant catalog.
Nodes' classification can be done in different ways:
With the node object in site.pp and in other manifests possibly imported from there. In this way we identify each node by certname and declare all the resources and classes we want for it:
    node 'web01.example.com' {
      include ::general
      include ::apache
    }
With an ENC, which returns YAML output such as the following for each node:
    ---
    classes:
      - general
      - apache
    parameters:
      dns_servers:
        - 8.8.8.8
        - 8.8.4.4
      smtp_server: smtp.example.com
    environment: production
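With LDAP, where a node can inherit the classes (referenced with the puppetClass attribute) set in a parent node (parentNode).
With Hiera, using the hiera_include function; just add the following in site.pp:
    hiera_include('classes')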
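As a minimal sketch of the Hiera approach, the manifest below assumes that a key named classes is defined as an array at one or more levels of our hierarchy. The data file names and the hierarchy levels shown in the comments are hypothetical and depend on how hiera.yaml is configured.

```puppet
# site.pp
#
# Hypothetical Hiera data, shown here as comments:
#
#   common.yaml                (applies to every node)
#     classes:
#       - general
#
#   web01.example.com.yaml     (a node-specific level, if our hierarchy has one)
#     classes:
#       - apache
#
# hiera_include does an array merge lookup of the 'classes' key across the
# hierarchy and includes every class it finds, so with the data above
# web01.example.com would get both general and apache.
hiera_include('classes')
```

With this approach the manifest stays the same for every node, and classification is changed only by editing data.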
Defining the parameters for each node is another crucial part, as with parameters we can characterize our nodes and define the resources we want for them.
Generally, to identify and characterize a node in order to differentiate it from the others and provide the specific resources we want for it, we need very few key parameters, such as (names used here may be common, but are arbitrary and are not Puppet's internal ones):
role: webserver, app_be, db, or anything that identifies the function of the node. Note that web servers that serve different web applications should have different roles (for example, webserver_site and webserver_blog). We can have one or more nodes with the same role.
env: the operational environment of the node (is it a development, test, qa, or production server?).
With parameters like these, any node can be fully identified and served with any specific configuration. It makes sense to provide them, where possible, as facts.
The parameters and the variables we use in our manifests may have different natures, such as:
Many times, the values of these variables and parameters have to change according to the values of other variables, and it's important to have a general idea, from the beginning, of what the variations involved and the possible exceptions are, as we will probably define our logic according to them. As a general rule we will most of the time use the identifying parameters (role/env/zone...) to define most of the other parameters, so we'll probably need to use them in our Hiera hierarchy or in Puppet selectors. This also means that we will probably need to set them as top scope variables (for example via an ENC) or facts.
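To illustrate how identifying parameters can drive other values, here is a hedged sketch that assumes $::role and $::env are available as facts or top scope variables. The class name site::settings, the hostnames, and the port lists are hypothetical.

```puppet
# Assumes $::role and $::env are set as facts or top scope variables.
class site::settings {

  # A selector picks a value according to the node's environment
  $syslog_server = $::env ? {
    'production' => 'syslog.example.com',
    'qa'         => 'syslog.qa.example.com',
    default      => 'syslog.dev.example.com',
  }

  # A case statement handles the variations by role
  case $::role {
    'webserver': { $open_ports = ['80', '443'] }
    'db':        { $open_ports = ['3306'] }
    default:     { $open_ports = [] }
  }
}
```

The same logic could be expressed in data instead, with hierarchy levels based on the role and env values.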
As with classes to include, parameters may be set by various components; some of them are actually the same, since in Puppet, node classification involves both classes to include and parameters to apply:
In site.pp, we can set variables. If they are outside nodes' definitions they are at top scope, and if they are inside they are at node scope. Top scope variables should be referenced with a :: prefix, for example $::role. Node scope variables are available inside the node's classes with their plain name: $role.
With LDAP, variables are defined with the puppetVar attribute. They are all set at top scope.
On Hiera we set keys which we can map to Puppet variables with the functions hiera(), hiera_array(), and hiera_hash() inside our Puppet code. Since version 3, Puppet's data bindings automatically look up class parameters in Hiera data, mapping each parameter to a key named class::parameter, so for these cases we don't have to use the hiera* functions explicitly. The defined hierarchy determines how the keys' values change according to the values of other variables. On Hiera, ideally, we should place variables related to our infrastructure and credentials, but not OS-related variables (they should stay in modules if we want them to be reusable).
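The difference between data bindings and explicit lookups can be sketched as follows. The class site::resolver, the template path, and the Hiera keys smtp_server and admins are hypothetical names used only for illustration.

```puppet
# Hypothetical class with one parameter. With Puppet 3 data bindings, a Hiera
# key named site::resolver::dns_servers, if present, overrides the default
# without any explicit function call.
class site::resolver (
  $dns_servers = ['8.8.8.8', '8.8.4.4']
) {
  file { '/etc/resolv.conf':
    ensure  => file,
    # Hypothetical template that iterates over @dns_servers
    content => template('site/resolv.conf.erb'),
  }
}

# Explicit lookups (for example in site.pp), needed for values that are not
# class parameters. The second argument is the default used when the key is
# not found in any level of the hierarchy.
$smtp_server = hiera('smtp_server', 'smtp.example.com')
$admins      = hiera_array('admins', [])
```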
A lot of documentation about Hiera shows sample hierarchies with facts like osfamily and operatingsystem. In my very personal opinion, such variables should not stay there (weighing down the hierarchy), since OS differences should be managed in the classes and modules used and not in Hiera. Specific parameters for a deployment should be in data; common things that may vary between operating systems should be in the module implementation.
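One common way to keep OS differences in the module is a params class, sketched below for a hypothetical ntp module. This is an illustration under assumed class and parameter names, not the layout of any specific public module.

```puppet
# OS differences are resolved in code, inside the module.
class ntp::params {
  case $::osfamily {
    'RedHat': {
      $package = 'ntp'
      $service = 'ntpd'
    }
    'Debian': {
      $package = 'ntp'
      $service = 'ntp'
    }
    default: {
      fail("Unsupported osfamily: ${::osfamily}")
    }
  }
}

# Deployment-specific values are class parameters, which can be set in data
# (for example with a Hiera key named ntp::servers).
class ntp (
  $servers = ['0.pool.ntp.org']
) inherits ntp::params {

  package { $ntp::params::package:
    ensure => installed,
  }

  service { $ntp::params::service:
    ensure  => running,
    enable  => true,
    require => Package[$ntp::params::package],
  }

  # $servers would typically be used in a template for the ntp.conf file.
}
```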
It's almost certain that we will need to manage configuration files with Puppet and that we need to store them somewhere, either as plain static files to serve via Puppet's fileserver functionality using the source argument of the File type, or via .erb templates.
While it's possible to configure custom fileserver shares for static files and absolute paths for templates, it's definitely recommended to rely on the modules' auto-loading conventions and place such files inside custom or shared modules, unless we decide to use Hiera for them.
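A minimal sketch of the module conventions follows, assuming a hypothetical site module with the layout described in the comments. The class name and the file names are illustrative.

```puppet
# Hypothetical site module laid out as:
#   modules/site/manifests/motd.pp      (this class)
#   modules/site/files/motd             (static file)
#   modules/site/templates/issue.erb    (ERB template)
class site::motd {

  # Static file, served from the module's files directory
  file { '/etc/motd':
    ensure => file,
    source => 'puppet:///modules/site/motd',
  }

  # Template, looked up in the module's templates directory
  file { '/etc/issue':
    ensure  => file,
    content => template('site/issue.erb'),
  }
}
```

Note that neither the files nor the templates directory appears in the references: the module name alone is enough for Puppet to locate them.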
Configuration files, therefore, are typically placed in the following:
Hiera: with the hiera-file backend, this can be an interesting alternative place to store configuration files, both static ones and templates. We can benefit from the hierarchy logic that works for us and can manage any kind of file without touching modules.
Custom fileserver shares: for example, if /etc/puppet/fileserver.conf is like the following:
    [data]
    path /etc/puppet/static_files
    allow *
we can serve a file such as /etc/puppet/static_files/generated/file.txt with the argument:
    source => 'puppet:///data/generated/file.txt',
We'll probably need to provide custom resources to our nodes that are not declared in the shared modules because they are too specific, and we'll probably want to create some grouping classes, for example, to manage the common baseline of resources and classes we want applied to all our nodes.
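A grouping class for a common baseline might look like the following sketch. The class name site::general, the included modules, and the managed file are hypothetical; the included classes are assumed to be available in the module path.

```puppet
# Baseline applied to all nodes: it groups shared modules and adds a
# resource that is too specific to belong in a public module.
class site::general {

  # Classes from shared modules (assumed to exist in the module path)
  include ::ntp
  include ::openssh

  # Site-specific resource
  file { '/etc/profile.d/site.sh':
    ensure => file,
    mode   => '0644',
    source => 'puppet:///modules/site/profile.sh',
  }
}
```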
This is typically a bunch of custom code and logic that we have to place somewhere. The usual locations are as follows:
Hiera, via the create_resources function fed by hashes provided in Hiera. In this case, we still have to place the create_resources statements somewhere: in a site module, a shared module, or maybe even in site.pp.
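As a hedged example of this last approach, the class below creates user resources from a hash stored in Hiera. The key name site_users, the class name, and the sample data in the comments are assumptions made for illustration.

```puppet
# Hypothetical Hiera data, shown here as comments:
#
#   site_users:
#     alice:
#       uid: '1001'
#     bob:
#       uid: '1002'
#       shell: /bin/zsh
class site::users {

  # Merge the hash across the hierarchy; default to an empty hash
  $users = hiera_hash('site_users', {})

  # Attributes applied to every user unless overridden in the data
  $defaults = {
    'ensure'     => 'present',
    'shell'      => '/bin/bash',
    'managehome' => true,
  }

  # Declares one user resource for each key of the hash
  create_resources('user', $users, $defaults)
}
```

With this pattern, adding or changing a user requires only a change in data, while the code stays untouched.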