There is one thing I particularly like about Puppet: its usage patterns can grow with the user's involvement. We can start using it to explore and modify our system with puppet resource, we can use it with local manifests to configure our machine with puppet apply, and then we can have a central server where a puppet master service provides configurations for all our nodes, on which we run the puppet agent command.
Eventually, the number of our nodes may grow, and we may find ourselves with an overwhelmed Puppet Master that needs to scale accordingly.
In this chapter, we review how to make our Master grow with our infrastructure and how to measure and optimize Puppet performance.
Generally, we don't have to care about the Puppet Master's performance when we have few nodes to manage.
Few is definitely a relative word; I would say any number lower than one hundred nodes, though this varies according to various factors, such as the following:
- The CPU used by the puppet master process when it compiles the catalogs for its clients and when it makes MD5 checksums of the files served via the fileserver
- Memory, which can be a limit too, while disk I/O should generally not be a bottleneck
- Whether storeconfigs is enabled
The simplest way we can use Puppet is via the apply command. It is simple but powerful, because with a single puppet apply, we can do exactly what a catalog retrieved from the Puppet Master would do on the local node.
The manifest file we may apply can be similar to our site.pp on the Puppet Master; we just have to specify the modulepath and, if needed, the hiera_config parameters to be able to reproduce the same result we would have with a client-server setup:
puppet apply --modulepath=/etc/puppetlabs/code/modules:/etc/puppetlabs/code/site --hiera_config=/etc/puppetlabs/code/hiera.yaml /etc/puppetlabs/code/manifests/site.pp
We can mimic an ENC by placing, in our manifest file, all the top scope variables and classes that it would provide.
This is the simplest and most direct way to use Puppet and, curiously, it is also a popular choice in some large installations; later, we will see how a Masterless approach, based on puppet apply, can be an alternative for scaling.
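As a minimal sketch of this pattern (the variable values and the profile class name are hypothetical, not from the original text), a local manifest can declare what an ENC would normally return:

```puppet
# site.pp (sketch): declare what an ENC would otherwise provide.
# The $role value and the profile class are illustrative assumptions.
$role        = 'webserver'
$environment = 'production'

node default {
  # Classify the node locally, as an ENC would do centrally
  include "profile::${role}"
}
```

With this file distributed to each node, puppet apply reproduces the classification step without any central service.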
A basic Puppet Master installation is rather straightforward: we just have to install the server package and we have what is needed to start working with Puppet:
- The puppet master service, which can start without any further configuration
- The default main manifest, /etc/puppetlabs/code/manifests/site.pp
- The default module paths, /etc/puppet/modules and /opt/puppetlabs/puppet/modules
Now, we just have to run Puppet on the clients with puppet agent -t --server <puppetmaster fqdn> and sign their certificates on the Master (puppet cert sign <client certname>) to have a working client/server environment.
We can work with such a setup if we have no more than a few dozen nodes to manage.
We have already seen the elements that affect the Puppet Master's resources, but there is another key factor that should interest us: what are our acceptable catalog compilation and application times?
Compilation occurs on the Puppet Master and can last from a few seconds to minutes; it is heavily affected by the number of resources and relationships to manage, but also, obviously, by the load on the Puppet Master, which is directly related to how frequently it has to compile catalogs.
If our compilation times are too long for us, we have to verify the following conditions:
Passenger, also known as mod_rails or mod_passenger, is a fast application server that can work as a module with Apache or Nginx to serve Ruby, Python, Node.js, or Meteor web applications. Before version 4 (and some of the latest 3.x versions), Puppet was a pure Ruby application that used HTTPS for client/server communication, and it could gain great benefits by using Passenger, instead of the default embedded WEBrick, as its web server.
The first element to consider when using older Puppet versions and needing to scale the Puppet Master is definitely the introduction of Passenger. It brings a pair of major benefits, which are as follows:
On modern systems, where multiprocessors are the rule and not the exception, this leads to huge benefits.
Let's quickly see how to install and configure Passenger, using plain Puppet resources.
For the sake of brevity, we simulate an installation on a RedHat 6 derivative here. For other distributions, there are different methods to set up the source repo for the packages, and possibly different names and paths for the resources.
The following Puppet code can be placed in a file such as setup.pp and run with puppet apply setup.pp.
First of all, we need to set up the EPEL repo, which contains extra packages for RedHat Linux that we need:
yumrepo { 'epel':
  mirrorlist => 'http://mirrors.fedoraproject.org/mirrorlist?repo=epel-6&arch=$basearch',
  gpgcheck   => 1,
  enabled    => 1,
  gpgkey     => 'https://fedoraproject.org/static/0608B895.txt',
}
Then, we set up Passenger's upstream yum repo from Stealthy Monkeys:
yumrepo { 'passenger':
  baseurl    => 'http://passenger.stealthymonkeys.com/rhel/$releasever/$basearch',
  mirrorlist => 'http://passenger.stealthymonkeys.com/rhel/mirrors',
  enabled    => 1,
  gpgkey     => 'http://passenger.stealthymonkeys.com/RPM-GPG-KEY-stealthymonkeys.asc',
}
We will then install all the required packages with the following code:
package { ['mod_passenger', 'httpd', 'mod_ssl', 'rubygems']:
  ensure => present,
}
Since there is no native RPM package, we install rack, a needed dependency, as a Ruby gem:
package { 'rack':
  ensure   => present,
  provider => gem,
}
We also need to configure an Apache virtual host file:
file { '/etc/httpd/conf.d/passenger.conf':
  ensure  => present,
  content => template('puppet/apache/passenger.conf.erb'),
}
In our template ($modulepath/puppet/templates/apache/passenger.conf.erb would be its path for the previous sample), we need different things configured. The basic Passenger settings, which can optionally be placed in a dedicated file, are as follows:
PassengerHighPerformance on
# Lower the pool size if you have memory issues
PassengerMaxPoolSize 12
PassengerPoolIdleTime 1500
PassengerStatThrottleRate 120
RackAutoDetect On
RailsAutoDetect Off
Then, we configure Apache to listen on the Puppet Master's port 8140 and create a Virtual Host on it:
Listen 8140
<VirtualHost *:8140>
On the Virtual Host, we terminate the SSL connection. Apache must behave as a Puppet Master when clients connect to it, so we have to configure the paths of the Puppet Master's SSL certificates as follows:
SSLEngine on
SSLProtocol all -SSLv2 -SSLv3
SSLCipherSuite ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS
SSLCertificateFile /var/lib/puppet/ssl/certs/<%= @fqdn %>.pem
SSLCertificateKeyFile /var/lib/puppet/ssl/private_keys/<%= @fqdn %>.pem
SSLCertificateChainFile /var/lib/puppet/ssl/certs/ca.pem
SSLCACertificateFile /var/lib/puppet/ssl/certs/ca.pem
SSLCARevocationFile /var/lib/puppet/ssl/certs/ca_crl.pem
SSLVerifyClient optional
SSLVerifyDepth 1
SSLOptions +StdEnvVars
A good reference for recommended values for SSLCipherSuite
can be found at https://mozilla.github.io/server-side-tls/ssl-config-generator/.
We also need to add some extra HTTP headers to the connection that is made to the Puppet Master in order to let it identify the original client (details on this later):
RequestHeader set X-SSL-Subject %{SSL_CLIENT_S_DN}e
RequestHeader set X-Client-DN %{SSL_CLIENT_S_DN}e
RequestHeader set X-Client-Verify %{SSL_CLIENT_VERIFY}e
Then, we enable Passenger and define a document root where we will create the rack environment to run Puppet:
PassengerEnabled On
DocumentRoot /etc/puppet/rack/public/
RackBaseURI /
<Directory /etc/puppet/rack/public/>
  Options None
  AllowOverride None
  Order allow,deny
  Allow from all
</Directory>
Finally, we add normal logging directives as follows:
ErrorLog /var/log/httpd/passenger-error.log
CustomLog /var/log/httpd/passenger-access.log combined
</VirtualHost>
Then, we need to create the rack environment working directories and configuration as follows:
file { ['/etc/puppet/rack', '/etc/puppet/rack/public', '/etc/puppet/rack/tmp']:
  ensure => directory,
  owner  => 'puppet',
  group  => 'puppet',
}
file { '/etc/puppet/rack/config.ru':
  ensure  => present,
  content => template('puppet/apache/config.ru.erb'),
  owner   => 'puppet',
  group   => 'puppet',
}
In our config.ru, we need to instruct rack on how to run Puppet as follows:
# if puppet is not in your RUBYLIB:
# $LOAD_PATH.unshift('/opt/puppet/lib')
$0 = "master"
# ARGV << "--debug" # Uncomment to debug
ARGV << "--rack"
ARGV << "--confdir" << "/etc/puppet"
ARGV << "--vardir" << "/var/lib/puppet"
require 'puppet/util/command_line'
run Puppet::Util::CommandLine.new.execute
Once things are configured, we can start Apache. Before doing this, however, we need to disable the standalone Puppet Master service, as it listens on the same port 8140 and would conflict with our Apache service:
service { 'puppetmaster':
  ensure => stopped,
  enable => false,
}
Then, we can finally start Apache with Passenger. Remember that whenever we make changes to Puppet's configuration, the service to restart to apply them is Apache; the standalone Puppet Master process should remain stopped:
service { 'httpd':
  ensure  => running,
  enable  => true,
  # We start Apache after having managed the puppetmaster service shutdown
  require => Service['puppetmaster'],
}
All this code, with the ERB templates it uses, should be placed in a module that allows autoloading of classes and files.
One of the major changes in Puppet version 4 is that the Puppet Server is executed on a Java Virtual Machine. The Ruby implementation was fine while the project benefited from an agile development environment in its initial versions, but as the software consolidated and the Puppet language became more mature and stable, better performance and improvements in scalability and speed were required.
Reimplementing a whole application just to change the language doesn't seem to be a good idea; it is a huge effort that could block the evolution of the project. That's why, wisely, the Puppet Labs team decided to do it in a way that allowed them to reimplement just some of the parts at a time. The JVM can execute Ruby code with JRuby, so their first step was to make sure that Puppet worked with this interpreter while they implemented a services framework for the JVM that could serve as the glue between the different parts implemented in different languages. This framework is known as Trapperkeeper and is implemented in Clojure.
This Trapperkeeper-based implementation currently offers two basic points of performance tuning: controlling the maximum number of JRuby instances running at a time, and controlling the memory usage of the whole application.
Puppet Server maintains a pool of JRuby instances. When it needs to execute some Ruby code, it picks one of these instances, uses it until the work is finished, and then releases it. If a request needs an instance and none is available, the request blocks until one is released. So, with few instances, you can suffer a lot of contention on the requests to your Puppet Server, but with too many of them, the server can get overloaded. It's important to choose a good number of instances for your deployment.
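This borrow/release behavior can be illustrated with a toy model (this is not Puppet Server's actual code, only a sketch of the bounded-pool pattern described above):

```python
import queue

class InstancePool:
    """Toy model of a bounded instance pool: borrowing blocks when all
    instances are busy, which is the contention described above."""

    def __init__(self, max_active_instances):
        self._pool = queue.Queue()
        for i in range(max_active_instances):
            self._pool.put(f"jruby-{i}")

    def borrow(self, timeout=None):
        # Blocks until an instance is free (raises queue.Empty on timeout)
        return self._pool.get(timeout=timeout)

    def release(self, instance):
        self._pool.put(instance)

pool = InstancePool(2)
a = pool.borrow()
b = pool.borrow()
# A third borrow would now block until release() is called:
pool.release(a)
c = pool.borrow()  # succeeds immediately after the release
```

Raising the pool size reduces this blocking at the cost of more memory and CPU, which is exactly the trade-off behind the tuning settings below.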
The maximum number of instances can be controlled with the max-active-instances variable in the puppetserver.conf file. The default value of this setting (leaving it commented out in the configuration) makes the Puppet Server select a safe value based on the number of CPUs; but, depending on your deployment, you may see that the CPUs of your servers are underused, or that the server is overloaded because it has more responsibilities. In that case, you can evaluate some other values to see which one makes better use of your resources.
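As a sketch, the setting lives in the jruby-puppet section of puppetserver.conf (the value 4 here is purely illustrative, not a recommendation):

```hocon
# /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf (fragment)
jruby-puppet: {
    # Commented out, Puppet Server picks a default based on the CPU count;
    # tune it only after measuring CPU usage and request contention.
    max-active-instances: 4
}
```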
You also have to take into account the memory usage of the application. The more JRuby instances it has and the bigger your catalogs are, the more memory it will need. A recommendation is to assign 512 MB as a base, plus an additional 512 MB for each JRuby instance. If your Puppet catalogs are very big, or if your servers have spare memory to dedicate to the Puppet Server, you may consider increasing the available memory. This setting has to be configured at JVM start-up, with the parameters -Xms and -Xmx, which respectively control the minimum and the maximum heap size. In a JVM, most of the memory used is in the heap, but the process also needs a little more memory, so leave some margin with respect to the memory available on the server. This value is configured in the defaults file (/etc/sysconfig/puppetserver or /etc/defaults/puppetserver, depending on the distribution). For example, for a Puppet Server with four instances on a machine with 4 GB of memory, applying the recommendation we could set it to 2560 MB, but it would probably be safe to set it to 3 GB; a value that is too tight could trigger the garbage collector too often, which penalizes CPU performance. This would be the setting in the defaults file:
JAVA_ARGS="-Xms3072m -Xmx3072m -XX:MaxPermSize=256m"
You can see that MaxPermSize is also set; this limits the size of the permanent space, which is where the JVM stores classes, methods, and so on. Of course, any other settings available for the JVM could be used here.
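The sizing rule mentioned above can be expressed as a quick calculation (this is the chapter's heuristic, not an official formula):

```python
def recommended_heap_mb(jruby_instances, base_mb=512, per_instance_mb=512):
    """Heuristic from the text: a 512 MB base plus 512 MB per JRuby instance."""
    return base_mb + jruby_instances * per_instance_mb

# Four instances -> 2560 MB, the value used in the example above;
# rounding up to 3 GB leaves headroom for the garbage collector.
print(recommended_heap_mb(4))  # 2560
```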
A Puppet Server running on a decently sized machine (ideally at least 4 CPUs and 4 GB of memory) should be able to cope with hundreds of nodes.
When this number starts to enter the range of thousands, or the compiled catalogs start to become big and complex, a single server will begin to have problems handling all the traffic. Then, we need to scale horizontally, adding more Puppet Masters to manage the clients' requests in a balanced way.
There are some issues to manage in such a scenario; the most important ones are as follows:
Puppet's certificates are issued by a Certificate Authority, which is automatically created on the server when we start it for the first time. We usually don't care much about it; we just sign certificate requests with puppet cert and have everything we need to work with clients.
On a multi-Master setup, an accurate management of the Puppet Certification Authority and of the Puppet Masters' certificates becomes essential.
The main element to consider is that the first time puppet master is executed, it automatically creates two different certificates, which are as follows:
- The CA certificate itself, with its files placed under /etc/puppetlabs/puppet/ssl/ca: ca_pub.pem, ca_key.pem, and ca_crt.pem
- The Puppet Master's own certificate, with its keys in /etc/puppetlabs/puppet/ssl/public_keys/<fqdn> and /etc/puppetlabs/puppet/ssl/private_keys/<fqdn>; the same paths are used on clients for their own certificates
On the Puppet Master, all the clients' public keys that still need to be signed by the CA are placed in /etc/puppetlabs/puppet/ssl/ca/requests, and the ones that have been signed are in /etc/puppetlabs/puppet/ssl/ca/signed.
The CA, which is managed via the puppet ca command, performs the following functions:
- It signs clients' x509v3 certificates (when we issue puppet cert sign <certname>)
- It revokes certificates (with puppet cert revoke <certname>)
There are a pair of important certificate-related parameters that should be considered in puppet.conf before launching the Puppet Master for the first time:
- dns_alt_names: This allows us to define a comma-separated list of names by which a node can be referred to when using its certificate. By default, Puppet creates a certificate that automatically adds the names puppet and puppet.$domain to the host's fqdn. We should be sure to have in this list both the local server hostname and the name the clients use to refer to the Puppet Master (probably associated with the IP of a load balancer).
- ca_ttl: This sets the duration, in seconds, of the certificates signed by the CA. The default value is 157680000, which means that 5 years after starting your Puppet Master, its certificate expires and has to be reissued. This is an experience most of us have already faced, and it involves the recreation and signing of all the clients' certificates.
Note that the whole /etc/puppetlabs/puppet/ssl directory and the certificates it contains are recreated from scratch if the directory doesn't exist when Puppet runs. Therefore, if we want to recreate our Puppet Master's certificates with corrected settings, we have to move the existing ssldir to a backup place (just as a precaution in case we change our mind; otherwise we won't need it anymore), configure puppet.conf as needed, and restart the Puppet Master service.
This is an activity that we can do light-heartedly on the Master only when it has just been installed and there are no (or few) signed clients, because when we recreate the ssldir with new certificates on the Master, communication with existing clients won't be possible: all the previously signed certificates are no longer valid and have to be recreated.
CA management in a multi-master setup can be done in the following different ways:
In puppet.conf, configuration is quite straightforward when the CA server is (or might be) different from the Puppet Master:
- On the clients, we set the ca_server hostname (which is, by default, the same Puppet Master):
[agent]
ca_server = puppetca.example42.com
- On the server that acts as the CA:
[master]
certname = puppetca.example42.com
ca = true
- On the other Puppet Masters, which do not act as the CA, we also set the ca_server:
[agent]
ca_server = puppetca.example42.com
[master]
certname = puppet01.example42.com
ca = false
When we deal with Puppet's client-server traffic, we can apply all the logic that is valid for HTTPS connections. We can, therefore, have different scenarios, as follows:
When configuring the involved elements, we have to take care of the following:
In our puppet.conf file, there are always the following default settings, which define the name of the HTTP header that contains the client's SSL Distinguished Name (DN) and the name of the HTTP header that contains the status message of the client verification (the expected value for a trusted, not revoked client certificate is SUCCESS). The header names are expressed with an HTTP_ prefix and with underscores instead of dashes:
ssl_client_header = HTTP_X_CLIENT_DN
ssl_client_verify_header = HTTP_X_CLIENT_VERIFY
On the web server(s) where SSL is terminated (it might be Passenger in a single-server setup, or an Apache instance that balances and reverse proxies the Puppet Master backends), we need to set these HTTP headers, extracting the info from SSL environment variables as follows:
RequestHeader set X-SSL-Subject %{SSL_CLIENT_S_DN}e
RequestHeader set X-Client-DN %{SSL_CLIENT_S_DN}e
RequestHeader set X-Client-Verify %{SSL_CLIENT_VERIFY}e
These servers are the ones that communicate directly with the clients and terminate the SSL connection; we can define them as frontend servers. They act as proxies and generate a new connection to the backend Puppet Masters, which do the real work and compile the catalogs.
Since SSL has been terminated on the frontends, traffic from them to the backend servers is in clear text (they are supposed to be in the same LAN), and on the backend Apache, we need to state where to get the client certificate's DN, using the previous extra headers:
SetEnvIf X-Client-Verify "(.*)" SSL_CLIENT_VERIFY=$1
SetEnvIf X-SSL-Client-DN "(.*)" SSL_CLIENT_S_DN=$1
Also, on a backend server, we do not need to configure all the other SSL settings; we just need a Virtual Host with the rack configurations.
Given this information, we can compose our topology of web servers that handle Puppet traffic in a very flexible way, with one or more frontend servers that proxy requests to the backend Puppet Masters and terminate SSL, and with backend Puppet Masters that run Puppet via Passenger.
Deployment of Puppet code and data is another factor to consider: we probably want the same code deployed on all our Puppet Masters. We can do this in various ways; all of them basically require either the remote and/or triggered execution of some commands (if we want to avoid having to log into each server every time a change to Puppet is made) or a way to keep files synced across the different servers.
How a deployment script or command works is definitely tied to how we manage our code: we might execute r10k or librarian-puppet, or make a git pull in our local directories to fetch changes from a central repo.
Alternatively, we might decide to keep our Puppet code and data on a shared file system, or keep them synced with tools such as rsync.
In any case, we have to copy, sync, or share all the directories where our code and data are placed: the modules, manifests, and, if used, the Hiera directories.
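As a minimal sketch of the rsync approach (the hostnames and code path here are illustrative assumptions), a deploy script could push the code directory to every Master in the pool:

```shell
#!/bin/sh
# Build the rsync invocation for one Puppet Master; pure string-building,
# so the sketch can be inspected without touching the network.
build_sync_cmd() {
  printf 'rsync -az --delete /etc/puppetlabs/code/ %s:/etc/puppetlabs/code/' "$1"
}

# Hypothetical pool of Masters:
for host in puppet01.example42.com puppet02.example42.com; do
  build_sync_cmd "$host"
  echo
done
```

In a real deployment, the generated command would be executed (for example, via eval or a CI job) after every change to the central repo.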
When we have to balance a pool of Puppet Masters, we have different options, which are as follows:
- A load balancer that listens on a single name (for example, puppet.example42.com) and redirects all the TCP connections to the Puppet Masters (in their dns_alt_names, they need to have the name of the Puppet Master host configured on clients).
- DNS round robin, where the name used by the clients resolves to the different Puppet Masters, which again need that name in their dns_alt_names. This solution is quite easy to implement, as it does not require additional systems to manage load balancing, but it has the (major) drawback of not being able to detect failures and remove non-responding Puppet Masters from the pool of balanced servers.
- DNS SRV records. In puppet.conf, we have to indicate the srv_domain this way:
[main]
use_srv_records = true
srv_domain = example.com
DNS SRV records are used to define the hostnames and ports of servers that provide specific services. They can also set priorities and weights for the different servers. For example, for Puppet, the following records could be used:
_x-puppet._tcp.example.com. IN SRV 0 5 8140 p1.example.com.
_x-puppet._tcp.example.com. IN SRV 0 5 8140 p2.example.com.
Clients need to explicitly support these records in order to use this kind of configuration.
An alternative approach to the Puppet Master scaling methods that we have seen so far is not to use a Master at all. Masterless setups involve the direct execution of puppet apply on each node, where all the needed Puppet code and data has to be stored.
In this case, we have to find a way to distribute our modules, manifests and, if used, Hiera data to all the clients. We can still use external components such as the following:
- An external_nodes script, which can work just as it does on the Puppet Master; it can interrogate any external source of knowledge on how to classify nodes. A concern here is whether it makes sense to introduce a central point of authority when we want a distributed, decentralized setup.
We also need a way to run Puppet on the clients in an automated or centrally managed way; it may be via a cron job or a remote command execution.
Distribution of Puppet code and data may be done in different ways, as follows:
- A git pull from central repositories
- Native packages (rpm, deb, and so on) from a custom repo
- Tools such as rsync or rdiff
Whatever the layout of our Puppet infrastructure, we may consider some other options to optimize its performance.
A first quick attempt may be to activate the compression of HTTPS traffic between clients and Master. The following option has to be set in puppet.conf at both ends:
http_compression = true
The case where it makes sense to enable it is mostly when clients reach the server via a WAN link, generally via a VPN, where throughput is definitely not what we have on LAN communications. If we have large catalogs and reports, compressing them during transfer, as they are mostly text, can be quite effective.
Another area where we might operate is catalog caching. This is a delicate topic, as it is not easy to determine what has changed on the client's side (some facts, like uptime, always change by definition; others are supposed to be more stable) and on the server's side (changes to the Puppet code and data may or may not affect a specific node). The challenge, therefore, is to always provide a correct and updated catalog when a caching mechanism is in place.
Puppet provides some configuration options to manage caching. By default, Puppet doesn't recompile the catalog if it has a local cached version with an up-to-date timestamp and facts that have not changed. When we want to be sure to obtain a new catalog, we have to enable the ignorecache option:
ignorecache = true # Default: false
Note that this is automatically done when we run the puppet agent -t command, which ensures that we always have a freshly compiled catalog.
We can also tell the client to always use a local cached copy of the catalog, instead of asking the Puppet Master for it:
use_cached_catalog = true # Default: false
This might be useful in cases where we want to temporarily freeze the configurations applied to a client, without having to disable the Puppet service and without caring about possible changes on the Puppet Master.
If we run Puppet via cron or another time-based mechanism, we need to avoid having all our clients hitting the Master and requesting their catalog at the same time. There are various options to distribute the Puppet runs in order to avoid peaks of too many concurrent requests.
We can introduce a random sleep delay in the command we execute via cron, for example, with cron entries based on ERB templates, such as:
0,30 * * * * root sleep <%= @sleep %> ; puppet agent --onetime
Here, the $sleep variable, holding the number of seconds to wait, may be defined in Puppet manifests with the fqdn_rand() function, which returns a random value based on the node's full hostname (so it's random, though not in a cryptographically usable way, but doesn't change at every catalog compilation):
$sleep = fqdn_rand(1800) # Returns a number from 0 to 1799
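The property that matters here (stable per host, different across hosts) can be emulated in a few lines; this is an illustration of the idea, not Puppet's exact algorithm:

```python
import hashlib

def fqdn_rand_like(max_value, fqdn):
    """Hash the FQDN and reduce it modulo max_value: the result is
    pseudo-random across hosts but identical on every run for the same
    host, just like the delay produced by fqdn_rand()."""
    digest = hashlib.md5(fqdn.encode("utf-8")).hexdigest()
    return int(digest, 16) % max_value

delay = fqdn_rand_like(1800, "web01.example42.com")
print(0 <= delay < 1800)                                      # True
print(delay == fqdn_rand_like(1800, "web01.example42.com"))   # True: stable
```

Because each host hashes to its own delay, a fleet of nodes sharing the same cron entry spreads its requests over the half-hour window instead of hitting the Master all at once.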
Alternatively, we can use the splay configuration option in puppet.conf, which introduces a random (but consistent) delay at every Puppet run, which can be as long as defined by splaylimit (whose sane default is Puppet's run interval):
splay = true # Default: false
splaylimit = 1h # Default: $runinterval
On the Puppet Master, there is an option, filetimeout, which sets the minimum time to wait between checks for updates in configuration files (manifests, templates, and so on). This determines how quickly the Master notices that a file has changed on disk.
The default value is 15 seconds, and it can be changed in puppet.conf.
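For example (a minimal puppet.conf fragment; the value shown is purely illustrative, not a recommendation):

```ini
[master]
# Check for changed manifests/templates at most every 60 seconds
filetimeout = 60
```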
This setting has very limited effects on performance (unless, I suppose, we lower it too much), but it's important to know that it exists, because it's the reason why, sometimes, nothing new happens on the client when we launch a Puppet run immediately after changing some file on the Puppet Master.
This may lead to some confusion: we make a change to some manifest, we run Puppet, and nothing happens; then we run Puppet again and the change is finally received, and we wonder what on earth is happening. Therefore, be aware that this option exists and, more importantly, be aware of this behavior of the Master, which scans the directories where our Puppet code and files are placed at regular intervals and might not immediately process the very latest changes made to those files.
We have seen that the usage of exported resources allows resources declared on one node to be applied on another node. In order to achieve this, Puppet needs the storeconfigs option enabled, which involves the usage of an external database where all the information about the exported resources is stored.
The usage of stored configs has historically been a big performance killer for Puppet: the amount of database transactions involved in each run makes it a quite resource-intensive activity.
There are various options in puppet.conf
that permit us to tune our configurations. The default settings are as follows:
storeconfigs = false
storeconfigs_backend = active_record
dbadapter = sqlite3
thin_storeconfigs = false
If we enable them with storeconfigs = true, the default configuration involves the usage of the active_record backend and a SQLite database.
This is a solution that performs quite badly and should therefore be used only in test or small environments. Its unique benefit is that we don't need any other activity; we just have to install the SQLite Ruby bindings package on our system. With such a setup, we will quickly run into access problems on the SQL backend with multiple concurrent Puppet runs.
The next step is to use a more performant backend for data persistence. Before the introduction of PuppetDB, MySQL was the only alternative. In order to enable it, we have to set the following options in puppet.conf:
dbadapter = mysql
dbname = puppet # Default value
dbserver = localhost # Default value
dbuser = puppet # Default value
dbpassword = puppet # Default value
Such a setup involves a local MySQL server where we have created a puppet database with the relevant grants, so from our MySQL console, we should write something like the following code:
create database puppet;
GRANT ALL ON puppet.* to 'puppet'@'localhost' IDENTIFIED by 'puppet';
flush privileges;
This is enough to have a Puppet Master storing its data on a local MySQL backend. If the load on our system increases, we can move the MySQL service to a dedicated server and tune it.
Brice Figureau, who heavily contributed to the original store configs code, gave an interesting presentation on this topic at the first Puppet Camp (http://www.slideshare.net/masterzen/all-about-storeconfigs-2123814), where useful hints are provided on configuring MySQL on a dedicated server to scale for the inserts:
innodb_buffer_pool_size = 70% of physical RAM
innodb_log_file_size = up to 5% of physical RAM
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 2
Also, to optimize the most common queries, on Puppet's wiki it is suggested that this index be created from the MySQL console as follows:
use puppet;
create index exported_restype_title on resources (exported, restype, title(50));
We can limit the amount of information stored by setting thin_storeconfigs = true. This makes Puppet store just the facts and exported resources in the database, and not the whole catalog and its related data. This option is useful with the active_record backend (with PuppetDB, it is not necessary).
What we have written so far about store configs using the active_record backend made a lot of sense some years ago, and we have referenced it here to give a view of how to scale with store configs. The truth is that the best and recommended way to use store configs is via the PuppetDB backend; this is done by placing these settings in puppet.conf:
:
storeconfigs = true
storeconfigs_backend = puppetdb
We have dedicated the whole of Chapter 3, Introducing PuppetDB, to PuppetDB because it is definitively a major player in the Puppet ecosystem. The performance improvements it brings are huge, so there is really no reason not to use it.
The components of PuppetDB can be distributed to scale better: