When the Puppet agent starts on a node, one of the first things it does is look up the value of the server option. You can specify this either with --server on the command line or with server=[hostname] in the puppet.conf configuration file. By default, Puppet looks for a server named puppet. If it cannot find one named puppet, it then tries puppet.[your domain].
Tip
What Puppet believes to be your domain can be obtained by running facter domain.
When you are debugging initial communication problems, first verify that your nodes can find the Puppet master. On Unix systems, programs look up a machine by name with the gethostbyname library call. This call uses the Name Service Switch (NSS) library to find a host in a number of databases. NSS is configured by the /etc/nsswitch.conf file. The line in this file that controls how hosts are found by name is the hosts line. The default configuration on most systems is the following:
This line means that the system searches for hosts by name in the local files first. Then, if the host is not found, it searches the Internet Domain Name System (DNS). The local file that is consulted first is /etc/hosts. This file contains static host entries. If you inherited your Puppet environment, you should look in this file for statically defined Puppet entries. If the machine puppet or puppet.[domain] is not found in /etc/hosts, the system then queries DNS to find the host. DNS is configured with the /etc/resolv.conf file on Unix systems.
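To see which sources your own system consults, you can print the hosts line directly. The following is a minimal sketch, not part of Puppet: hosts_sources is a hypothetical helper name, and the optional file argument exists only so that the function can be tried against a copy of the file.

```shell
# Print the lookup sources from the hosts line of an nsswitch-style file.
# The file argument is optional and defaults to /etc/nsswitch.conf.
# hosts_sources is a hypothetical helper name used for illustration.
hosts_sources() {
  awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }' \
    "${1:-/etc/nsswitch.conf}"
}
```

Running hosts_sources with no argument prints the sources in the order in which they are consulted. On glibc-based systems, getent hosts puppet performs the lookup through NSS, so it should reflect what the agent itself will see.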
Tip
When troubleshooting, be aware that the domain fact is calculated using a combination of calls to the hostname utility and a search for a domain line in /etc/resolv.conf.
The /etc/resolv.conf file is known as the resolver configuration file. It is important to verify that you can reach the servers listed in the nameserver lines of this file. Your file may also contain a search line, which lists the domains that are appended to your search queries. Consider a situation where the search line is as follows:
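When you look up puppet, the resolver appends each search domain in turn, trying puppet.example.com, then puppet.external.example.com, and then puppet.internal.example.com. With the default resolver options, a name that contains no dots, such as puppet, is tried as-is only after the search list has been exhausted.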
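The names built from the search list can be listed with a small helper. This is a minimal sketch under stated assumptions: search_candidates is a hypothetical name, the optional second argument is only there so that you can point it at a copy of the file, and the function does not query DNS at all. It only prints the short name with each search domain appended.

```shell
# Print the fully qualified names that the search list produces for a
# short host name. Usage: search_candidates NAME [RESOLV_CONF]
# search_candidates is a hypothetical helper name used for illustration.
search_candidates() {
  awk -v name="$1" '
    $1 == "search" { line = $0 }   # if several search lines exist, the last one is used
    END {
      n = split(line, dom, " ")
      for (i = 2; i <= n; i++) print name "." dom[i]
    }
  ' "${2:-/etc/resolv.conf}"
}
```

Each name printed by search_candidates puppet can then be passed to host or dig to see which of them actually resolves.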
Several utilities exist for testing DNS. Among them, host and dig are the most common. An older utility, nslookup, may also be used. To look up the IP address of the default Puppet Server, use the following:
In this example, the host puppet is not found, yet I know that this node works as expected. Remember that the system uses the gethostbyname call when looking up the Puppet Server. Another utility on the system that uses this call is ping. When we ping the Puppet Server, the lookup succeeds, and the output is as follows:
As you can see, the loopback address (127.0.0.1) is being used for the Puppet Server. We can verify that this information comes from the /etc/hosts file using grep:
The host and dig utilities query DNS directly, whereas the gethostbyname call goes through NSS and therefore also consults /etc/hosts. Remembering this difference can quickly help you find problems with your configuration. Adding an entry to /etc/hosts for your Puppet Server also bypasses any DNS problems that you may have during the initial configuration of your nodes.
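A plain grep for puppet also matches names such as puppetdb as well as commented-out lines. The following sketch matches whole host names only and ignores comments. The helper name hosts_entry is hypothetical, and the optional file argument defaults to /etc/hosts.

```shell
# Print the entries of a hosts-style file that define HOST as a name or alias.
# Usage: hosts_entry HOST [HOSTS_FILE]
# hosts_entry is a hypothetical helper name used for illustration.
hosts_entry() {
  awk -v host="$1" '
    { sub(/#.*/, "") }                       # drop comments before matching
    { for (i = 2; i <= NF; i++)              # field 1 is the address, the rest are names
        if ($i == host) { print; next } }
  ' "${2:-/etc/hosts}"
}
```

If hosts_entry puppet prints nothing while ping puppet still resolves the name, the answer is coming from another NSS source, most likely DNS.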
The next step in diagnosing network issues is verifying that you can reach the Puppet Server on the masterport, which is TCP port 8140 by default. The masterport number may be changed, though, so you should first confirm the port number using puppet config print masterport. One of the simplest ways to verify that you can reach the Puppet Server on port 8140 is to use Netcat, which is known as the Swiss Army knife of network tools. You can do many interesting things with Netcat. More information about Netcat is available at http://nmap.org/ncat/.
Tip
There are several versions of Netcat available. The version installed on most recent distributions is Ncat, a rewrite by the Nmap project (for more information, visit https://nmap.org).
To verify that you can reach port 8140 on your Puppet Server, issue the following command:
If your Puppet Server is inaccessible, you will see an error message that looks like this:
If you see a Connection refused error as in the preceding output, this may indicate that a host-based firewall on the Puppet Server is refusing the connection, or that nothing is listening on that port. A refused connection means that you were able to contact the server, but the server did not accept the connection on the specified port. The first step in troubleshooting this type of problem is to verify that the Puppet Server is listening for connections on the port. The lsof utility can do this for you, as shown in the following code:
My Puppet Server shows up as a java process because puppetserver runs inside a JVM, so java is the process name in the lsof output. If you do not see any output here, then your Puppet Server is not listening on port 8140.
If you do see a line containing the text LISTEN, then your Puppet Server is listening and a firewall is likely blocking the communication. Host-based firewalls on Linux are configured with firewalld or iptables, depending on your distribution. More information on these two systems can be found at http://en.wikipedia.org/wiki/Iptables and https://fedoraproject.org/wiki/FirewallD.
Tip
Ubuntu distributions also include the Uncomplicated Firewall (ufw) utility to configure iptables. BSD-based systems use Packet Filter (pf) or IPFilter. Knowing how to configure your host-based firewall is a key troubleshooting skill.
If you are familiar with firewall configuration, you can add port 8140 to the allow list and solve the problem. If you are new to firewall configuration, you may choose to temporarily disable the firewall to aid your troubleshooting. Even where a perimeter firewall is in place, host-based firewalls should be used wherever possible to avoid unintentionally exposing ports on your servers, so turn the host-based firewall back on as soon as you have fixed the problem. On an Enterprise Linux-based distribution, the following will disable the host-based firewall:
If disabling your host-based firewall does not solve your communication issue and you have verified that the service is listening on the correct port, then you will have to resort to advanced network troubleshooting tools.
Tools that may help in this case are mtr and traceroute. It is important to note that, even if a ping test fails, you may still be able to reach your Puppet Server on the masterport. The ping utility uses ICMP packets, which may be blocked or restricted on your network. If the Netcat test still fails after addressing the firewall concerns, try the mtr utility to find the point at which your traffic stops reaching the server. For example, to test connectivity with the puppet server, issue the following command:
As an example, from my laptop, the following is the mtr output when attempting to reach https://puppetlabs.com/:
If you were unable to reach the Puppet Server, the last line in the host list would be ???. The line immediately preceding the ??? line would be the point at which communication between the node and master was broken.
After you have verified that the network communication between the node and master is working as expected, the next issue that you should resolve is certificates.
Puppet uses X509 certificates to secure the communication between nodes and the master. As a Puppet administrator, you should know how SSL certificates and a certificate authority (CA) work.
Your infrastructure may have a separate server that acts as a CA for your Puppet installation. The CA certificate is used to sign all the certificates that are generated by your master(s). If your CA is a separate server, the ca_server option will be specified in the puppet.conf file.
Although the server may be specified on the command line when running puppet agent, the ca_server option cannot.
By default, the CA certificate is generated on the first run of either the Puppet master or puppetserver. The certificate is stored in /var/lib/puppet/ssl/ca/ca_crt.pem for Open Source Puppet (OSS) or /etc/puppetlabs/puppet/ssl/ca/ca_crt.pem for Puppet Enterprise (PE). To view the information in the certificate, use OpenSSL's x509 utility, as follows:
If you are new to the openssl command-line utility, try running openssl help (help is not actually an option, but it will cause the openssl command to print helpful information). Each of the subcommands of the openssl utility has its own Unix manual page. The manual page for the x509 subcommand can be found using man x509.
The preceding information shows that the CA certificate was automatically generated and has a five-year expiry. Five years has been the default expiry for some time now, and many Puppet installations are nearly five years old and will require new CA certificates. If everything suddenly stops working, verify the expiry date of your CA. In addition to the expiry time, we can see the subject of the certificate, puppet.example.com. This is the name that Puppet gave to the CA, based on the hostname and domain facts, when the master/Puppet Server was first started.
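Because an expired CA certificate can stop every agent at once, it can be worth checking the expiry from a script. The following is a minimal sketch that relies on the -checkend option of openssl x509. The function name cert_expires_within and the choice of return codes are assumptions made for illustration only.

```shell
# Succeed (return 0) when the certificate in FILE expires within DAYS days.
# Returns 1 when it stays valid beyond that window, 2 when FILE is unreadable.
# Usage: cert_expires_within FILE DAYS
# cert_expires_within is a hypothetical helper name used for illustration.
cert_expires_within() {
  cert=$1
  days=$2
  [ -r "$cert" ] || return 2
  # openssl exits 0 when the certificate is still valid after the given
  # number of seconds, and non-zero when it will have expired by then.
  if openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null; then
    return 1
  fi
  return 0
}
```

For example, cert_expires_within /var/lib/puppet/ssl/ca/ca_crt.pem 90 && echo "CA expires soon" would warn about an Open Source Puppet CA that has fewer than 90 days left.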
If you are diagnosing a certificate issue, you can start by downloading the CA certificate. This can be done with the curl or wget utilities. In this example, we will use curl and pass it the --insecure option (since we have not downloaded the CA yet and cannot verify the certificate at this point), as follows:
We can use a pipe (|) to direct the curl output to openssl and verify the certificate, as follows:
If the CA certificate verifies correctly, the next step is to attempt to retrieve the certificate for your node. You can do this by first downloading the CA certificate to a local file as follows:
In this example, my hostname is mylaptop. I will attempt to download my certificate from the master using curl (verifying the communication with the previously downloaded CA certificate):
As you can see, this succeeded. If we pipe the output to OpenSSL, we see that the subject of the certificate is mylaptop and the certificate has not expired:
Since we previously downloaded the CA certificate, we can also verify this certificate by using the verify subcommand. To use verify, we give it the path to the CA certificate that was previously downloaded and the client certificate that we just downloaded, as follows:
If your master failed to return a certificate in the previous step, use puppet cert on the master to find the certificate. For the mylaptop example, issue the following commands:
If the certificate is present but unsigned, the output line will lack the + symbol at the beginning, like this:
If the certificate is not present, the output will look like this:
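The three cases described above (signed, unsigned, and absent) can also be told apart in a script. The sketch below reads certificate listing output on standard input. It assumes that each line shows the host name in double quotes and that signed certificates are marked with a leading +. The function name cert_state is hypothetical, and any matching line without a leading + is reported as unsigned, so treat that answer as a prompt to inspect the line itself.

```shell
# Classify HOST from certificate listing lines read on standard input.
# Prints "signed", "unsigned", or "absent".
# Assumes lines of the form:  + "hostname" (...)   with + marking signed certs.
# cert_state is a hypothetical helper name used for illustration.
cert_state() {
  awk -v host="$1" '
    {
      for (i = 1; i <= NF; i++)
        if ($i == "\"" host "\"")
          state = ($1 == "+") ? "signed" : "unsigned"
    }
    END { print (state == "" ? "absent" : state) }
  '
}
```

On the master, the listing produced by puppet cert can be piped into cert_state mylaptop.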
A common problem with certificates is an old certificate or a mismatch between the ca_server/master and the node. The simplest solution to this sort of problem is to remove the certificate from both machines and start again.
To remove the certificate on the ca_server, use puppet cert clean with the appropriate hostname, as follows:
As mentioned in the output, the certificates are stored in the subdirectories of /var/lib/puppet/ssl. If the puppet cert clean command does not remove the certificate, you can remove the files manually from this location.
On the node, remove the node's private key and certificate from the /var/lib/puppet/ssl directory manually (there is no automatic way to do this). Alternatively, you can remove the entire /var/lib/puppet/ssl directory and have the node download the CA certificate again. This often involves less work than finding all the files that need to be removed.
This location is different for Puppet Enterprise, which stores certificates in /etc/puppetlabs/puppet/ssl.
When we ran puppet cert clean on the master, one of the output lines mentioned that the certificate had been revoked. X509 certificates can be revoked. The list of certificates that have been revoked is kept in the Certificate Revocation List (CRL), which is in the ca_crl.pem file in /var/lib/puppet/ssl/ca.
We can use OpenSSL's crl utility to inspect the CRL, as follows:
As you can see, the certificate with the serial number 6 has been marked as revoked. The serial number is located within the certificate. When the master verifies a client, it will consult the CRL to verify that the serial number is not in the CRL.
More information on X509 certificates can be found at https://www.ietf.org/rfc/rfc2459.txt and http://en.wikipedia.org/wiki/X.509.