Chapter 13. Zero-Touch Provisioning

When a fixed-configuration Arista switch boots and no startup-config is found, the switch defaults to Zero-Touch Provisioning (ZTP) mode. Your first reaction to ZTP might be that it’s a pain in the ass, but I assure you it’s not, and I hope that by the end of this chapter, you’ll agree. In fact, it’s a seriously cool feature that you can use to great effect.

Note

Technically what triggers ZTP (when it’s enabled) is a missing startup-config or one that is zero bytes in size.

Have you ever installed a new switch out of the box? Chances are that you mounted it and sat in the data center with a console cable, or you sat with it on your desk while your workmates plotted against you because of all the fan noise. Or consider the idea of remote installations. I’ve had many clients who have bought remote “smart” hands service, only to discover that those remote hands weren’t so smart after all. ZTP is designed to eliminate both situations through the use of standards-based Dynamic Host Configuration Protocol (DHCP).

The reason I say that it can be a pain is that when a new switch is powered up and no steps have been taken to use ZTP, configuring the switch is next to impossible. Here’s the first indication that you’re in for a long day if you have no idea how Arista switches behave. When you press Enter at the login prompt of a new switch, you’re greeted with the following warning:

localhost login:

No startup-config was found.

The device is in Zero Touch Provisioning mode and is attempting to
download the startup-config from a remote system. The device will not
be fully functional until either a valid startup-config is downloaded
from a remote system or Zero Touch Provisioning is cancelled.

To cancel Zero Touch Provisioning, login as admin and type
'zerotouch cancel' at the CLI. Alternatively, to disable Zero Touch
Provisioning permanently, type 'zerotouch disable' at the CLI.
Note: The device will reload when these commands are issued.

Now, if you’re a high-level know-it-all network guy like me, you glossed over that because there were so many words. Allow me to point out the important bit:

The device will not be fully functional until either a valid startup-config is downloaded from a remote system or Zero Touch Provisioning is cancelled.

Yeah, that looks important, especially the part about the device not being fully functional. Still didn’t read? Allow me to demonstrate. Ignoring all dire warnings of impotent switches, I’ve logged in and started configuring. First, the hostname; it will need to be something special:

localhost#conf
localhost(config)#hostname Arista
Arista(config)#

Cool! Now that my switch has a clever hostname, I save the configuration—because 30 years of bad habits are hard to undo:

Arista(config)#wri

Note

As of EOS version 4.21.1F, when you try to do this, EOS will throw an error stating “Cannot copy to startup-config when ZeroTouch is enabled,” which is another example of the developers quickly working to make all of my writing obsolete. In this case, I must give them credit because I think this is a great change.

Life is good, so I take a well-deserved break for my hourly dose of caffeine and sugar, after which I come back and press Return.

Again, 30 years of bad habits force me to perform actions beyond my control, so without thinking I mash the Return key a couple of times:

Arista(config)#
localhost(config)#
localhost(config)#

What the hell? I configured the switch, even saved the configuration, and the switch reverted to a generic hostname on its own! Stupid new switches that...wait a minute. What was that error message I saw earlier? Wasn’t there something about cancelling something?

Here’s the part:

To cancel Zero Touch Provisioning, login as admin and type
'zerotouch cancel' at the CLI. Alternatively, to disable Zero Touch
Provisioning permanently, type 'zerotouch disable' at the CLI.
Note: The device will reload when these commands are issued.

This little comedy of errors I just walked you through is almost a word-for-word account of my first interaction with an Arista switch. Because I’m a man, and a high-level networking know-it-all, I found no reason to read the documentation, let alone the warning message. Let’s cut to the chase, and I’ll show you what’s really going on.

Note

Over years of teaching and helping networking engineers ranging from absolute newcomers to triple CCIEs, I’ve come to the conclusion that networking engineers are genetically predisposed to being incapable of reading warning messages on screens.

When an Arista switch boots for the first time, ZTP is enabled. The switch sees that there is no startup-config so it configures all interfaces with the no switchport command and then sends out DHCP queries on any Ethernet and management interfaces that are connected. If you’re patient (I’m not) and take the time to read the messages (I didn’t), you’ll see exactly what’s going on. Here’s a sample from a switch in such a state:

Apr 10 23:17:19 localhost ZeroTouch: %ZTP-5-DHCP_QUERY: Sending DHCP
request on  [ Ethernet10, Ethernet11, Ethernet24 ]
Apr 10 23:18:19 localhost ZeroTouch: %ZTP-5-DHCP_QUERY_FAIL: Failed
to get a valid DHCP response
Apr 10 23:18:19 localhost ZeroTouch: %ZTP-5-RETRY: Retrying Zero Touch
Provisioning from the begining (attempt 2)

Different revs of code might have different output. On my 7280R running EOS 4.21.1F, I see this output as the switch tries to get the interfaces to a default condition:

localhost(config)#Jan  7 20:15:34 localhost ConfigAgent: %ZTP-6-RETRY: Retrying
 Zero Touch Provisioning from the beginning (attempt 1)
Jan  7 20:15:34 localhost ConfigAgent: %ZTP-6-INTERFACE_SPEED: Setting interface
 Speed to default speed for [Ethernet26, Management1, Ethernet27, Ethernet7,
 Ethernet34, Ethernet8, Ethernet36, Ethernet46, Ethernet35, Ethernet22,
 Ethernet45, Ethernet12, Ethernet2, Ethernet42, Ethernet20, Ethernet18,
 Ethernet29, Ethernet17, Ethernet9, Ethernet41, Ethernet30, Ethernet40,
 Ethernet38, Ethernet16, Ethernet37, Ethernet4, Ethernet43, Ethernet1,
 Ethernet32, Ethernet23, Ethernet15, Ethernet3, Ethernet5, Ethernet24,
 Ethernet14, Ethernet11, Ethernet19, Ethernet39, Ethernet6, Ethernet48,
 Ethernet31, Ethernet47, Ethernet13, Ethernet10, Ethernet44, Ethernet21,
 Ethernet28, Ethernet25, Ethernet33]
Jan  7 20:15:34 localhost ConfigAgent: %ZTP-6-INTERFACE_SPEED: Setting interface
 speed to forced 10gfull for [Ethernet52/1, Ethernet49/3, Ethernet51/3,
 Ethernet54/3, Ethernet50/2, Ethernet51/1, Ethernet50/3, Ethernet49/1,
 Ethernet51/4, Ethernet52/3, Ethernet53/1, Ethernet53/4, Ethernet50/4,
 Ethernet50/1, Ethernet51/2, Ethernet52/2, Ethernet53/2, Ethernet49/4,
 Ethernet52/4, Ethernet54/4, Ethernet49/2, Ethernet54/2, Ethernet53/3,
 Ethernet54/1]

In the first sample, the switch found three connected interfaces (E10, E11, and E24) and sent DHCP queries out of all of them. Sadly, there were no responses, so it kept trying. If you leave it in this state, it will try forever. If you configure the switch, that configuration will be trashed as ZTP tries again and again to find a configuration, or at least an IP address.

At this point, there are four things you can do:

  • Cancel ZTP

  • Disable ZTP

  • Actually boot using ZTP to your advantage

  • Give up, quit your job, move to Texas, and grow sugar beets

I suppose we could also turn off the switch and hit the bar for some cocktails, but let’s keep this discussion focused on ZTP and examine each choice. Well, maybe not the one about the sugar beets.

ZTP Requirements

For ZTP to work, you need the following:

  • A DHCP server

  • A server to host the configuration or script to be fetched (can be the same server)

  • Connected interfaces on the switch

This feature works by sending a DHCP request out of every connected interface. When an IP address is supplied by DHCP, it’s assigned to the interface on which the reply was received. The DHCP server must support the ability to provide a URL pointing to a file (which most modern DHCP servers do), which the switch will then retrieve. For now, let’s assume that the file being retrieved is a configuration file. We talk about scripts a bit later in this chapter.

Cancelling ZTP

Cancelling ZTP is pretty simple. Just log in, enter configuration mode, and issue the zerotouch cancel command:

localhost(config)#conf
Jan  7 20:16:04 localhost ConfigAgent: %ZTP-6-RETRY: Retrying Zero Touch
 Provisioning from the beginning (attempt 2)
localhost(config)#zerotouch cancel
Jan  7 20:16:08 localhost ConfigAgent: %ZTP-6-CANCEL: Cancelling Zero Touch
 Provisioning
Jan  7 20:16:08 localhFlushing AAA accounting queue: [  OK  ]

Restarting system [20:16:10]

Notice that there is no “Are you sure?” prompt. If you issue this command, the switch will reload. That’s not such a bad thing, because anything we tried to configure was wiped out anyway. Besides, those ZTP messages get old quickly.

When the switch reboots, you’ll be able to configure it as you would expect. Here’s the rub: if you cancel ZTP, it’s still there; it’s just not bothering you this time. In fact, the configuration will still have the login banner programmed with the ZTP warning message.

If, with ZTP cancelled, you erase the startup-config and reboot, you will again be treated to ZTP’s never-ending messages and attempts at finding a DHCP server:

May 14 05:44:59 localhost ZeroTouch: %ZTP-5-INIT: No startup-config
found, starting Zero Touch Provisioning
May 14 05:45:34 localhost ZeroTouch: %ZTP-5-DHCP_QUERY: Sending DHCP
request on  [ Ethernet10, Ethernet11, Ethernet24 ]

The way I like to describe zerotouch cancel is that it disables ZTP from taking over for only one reboot.

Suppose that for some reason, no matter what, you don’t want to see ZTP’s ugly face again. For that to happen, you must completely disable ZTP.

Disabling ZTP

To completely disable ZTP, log in as admin and issue the zerotouch disable command. The switch will immediately reboot (again, without warning or confirmation), only this time, ZTP is terminated with extreme prejudice as evidenced by the following messages:

localhost#zerotouch disable
May 14 05:47:42 localhost ZeroTouch: %ZTP-5-CANCEL: Cancelling Zero Touch
 Provisioning
May 14 05:47:42 localhost ZeroTouch: %ZTP-5-RELOAD: Rebooting the system
May 14 05:47:43 localhost ProcMgr-worker: %PROCMGR-6-WORKER_WARMSTART:
ProcMgr worker warm start. (PID=1446)
May 14 05:47:43 localhost ProcMgr-worker:
%PROCMGR-6-TERMINATE_RUNNING_PROCESS: Terminating
deconfigured/reconfigured process 'ZeroTouch' (PID=1506)
May 14 05:47:44 localhost ProcMgr-worker:
%PROCMGR-6-PROCESS_TERMINATED: 'ZeroTouch' (PID=1506) has terminated.
May 14 05:47:44 localhost ProcMgr-worker:
%PROCMGR-6-PROCESS_NOTRESTARTING: Letting 'ZeroTouch' (PID=1506)
exit - NOT restarting it.

Now, the system boots once more, and there isn’t even a trace of ZTP:

Aboot 6.1.2-4757975


Press Control-C now to enter Aboot shell
Booting flash:/EOS-4.21.1F.swi
[    8.431535] Starting new kernel
[    1.651605] Running dosfsck on: /mnt/flash
Switching rootfs

Welcome to Arista Networks EOS 4.21.1F
New seat seat0.
RTNETLINK answers: No such process
[   71.304306] EXT4-fs (sda): VFS: Can't find ext4 filesystem
[   71.379014] EXT4-fs (sda): VFS: Can't find ext4 filesystem
[   71.457489] FAT-fs (sda): invalid media value (0xf3)
Arista File Archive: initialization complete, quotapct: 20
warning: file /usr/share/doc/strongswan-5.3.0/NEWS: remove failed: No such
file or directory
[  OK  ] ConnMgr: Starting TimeAgent: [  OK  ]
[  OK  ]
Starting ProcMgr: [  OK  ]
Starting EOS initialization stage 1: [  OK  ]
Starting NorCal initialization: [  OK  ]
Starting EOS initialization stage 2: [  OK  ]
Starting Power OCompleting EOS initialization (press ESC to skip): [  OK  ]
Model: DCS-7280SR-48C6-M
Serial Number: SSJ17290598
System RAM: 32458980 kB
Flash Memory size:  3.1G

localhost login:

At this point you can reboot the switch to your heart’s content, and ZTP will not come back. If you’d like to bring ZTP back, you can’t even do it from exec mode:

localhost#zerotouch ?
  cancel               Cancel ZeroTouch and reload the switch
  disable              Disable ZeroTouch and reload the switch
  script-exec-timeout  Change timeout for the downloaded script to
                       finish execution

In fact, you can’t even do it in config mode, though there’s an option for enable, which gives us a clue as to what we need to do:

localhost#conf
localhost(config)#zerotouch enable
% Configuration ignored: ZeroTouch can not be enabled interactively.
To enable ZeroTouch, delete startup-config and reload the switch.

Sadly, that message is not entirely correct, at least up to EOS version 4.21.1F: with ZTP disabled, deleting (or emptying) the startup-config and reloading does not bring it back. Luckily, nothing is impossible with an Arista switch! Check out what’s on the flash: drive:

localhost#dir
Directory of flash:/

       -rwx   700978970           Nov 12  2018  EOS-4.21.1F.swi
       -rwx   701781780           Nov 12  2018  EOS-4.21.2F.swi
       -rwx          77            Jan 7 20:44  SsuRestore.log
       -rwx          27            Jan 4 17:04  boot-config
       drwx        4096            Jan 7 20:46  debug
       drwx        4096            Jan 7 20:46  persist
       drwx        4096            Jan 2 22:09  schedule
       -rwx           0            Jan 7 20:33  startup-config
       -rwx          13            Jan 7 20:42  zerotouch-config

3269361664 bytes total (1157074944 bytes free)

Hmm. A file named zerotouch-config exists. Let’s dig into that file and see what we find, shall we?

localhost#more flash:zerotouch-config
DISABLE=True

Ha! I’ll bet if we change that to false, we’ll get ZTP back. In fact, in the first edition of Arista Warrior, I did just that, but years of maturity have set in, and I now recommend that people just delete the file, which has the same effect and doesn’t get your hands dirty with vi:

localhost#del flash:zerotouch-config
localhost#

Because the startup-config is zero bytes, I don’t need to write erase, so I’ll force a reboot to see what happens:

localhost(config)#reload now

Broadcast message from root@localhost (Mon Jan  7 20:51:37 2019):

The system is going down for reboot NOW!
Flushing AAA accounting queue: [  OK  ]

Restarting system
[--- lots of output truncated ---]
May 14 19:42:54 localhost ZeroTouch: %ZTP-5-INIT: No startup-config found,
 starting Zero Touch Provisioning

This was actually pretty cool for me when I first figured it out (back around 2012 or so) because I could find no mention of how to do this in the documentation, and with my experience using Arista switches I reasoned that there had to be a configuration file somewhere and that Arista would never permanently disable a feature. Following that logic, I found the configuration file, changed it, and was rewarded with the expected results. Today I have access to all sorts of nifty Arista documentation both internal and external, but I can’t show you any of that secret stuff.

I feel that it’s important to point out that ZTP is not controlled in any way by the startup-config, and if you think about it, it cannot be. Why? Because the feature is triggered by the lack of the file, so how could it be configured using a file that needs to not be there for it to work? That’s why ZTP is configured with a file on flash:.
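To make that logic concrete, here is a minimal Python sketch of the boot-time decision as this chapter describes it. It is a model for illustration only, not EOS source code: the function name ztp_would_start is made up, and the one-reboot effect of zerotouch cancel is deliberately left out.

```python
import os
import tempfile


def ztp_would_start(flash_dir):
    """Model the chapter's description of when ZTP kicks in at boot.

    Illustration only: this mirrors the behavior described in the text,
    not the actual EOS implementation.
    """
    # A zerotouch-config file holding DISABLE=True is what
    # 'zerotouch disable' leaves behind on flash:.
    ztp_file = os.path.join(flash_dir, "zerotouch-config")
    if os.path.isfile(ztp_file):
        with open(ztp_file) as handle:
            if handle.read().strip().lower() == "disable=true":
                return False

    # ZTP is triggered by a startup-config that is missing or zero bytes.
    startup = os.path.join(flash_dir, "startup-config")
    return not os.path.isfile(startup) or os.path.getsize(startup) == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as flash:
        # Zero-byte startup-config and no zerotouch-config file.
        open(os.path.join(flash, "startup-config"), "w").close()
        print(ztp_would_start(flash))  # True

        # Same flash after 'zerotouch disable' wrote its file.
        with open(os.path.join(flash, "zerotouch-config"), "w") as handle:
            handle.write("DISABLE=True\n")
        print(ztp_would_start(flash))  # False
```

Notice that the startup-config is consulted only for its existence and size, never for its contents, which is the whole point of the paragraph above.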

If you’re wondering how zerotouch cancel works, when the switch reboots, it puts the string disable=once or disable=nextReload into the zerotouch-config file, depending on the version of EOS. This lets the switch come up normally, and when it does, one of the first things it does is delete that file so that the next time it boots, ZTP is triggered once more, assuming there is no startup-config.

Booting with ZTP

Hopefully you’ve considered the possibility of booting with ZTP, because as we’re about to see, it’s pretty darn cool. As with all things Arista, it’s well thought out and powerful.

The first thing we need to know is that while ZTP can load a configuration from the network, it can also run scripts. Before we get to that, let’s take a look at what needs to happen in order for an Arista switch to boot using ZTP.

When a switch boots in ZTP mode, it configures all of the Ethernet and Management interfaces with the no switchport command in order to allow DHCP to be run on those interfaces. DHCP is capable of much more than just providing an IP address, and ZTP takes advantage of this fact. To see how, let’s first take a look at how to configure a DHCP server to use ZTP.

I’ve set up a lab in which I have an Ubuntu Linux server, an existing network, and a new Arista switch. I’m using dhcpd on the Linux server to serve DHCP. The examples used in this section reflect the configuration required for dhcpd, though the concepts should be universal for any DHCP server that supports them.

To begin, I configure the regular DHCP parameters as follows to fit with my network:

subnet 10.0.0.0 netmask 255.255.255.0 {
  option subnet-mask 255.255.255.0;
  option broadcast-address 10.0.0.255;
  option domain-name-servers 10.0.0.100;
  option domain-name "gad.net";
  option bootfile-name "http://10.0.0.100/ZTP/ZTP-Script";
  pool {
    range 10.0.0.30 10.0.0.50;
  }
}

Given this configuration, dynamically allocated IP addresses will be served from the pool 10.0.0.30–50. This configuration will deliver an IP address from the pool to any device that requests an IP address. Lastly, the option bootfile-name line shows an HTTP URL. The switch, upon receiving and activating the IP address, will go to this URL and download the bootfile. Here’s where this gets fun.

This file can be either a startup-config, in which case the switch will apply the configuration and reboot, or it can be a script. ZTP will decide based on the first line of the file found at the bootfile-name address. If the first line includes a shebang (#!) and the path to an interpreter, that interpreter will be loaded and the file will be considered a script. If not, the file will be considered a startup-config and treated accordingly.
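The decision described above can be sketched in a few lines of Python. This is only my reading of the rule, assuming the check is nothing more than "shebang plus an interpreter path on the first line." The function name classify_bootfile is hypothetical, and the real check inside EOS may well be more involved.

```python
def classify_bootfile(contents):
    """Return 'script' or 'startup-config' based on the file's first line."""
    first_line = contents.splitlines()[0] if contents else ""
    # A shebang followed by an interpreter path marks the file as a script.
    if first_line.startswith("#!") and first_line[2:].strip():
        return "script"
    # Anything else is treated as a startup-config.
    return "startup-config"


print(classify_bootfile("#!/usr/bin/Cli\nenable\n"))    # script
print(classify_bootfile("hostname Arista1\n!\nend\n"))  # startup-config
```

The practical takeaway is that a stray blank line or comment ahead of the shebang would turn your script into a (very broken) startup-config, so keep the shebang on line one.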

Let’s look at an example that might apply for the real world. Here’s how I’ve configured my ZTP-Script file on the web server:

[root@cozy ZTP]$ more ZTP-Script
#!/usr/bin/Cli
enable
copy http://www2.gad.net/Arista/Arista1-startup flash:startup-config
copy http://www2.gad.net/Arista/EOS-4.21.1F.swi flash:
config
boot system flash:EOS-4.21.1F.swi

This file will perform the following actions on the switch, in order, after ZTP loads it:

  • Load the CLI process

  • Enter EOS CLI enable mode

  • Copy the startup-config from the web server to flash:

  • Copy the desired revision of EOS from the web server to flash:

  • Enter EOS CLI configuration mode

  • Set the boot-config to boot from the new EOS version just downloaded

Note

The script does not have a wr mem or copy run start at the end of it. Does it need one? Is that a typo? Do I really have no idea what I’m doing as has been postulated by those who think I’m wrong? If you think that there needs to be a wr at the end of that script, I’d advise two things: 1) read the script again, and 2) read the next chapter, which happens to be about Aboot.

Would a wr at the end of the script hurt anything? Nope! But I like to get people to think about how things work.

After the script has completed, the switch will reload. Let’s take a look at the switch as it boots with the ZTP configuration loaded on the DHCP server.

Here, my switch is running EOS version 4.19.1F and has no configuration:

localhost#sho ver
Arista DCS-7280SE-72-F
Hardware version:    01.00
Serial number:       JPE15181350
System MAC address:  001c.7390.93cf
Software image version: 4.19.1F
Architecture:           i386
Internal build version: 4.19.1F-3205780.4166M
Internal build ID:      373dbd3c-60a7-4736-8d9e-bf5e7d207689

Uptime:                 3 minutes
Total memory:           3844356 kB
Free memory:            470840 kB

And the existing boot configuration is pointing to 4.19.1F, as well:

localhost#sho boot-config
Software image: flash:/EOS-4.19.1F.swi
Console speed: (not set)
Aboot password (encrypted): (not set)
Memory test iterations: (not set)

Now I erase the startup-config with the write erase command:

localhost#write erase
Proceed with erasing startup configuration? [confirm]
localhost#

And reload with extreme prejudice. Or something:

Arista#reload now

Broadcast message from root@localhost
        (unknown) at 5:13 ...

The system is going down for reboot NOW!

Now let’s watch the ZTP messages as the switch boots (I’ve included only the relevant messages here to save space):

Address: 10.0.0.45/24/24; Nameserver: 10.0.0.100; Domain: gad.net;
  Boot File: http://10.0.0.100/ZTP/ZTP-Script ]
Jan  7 21:05:23 localhost ConfigAgent: %ZTP-6-CONFIG_DOWNLOAD: Attempting to
download the startup-config from http://10.0.0.100/ZTP/ZTP-Script
Jan  7 21:05:23 localhost ConfigAgent: %ZTP-6-CONFIG_DOWNLOAD_SUCCESS:
Successfully downloaded config script from http://10.0.0.100/ZTP/ZTP-Script
Jan  7 21:05:23 localhost ConfigAgent: %ZTP-6-EXEC_SCRIPT: Executing the
downloaded config script
Jan  7 21:05:24 localhost ZTP: : Interface found:  1
Jan  7 21:05:24 localhost ConfigAgent: %ZTP-6-EXEC_SCRIPT_SUCCESS: Successfully
executed the downloaded config script
Jan  7 21:05:24 localhost ConfigAgent: %ZTP-6-RELOAD: Rebooting the system
Flushing AAA accounting queue: [  OK  ]

At this point the switch has downloaded the ZTP file, applied what it found there, and reloaded. So, without direct interaction, the switch booted, ran our script, loaded the new configuration, downloaded new code, rebooted, and applied the new code.

ZTP-Switch#sho ver
Arista DCS-7280SR-48C6-M-F
Hardware version:    21.05
Serial number:       SSJ17290598
System MAC address:  2899.3abe.9f92

Software image version: 4.21.1F
Architecture:           i386
Internal build version: 4.21.1F-9887494.4211F
Internal build ID:      1497e24b-a79b-48e7-a876-43061e109b92

Uptime:                 0 weeks, 0 days, 0 hours and 10 minutes
Total memory:           32458980 kB
Free memory:            30521916 kB

The hostname ZTP-Switch came from the copied startup-config, and the switch is now running on a new revision of code.

Imagine a scenario in which you’re installing 50 new Arista switches. This procedure could put a base configuration on them all while also loading the right version of code. That’s my kind of time-saving feature! Oh, and if you’re using CloudVision, this is exactly how that tool configures new switches! CloudVision has a ZTP server built in that handles all of this for you. Note that you need to enable the DHCP server on CloudVision if you want to use it, because you can also use your own DHCP server if, for example, you already have one in place.
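If you do want to roll your own for a batch of switches, the per-switch script files can be generated rather than typed. Here is a rough Python sketch that produces the same six-line CLI script shown earlier in this chapter. The function name render_ztp_script, the switch list, and the Arista2-startup URL are assumptions for illustration, and how each switch gets pointed at its own file (for example, per-host DHCP entries) is left out.

```python
def render_ztp_script(config_url, image_url):
    """Build a CLI-style ZTP script like the ZTP-Script file shown earlier."""
    # The image filename is the last path component of its URL.
    image_name = image_url.rsplit("/", 1)[-1]
    lines = [
        "#!/usr/bin/Cli",
        "enable",
        f"copy {config_url} flash:startup-config",
        f"copy {image_url} flash:",
        "config",
        f"boot system flash:{image_name}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical inventory: one saved startup-config per switch name.
switches = ["Arista1", "Arista2"]
for name in switches:
    script = render_ztp_script(
        f"http://www2.gad.net/Arista/{name}-startup",
        "http://www2.gad.net/Arista/EOS-4.21.1F.swi",
    )
    print(f"--- {name} ---")
    print(script, end="")
```

Writing each result to a file on the web server instead of printing it is a one-line change.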

If you don’t like writing your own code and you don’t have CloudVision, there’s another option (Arista loves options!) called ztpserver that you can get from the EOS Plus team at its GitHub page.

Not only that, but some customers have developed scripts that autoarchive the configurations on all of their switches every day (which is always a good idea) and then update the ZTP files accordingly. When a switch fails, ZTP on the replacement switch references the proper files that have been scripted to contain the configuration of the switch being replaced. Think about the time saved during an outage when the switch configures itself after a replacement.

Conclusion

In my training labs at Arista, I’ve written a ZTP script that allows the switches to configure themselves based on their position within the network. How? By issuing a show lldp neighbor (Chapter 11) command that allows the switch to know what interface it’s connected to on the management switch. If the switch is connected to interface e5 on the management switch, it configures itself as student-05. You can see this script on my GitHub page. The ability to have the switch run a script in order to configure itself is a fabulous capability, but remember that if you don’t like to code, you can use ztpserver or CloudVision (Chapter 15) to accomplish something similar.
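The naming step of that idea is easy to sketch. The following Python is not the script from my GitHub page; it is a minimal illustration that assumes the neighbor’s port name has already been pulled out of the LLDP output and that it ends in a simple port number (so no breakout ports such as Ethernet49/1). The function name hostname_from_port is hypothetical.

```python
import re


def hostname_from_port(neighbor_port, prefix="student"):
    """Map a management-switch port such as 'Ethernet5' or 'e5' to 'student-05'."""
    # Grab the trailing digits of the port name.
    match = re.search(r"(\d+)$", neighbor_port.strip())
    if match is None:
        raise ValueError(f"no port number in {neighbor_port!r}")
    # Zero-pad to two digits so the names sort nicely.
    return f"{prefix}-{int(match.group(1)):02d}"


print(hostname_from_port("Ethernet5"))  # student-05
print(hostname_from_port("e12"))        # student-12
```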
