When a fixed-configuration Arista switch boots and no startup-config is found, the switch defaults to Zero-Touch Provisioning (ZTP) mode. Your first reaction to ZTP might be that it’s a pain in the ass, but I assure you it’s not, and I hope that by the end of this chapter, you’ll agree. In fact, it’s a seriously cool feature that you can use to great effect.
Technically, what triggers ZTP (when it’s enabled) is a missing startup-config or one that is zero bytes in size.
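That trigger rule is simple enough to express in a few lines. Here's a minimal Python sketch of the check as described above. The function name is hypothetical, and this illustrates the rule rather than how EOS implements it internally.

```python
import os


def startup_config_triggers_ztp(path: str) -> bool:
    """Return True if the startup-config at `path` would trigger ZTP
    (assuming ZTP is enabled): the file is missing or is zero bytes."""
    if not os.path.exists(path):
        return True  # no startup-config at all
    return os.path.getsize(path) == 0  # present, but empty
```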
Have you ever installed a new switch out of the box? Chances are that you mounted it and sat in the data center with a console cable, or you sat with it on your desk while your workmates plotted against you because of all the fan noise. Or consider remote installations. I’ve had many clients who have bought remote “smart” hands service, only to discover that those remote hands weren’t so smart after all. ZTP is designed to eliminate both situations, all through the use of standards-based Dynamic Host Configuration Protocol (DHCP).
The reason I say that it can be a pain is that when a new switch is powered up and nothing has been set up for ZTP, configuring the switch is next to impossible. Here’s the first indication that you’re in for a long day if you have no idea how Arista switches behave. When you press Enter at the login prompt of a new switch, you’re greeted with the following warning:
localhost login:
No startup-config was found. The device is in Zero Touch Provisioning mode and is attempting to download the startup-config from a remote system. The device will not be fully functional until either a valid startup-config is downloaded from a remote system or Zero Touch Provisioning is cancelled. To cancel Zero Touch Provisioning, login as admin and type 'zerotouch cancel' at the CLI. Alternatively, to disable Zero Touch Provisioning permanently, type 'zerotouch disable' at the CLI. Note: The device will reload when these commands are issued.
Now, if you’re a high-level know-it-all network guy like me, you glossed over that because there were so many words. Allow me to point out the important bit:
The device will not be fully functional until either a valid startup-config is downloaded from a remote system or Zero Touch Provisioning is cancelled.
Yeah, that looks important, especially the part about the device not being fully functional. Still didn’t read it? Allow me to demonstrate. Ignoring all dire warnings of impotent switches, I’ve logged in and started configuring. First, the hostname; it will need to be something special:
localhost#conf
localhost(config)#hostname Arista
Arista(config)#
Cool! Now that my switch has a clever hostname, I save the configuration—because 30 years of bad habits are hard to undo:
Arista(config)#wri
As of EOS version 4.21.1F, when you try to do this, EOS will throw an error stating “Cannot copy to startup-config when ZeroTouch is enabled,” which is another example of the developers quickly working to make all of my writing obsolete. In this case, I must give them credit, because I think this is a great change.
Life is good, so I take a well-deserved break for my hourly dose of caffeine and sugar, after which I come back and press Return.
Again, 30 years of bad habits force me to perform actions beyond my control, so without thinking I mash the Return key a couple of times:
Arista(config)#
localhost(config)#
localhost(config)#
What the hell? I configured the switch, even saved the configuration, and the switch reverted to a generic hostname on its own! Stupid new switches that...wait a minute. What was that error message I saw earlier? Wasn’t there something about cancelling something?
Here’s the part:
To cancel Zero Touch Provisioning, login as admin and type 'zerotouch cancel' at the CLI. Alternatively, to disable Zero Touch Provisioning permanently, type 'zerotouch disable' at the CLI. Note: The device will reload when these commands are issued.
This little comedy of errors I just walked you through is almost a word-for-word account of my first interaction with an Arista switch. Because I’m a man, and a high-level networking know-it-all, I found no reason to read the documentation, let alone the warning message. Let’s cut to the chase, and I’ll show you what’s really going on.
Over years of teaching and helping networking engineers ranging from absolute newcomers to triple CCIEs, I’ve come to the conclusion that networking engineers are genetically predisposed to being incapable of reading warning messages on screens.
When an Arista switch boots for the first time, ZTP is enabled. The switch sees that there is no startup-config, so it configures all interfaces with the no switchport command and then sends out DHCP queries on any Ethernet and management interfaces that are connected. If you’re patient (I’m not) and take the time to read the messages (I didn’t), you’ll see exactly what’s going on. Here’s a sample from a switch in such a state:
Apr 10 23:17:19 localhost ZeroTouch: %ZTP-5-DHCP_QUERY: Sending DHCP request on [ Ethernet10, Ethernet11, Ethernet24 ]
Apr 10 23:18:19 localhost ZeroTouch: %ZTP-5-DHCP_QUERY_FAIL: Failed to get a valid DHCP response
Apr 10 23:18:19 localhost ZeroTouch: %ZTP-5-RETRY: Retrying Zero Touch Provisioning from the begining (attempt 2)
Different revs of code might have different output. On my 7280R running EOS 4.21.1F, I see this output as the switch tries to get the interfaces to a default condition:
localhost(config)#
Jan 7 20:15:34 localhost ConfigAgent: %ZTP-6-RETRY: Retrying Zero Touch Provisioning from the beginning (attempt 1)
Jan 7 20:15:34 localhost ConfigAgent: %ZTP-6-INTERFACE_SPEED: Setting interface Speed to default speed for [Ethernet26, Management1, Ethernet27, Ethernet7, Ethernet34, Ethernet8, Ethernet36, Ethernet46, Ethernet35, Ethernet22, Ethernet45, Ethernet12, Ethernet2, Ethernet42, Ethernet20, Ethernet18, Ethernet29, Ethernet17, Ethernet9, Ethernet41, Ethernet30, Ethernet40, Ethernet38, Ethernet16, Ethernet37, Ethernet4, Ethernet43, Ethernet1, Ethernet32, Ethernet23, Ethernet15, Ethernet3, Ethernet5, Ethernet24, Ethernet14, Ethernet11, Ethernet19, Ethernet39, Ethernet6, Ethernet48, Ethernet31, Ethernet47, Ethernet13, Ethernet10, Ethernet44, Ethernet21, Ethernet28, Ethernet25, Ethernet33]
Jan 7 20:15:34 localhost ConfigAgent: %ZTP-6-INTERFACE_SPEED: Setting interface speed to forced 10gfull for [Ethernet52/1, Ethernet49/3, Ethernet51/3, Ethernet54/3, Ethernet50/2, Ethernet51/1, Ethernet50/3, Ethernet49/1, Ethernet51/4, Ethernet52/3, Ethernet53/1, Ethernet53/4, Ethernet50/4, Ethernet50/1, Ethernet51/2, Ethernet52/2, Ethernet53/2, Ethernet49/4, Ethernet52/4, Ethernet54/4, Ethernet49/2, Ethernet54/2, Ethernet53/3, Ethernet54/1]
The switch found three connected interfaces (E10, E11, and E24) and sent DHCP queries out to all of them. Sadly, there were no responses, so it kept trying. If you leave it in this state, it will try forever. If you configure the switch, that configuration will be trashed as ZTP tries again and again to find a configuration, or at least an IP address.
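If you'd rather not squint at the console, you can pull the interface list out of the DHCP_QUERY message programmatically. Here's a minimal Python sketch, assuming the message format matches the samples shown earlier (it may differ between EOS releases, and the severity digit already varies between ZTP-5 and ZTP-6 in those samples). The function name is hypothetical.

```python
import re

# Matches "%ZTP-<n>-DHCP_QUERY: Sending DHCP request on [ ... ]" and captures
# the bracketed interface list. The severity digit is left as \d because it
# differs between the EOS releases shown in this chapter.
_DHCP_QUERY = re.compile(r"%ZTP-\d-DHCP_QUERY: Sending DHCP request on \[([^\]]*)\]")


def ztp_query_interfaces(log_text: str) -> list[str]:
    """Return the interfaces named in the first DHCP_QUERY message, or []."""
    match = _DHCP_QUERY.search(log_text)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]
```

Fed the first log line from the sample above, this returns the same three interfaces the switch was probing.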
At this point, there are four things you can do:
Cancel ZTP
Disable ZTP
Actually boot using ZTP to your advantage
Give up, quit your job, move to Texas, and grow sugar beets
I suppose we could also turn off the switch and hit the bar for some cocktails, but let’s keep this discussion focused on ZTP and examine each choice. Well, maybe not the one about the sugar beets.
For ZTP to work, you need the following:
A DHCP server
A server to host the configuration or script to be fetched (can be the same server)
Connected interfaces on the switch
This feature works by sending a DHCP request out of every connected interface. When an IP address is supplied by DHCP, it’s assigned to the interface on which the reply was received. The DHCP server must be able to supply a URL pointing to a file, which the switch will then retrieve (most modern DHCP servers can do this). For now, let’s assume that the file being retrieved is a configuration file. We talk about scripts a bit later in this chapter.
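To make the URL part concrete: in a setup like the dhcpd example used later in this chapter, the URL travels in DHCP option 67 (Bootfile name, RFC 2132), which is what dhcpd's bootfile-name setting populates. Like other DHCP options, it's a simple code/length/value triple. The following Python sketch encodes and decodes one. The function names are hypothetical, and the code illustrates the wire format only. It isn't part of EOS or dhcpd.

```python
BOOTFILE_NAME = 67  # DHCP option code for "Bootfile name" (RFC 2132)


def encode_bootfile_option(url: str) -> bytes:
    """Encode a URL as a single DHCP option 67: code byte, length byte, value."""
    value = url.encode("ascii")
    if not 1 <= len(value) <= 255:
        raise ValueError("a single DHCP option instance carries 1 to 255 bytes")
    return bytes([BOOTFILE_NAME, len(value)]) + value


def decode_options(data: bytes) -> dict[int, bytes]:
    """Walk a run of DHCP options and return {code: value}.
    Skips Pad (0) and stops at End (255) or at truncated data."""
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == 0:  # Pad: a single byte with no length field
            i += 1
            continue
        if code == 255 or i + 1 >= len(data):  # End, or no length byte left
            break
        length = data[i + 1]
        options[code] = data[i + 2 : i + 2 + length]
        i += 2 + length
    return options
```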
Cancelling ZTP is pretty simple. Just log in, enter configuration mode, and issue the zerotouch cancel command:
localhost(config)#conf
Jan 7 20:16:04 localhost ConfigAgent: %ZTP-6-RETRY: Retrying Zero Touch Provisioning from the beginning (attempt 2)
localhost(config)#zerotouch cancel
Jan 7 20:16:08 localhost ConfigAgent: %ZTP-6-CANCEL: Cancelling Zero Touch Provisioning
Jan 7 20:16:08 localhFlushing AAA accounting queue: [ OK ]
Restarting system [20:16:10]
Notice that there is no “Are you sure?” prompt. If you issue this command, the switch will reload. That’s not such a bad thing, because anything we tried to configure was wiped out anyway. Besides, those ZTP messages get old quickly.
When the switch reboots, you’ll be able to configure it as you would expect. Here’s the rub: if you cancel ZTP, it’s still there; it’s just not bothering you this time. In fact, the configuration will still have the login banner programmed with the ZTP warning message.
If, with ZTP cancelled, you erase the startup-config and reboot, you will again be treated to ZTP’s never-ending messages and attempts at finding a DHCP server:
May 14 05:44:59 localhost ZeroTouch: %ZTP-5-INIT: No startup-config found, starting Zero Touch Provisioning
May 14 05:45:34 localhost ZeroTouch: %ZTP-5-DHCP_QUERY: Sending DHCP request on [ Ethernet10, Ethernet11, Ethernet24 ]
The way I like to describe zerotouch cancel is that it stops ZTP from taking over for one reboot only.
Suppose that for some reason, no matter what, you don’t want to see ZTP’s ugly face again. For that to happen, you must completely disable ZTP.
To completely disable ZTP, log in as admin and issue the zerotouch disable command. The switch will immediately reboot (again, without warning or confirmation), only this time, ZTP is terminated with extreme prejudice, as evidenced by the following messages:
localhost#zerotouch disable
May 14 05:47:42 localhost ZeroTouch: %ZTP-5-CANCEL: Cancelling Zero Touch Provisioning
May 14 05:47:42 localhost ZeroTouch: %ZTP-5-RELOAD: Rebooting the system
May 14 05:47:43 localhost ProcMgr-worker: %PROCMGR-6-WORKER_WARMSTART: ProcMgr worker warm start. (PID=1446)
May 14 05:47:43 localhost ProcMgr-worker: %PROCMGR-6-TERMINATE_RUNNING_PROCESS: Terminating deconfigured/reconfigured process 'ZeroTouch' (PID=1506)
May 14 05:47:44 localhost ProcMgr-worker: %PROCMGR-6-PROCESS_TERMINATED: 'ZeroTouch' (PID=1506) has terminated.
May 14 05:47:44 localhost ProcMgr-worker: %PROCMGR-6-PROCESS_NOTRESTARTING: Letting 'ZeroTouch' (PID=1506) exit - NOT restarting it.
Now, the system boots once more, and there isn’t even a trace of ZTP:
Aboot 6.1.2-4757975
Press Control-C now to enter Aboot shell
Booting flash:/EOS-4.21.1F.swi
[ 8.431535] Starting new kernel
[ 1.651605] Running dosfsck on: /mnt/flash
Switching rootfs
Welcome to Arista Networks EOS 4.21.1F
New seat seat0.
RTNETLINK answers: No such process
[ 71.304306] EXT4-fs (sda): VFS: Can't find ext4 filesystem
[ 71.379014] EXT4-fs (sda): VFS: Can't find ext4 filesystem
[ 71.457489] FAT-fs (sda): invalid media value (0xf3)
Arista File Archive: initialization complete, quotapct: 20
warning: file /usr/share/doc/strongswan-5.3.0/NEWS: remove failed: No such file or directory
[ OK ] ConnMgr: Starting TimeAgent: [ OK ]
[ OK ]
Starting ProcMgr: [ OK ]
Starting EOS initialization stage 1: [ OK ]
Starting NorCal initialization: [ OK ]
Starting EOS initialization stage 2: [ OK ]
Starting Power OCompleting EOS initialization (press ESC to skip): [ OK ]
Model: DCS-7280SR-48C6-M
Serial Number: SSJ17290598
System RAM: 32458980 kB
Flash Memory size: 3.1G
localhost login:
At this point you can reboot the switch to your heart’s content, and ZTP will not come back. If you’d like to bring ZTP back, you can’t even do it from exec mode:
localhost#zerotouch ?
  cancel               Cancel ZeroTouch and reload the switch
  disable              Disable ZeroTouch and reload the switch
  script-exec-timeout  Change timeout for the downloaded script to finish execution
In fact, you can’t even do it in config mode, though there’s an option for enable, which gives us a clue as to what we need to do:
localhost#conf
localhost(config)#zerotouch enable
% Configuration ignored: ZeroTouch can not be enabled interactively. To enable ZeroTouch, delete startup-config and reload the switch.
Sadly, that message is not entirely correct, at least up to EOS version 4.21.1F. But nothing is impossible with an Arista switch! Check out what’s on the flash: drive:
localhost#dir
Directory of flash:/
  -rwx  700978970  Nov 12 2018  EOS-4.21.1F.swi
  -rwx  701781780  Nov 12 2018  EOS-4.21.2F.swi
  -rwx         77  Jan 7 20:44  SsuRestore.log
  -rwx         27  Jan 4 17:04  boot-config
  drwx       4096  Jan 7 20:46  debug
  drwx       4096  Jan 7 20:46  persist
  drwx       4096  Jan 2 22:09  schedule
  -rwx          0  Jan 7 20:33  startup-config
  -rwx         13  Jan 7 20:42  zerotouch-config
3269361664 bytes total (1157074944 bytes free)
Hmm. A file named zerotouch-config exists. Let’s dig into that file and see what we find, shall we?
localhost#more flash:zerotouch-config
DISABLE=True
Ha! I’ll bet if we change that to false, we’ll get ZTP back. In fact, in the first edition of Arista Warrior, I did just that, but years of maturity have set in, and I now recommend that people just delete the file, which has the same effect and doesn’t get your hands dirty with vi:
localhost#del flash:zerotouch-config
localhost#
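Why deleting the file works can be modeled in a few lines. This Python sketch assumes only what the listing shows: the file holds KEY=VALUE lines, DISABLE=True turns ZTP off, and no file at all means ZTP is back in play. It doesn't attempt to handle any other keys EOS might write there, and the function names are hypothetical.

```python
from typing import Optional


def parse_zerotouch_config(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines such as the DISABLE=True shown above."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        settings[key.strip().upper()] = value.strip()
    return settings


def ztp_disabled(text: Optional[str]) -> bool:
    """True if the file contents disable ZTP. A missing file (None) means
    ZTP is not disabled, which is why deleting the file brings ZTP back."""
    if text is None:
        return False
    return parse_zerotouch_config(text).get("DISABLE", "").lower() == "true"
```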
Because the startup-config is zero bytes, I don’t need to issue write erase, so I’ll force a reboot to see what happens:
localhost(config)#reload now
Broadcast message from root@localhost (Mon Jan 7 20:51:37 2019):
The system is going down for reboot NOW!
Flushing AAA accounting queue: [ OK ]
Restarting system
[--- lots of output truncated ---]
May 14 19:42:54 localhost ZeroTouch: %ZTP-5-INIT: No startup-config found, starting Zero Touch Provisioning
This was actually pretty cool for me when I first figured it out (back around 2012 or so), because I could find no mention of how to do this in the documentation. Based on my experience with Arista switches, I reasoned that there had to be a configuration file somewhere and that Arista would never permanently disable a feature. Following that logic, I found the configuration file, changed it, and was rewarded with the expected results. Today I have access to all sorts of nifty Arista documentation, both internal and external, but I can’t show you any of that secret stuff.
I feel that it’s important to point out that ZTP is not controlled in any way by the startup-config, and if you think about it, it cannot be. Why? Because the feature is triggered by the lack of the file, so how could it be configured using a file that needs to not be there for it to work? That’s why ZTP is configured with a file on flash:.
If you’re wondering how zerotouch cancel works: before the switch reboots, it puts the string disable=once or disable=nextReload (depending on the version of EOS) into the boot-config. This lets the switch come up normally, and once it does, one of the first things it does is remove that string so that the next time it boots, ZTP is triggered once more, assuming there’s still no startup-config.
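Put together, cancel and disable differ only in how long the "leave me alone" flag lives. Here's a toy Python model of that behavior under the assumptions described in this section. The class, its fields, and its methods are hypothetical and don't reflect how EOS actually stores this state.

```python
from dataclasses import dataclass


@dataclass
class SwitchFlash:
    """Toy model of the flash: state that decides whether ZTP runs at boot."""
    startup_config_empty: bool = True  # startup-config missing or zero bytes
    disabled: bool = False             # set by zerotouch disable (DISABLE=True)
    skip_next_boot: bool = False       # set by zerotouch cancel (one-shot)

    def cancel(self) -> None:
        self.skip_next_boot = True

    def disable(self) -> None:
        self.disabled = True

    def delete_zerotouch_config(self) -> None:
        self.disabled = False  # the "del flash:zerotouch-config" trick

    def boot(self) -> bool:
        """Simulate one boot; return True if ZTP takes over."""
        if self.skip_next_boot:
            self.skip_next_boot = False  # consumed early in this boot
            return False
        return self.startup_config_empty and not self.disabled
```

Walking it through: a fresh switch boots into ZTP, a cancel buys exactly one quiet boot, and a disable keeps ZTP away until the zerotouch-config file is deleted.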
Hopefully you’ve considered the possibility of booting with ZTP, because as we’re about to see, it’s pretty darn cool. As with all things Arista, it’s well thought out and powerful.
The first thing we need to know is that while ZTP can load a configuration from the network, it can also run scripts. Before we get to that, let’s take a look at what needs to happen in order for an Arista switch to boot using ZTP.
When a switch boots in ZTP mode, it configures all of the Ethernet and Management interfaces with the no switchport command in order to allow DHCP to be run on those interfaces. DHCP is capable of much more than just providing an IP address, and ZTP takes advantage of this fact. To see how, let’s first take a look at how to configure a DHCP server to use ZTP.
I’ve set up a lab in which I have an Ubuntu Linux server, an existing network, and a new Arista switch. I’m using dhcpd on the Linux server to serve DHCP. The examples used in this section reflect the configuration required for dhcpd, though the concepts should be universal for any DHCP server that supports them.
To begin, I configure the regular DHCP parameters as follows to fit with my network:
subnet 10.0.0.0 netmask 255.255.255.0 {
  option subnet-mask 255.255.255.0;
  option broadcast-address 10.0.0.255;
  option domain-name-servers 10.0.0.100;
  option domain-name "gad.net";
  option bootfile-name "http://10.0.0.100/ZTP/ZTP-Script";
  pool {
    range 10.0.0.30 10.0.0.50;
  }
}
Given this configuration, dynamically allocated IP addresses will be served from the pool 10.0.0.30–50, and any device that requests an IP address will get one from that pool. Lastly, the option bootfile-name line specifies an HTTP URL. The switch, upon receiving and activating the IP address, will go to this URL and download the bootfile. Here’s where this gets fun.
This file can be either a startup-config, in which case the switch will apply the configuration and reboot, or a script. ZTP decides which it is based on the first line of the file found at the bootfile-name address. If the first line includes a shebang (#!) and the path to an interpreter, that interpreter will be loaded and the file will be considered a script. If not, the file will be considered a startup-config and treated accordingly.
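That decision rule is easy to mirror if you want to sanity-check files before dropping them on the web server. Here's a minimal Python sketch of the rule as described above. The function name is hypothetical, and EOS's own check may handle edge cases differently.

```python
from typing import Optional


def classify_bootfile(contents: bytes) -> tuple[str, Optional[str]]:
    """Classify a downloaded bootfile by its first line.

    A first line starting with #! followed by an interpreter path marks a
    script; anything else is treated as a startup-config.
    Returns ("script", interpreter) or ("startup-config", None).
    """
    first_line = contents.split(b"\n", 1)[0].rstrip()  # rstrip handles \r\n
    if first_line.startswith(b"#!"):
        interpreter = first_line[2:].strip().decode("utf-8", errors="replace")
        if interpreter:  # a bare "#!" names no interpreter
            return "script", interpreter
    return "startup-config", None
```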
Let’s look at an example that might apply for the real world. Here’s how I’ve configured my ZTP-Script file on the web server:
[root@cozy ZTP]$ more ZTP-Script
#!/usr/bin/Cli
enable
copy http://www2.gad.net/Arista/Arista1-startup flash:startup-config
copy http://www2.gad.net/Arista/EOS-4.21.1F.swi flash:
config
boot system flash:EOS-4.21.1F.swi
This file will perform the following actions on the switch, in order, after ZTP loads it:
Load the CLI process
Enter EOS CLI enable mode
Copy the startup-config from the web server to flash:
Copy the desired revision of EOS from the web server to flash:
Enter EOS CLI configuration mode
Set the boot-config to boot from the new EOS version just downloaded
The script does not have a wr mem or copy run start at the end of it. Does it need one? Is that a typo? Do I really have no idea what I’m doing, as has been postulated by those who think I’m wrong? If you think that there needs to be a wr at the end of that script, I’d advise two things: 1) read the script again, and 2) read the next chapter, which happens to be about Aboot.
Would a wr at the end of the script hurt anything? Nope! But I like to get people to think about how things work.
After the script has completed, the switch will reload. Let’s take a look at the switch as it boots with the ZTP configuration loaded on the DHCP server.
Here, my switch is running EOS version 4.19.1F and has no configuration:
localhost#sho ver
Arista DCS-7280SE-72-F
Hardware version: 01.00
Serial number: JPE15181350
System MAC address: 001c.7390.93cf
Software image version: 4.19.1F
Architecture: i386
Internal build version: 4.19.1F-3205780.4166M
Internal build ID: 373dbd3c-60a7-4736-8d9e-bf5e7d207689
Uptime: 3 minutes
Total memory: 3844356 kB
Free memory: 470840 kB
And the existing boot configuration is pointing to 4.19.1F, as well:
localhost#sho boot-config
Software image: flash:/EOS-4.19.1F.swi
Console speed: (not set)
Aboot password (encrypted): (not set)
Memory test iterations: (not set)
Now I erase the startup-config with the write erase command:
localhost#write erase
Proceed with erasing startup configuration? [confirm]
localhost#
And reload with extreme prejudice. Or something:
Arista#reload now
Broadcast message from root@localhost (unknown) at 5:13 ...
The system is going down for reboot NOW!
Now let’s watch the ZTP messages as the switch boots (I’ve included only the relevant messages here to save space):
Address: 10.0.0.45/24/24; Nameserver: 10.0.0.100; Domain: gad.net; Boot File: http://10.0.0.100/ZTP/ZTP-Script ]
Jan 7 21:05:23 localhost ConfigAgent: %ZTP-6-CONFIG_DOWNLOAD: Attempting to download the startup-config from http://10.0.0.100/ZTP/ZTP-Script
Jan 7 21:05:23 localhost ConfigAgent: %ZTP-6-CONFIG_DOWNLOAD_SUCCESS: Successfully downloaded config script from http://10.0.0.100/ZTP/ZTP-Script
Jan 7 21:05:23 localhost ConfigAgent: %ZTP-6-EXEC_SCRIPT: Executing the downloaded config script
Jan 7 21:05:24 localhost ZTP: : Interface found: 1
Jan 7 21:05:24 localhost ConfigAgent: %ZTP-6-EXEC_SCRIPT_SUCCESS: Successfully executed the downloaded config script
Jan 7 21:05:24 localhost ConfigAgent: %ZTP-6-RELOAD: Rebooting the system
Flushing AAA accounting queue: [ OK ]
At this point the switch has downloaded the ZTP file, executed what it found there, and reloaded. So, without any direct interaction, the switch booted, ran our script, copied down the new configuration and new code, rebooted, and came up running the new code:
ZTP-Switch#sho ver
Arista DCS-7280SR-48C6-M-F
Hardware version: 21.05
Serial number: SSJ17290598
System MAC address: 2899.3abe.9f92
Software image version: 4.21.1F
Architecture: i386
Internal build version: 4.21.1F-9887494.4211F
Internal build ID: 1497e24b-a79b-48e7-a876-43061e109b92
Uptime: 0 weeks, 0 days, 0 hours and 10 minutes
Total memory: 32458980 kB
Free memory: 30521916 kB
The hostname ZTP-Switch came from the copied startup-config, and the switch is now running on a new revision of code.
Imagine a scenario in which you’re installing 50 new Arista switches. This procedure could put a base configuration on them all while also loading the right version of code. That’s my kind of time-saving feature! Oh, and if you’re using CloudVision, this is exactly how that tool configures new switches! CloudVision has a built-in ZTP server that handles all of this for you. Note that the CloudVision DHCP server must be enabled if you want to use it; that’s because you can also use your own DHCP server if, for example, you already have one in place.
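If you're rolling your own, generating per-switch scripts is the easy part. Here's a Python sketch that renders scripts in the same shape as the ZTP-Script example earlier in this chapter, one per switch. The file naming convention (<hostname>-ZTP-Script and <hostname>-startup) is made up for illustration, and you'd substitute your own web server for www2.gad.net. The sketch deliberately leaves out how each switch finds its own script (per-host DHCP entries, CloudVision, or something else).

```python
from pathlib import Path

BASE_URL = "http://www2.gad.net/Arista"  # substitute your own web server


def ztp_script(config_name: str, image: str, base_url: str = BASE_URL) -> str:
    """Render a ZTP script shaped like the ZTP-Script example: fetch a
    startup-config and an EOS image, then point boot system at the image."""
    lines = [
        "#!/usr/bin/Cli",
        "enable",
        f"copy {base_url}/{config_name} flash:startup-config",
        f"copy {base_url}/{image} flash:",
        "config",
        f"boot system flash:{image}",
    ]
    return "\n".join(lines) + "\n"


def write_scripts(hostnames: list[str], image: str, out_dir: Path) -> list[Path]:
    """Write one script per switch, using the hypothetical naming convention
    <hostname>-ZTP-Script / <hostname>-startup, and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in hostnames:
        path = out_dir / f"{name}-ZTP-Script"
        path.write_text(ztp_script(f"{name}-startup", image))
        written.append(path)
    return written
```

For example, write_scripts(["leaf-01", "leaf-02"], "EOS-4.21.1F.swi", Path("ZTP")) would write two scripts into a local ZTP directory, ready to copy to the web server. Calling ztp_script("Arista1-startup", "EOS-4.21.1F.swi") reproduces the body of the original example.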
If you don’t like writing your own code and you don’t have CloudVision, there’s another option (Arista loves options!) called ztpserver that you can get from the EOS Plus team on its GitHub page.
Not only that, but some customers have developed scripts that autoarchive the configurations on all of their switches every day (which is always a good idea) and then update the ZTP files accordingly. When a switch fails, ZTP on the replacement switch references the proper files that have been scripted to contain the configuration of the switch being replaced. Think about the time saved during an outage when the switch configures itself after a replacement.
In my training labs at Arista, I’ve written a ZTP script that allows the switches to configure themselves based on their position within the network. How? By issuing the show lldp neighbor command (Chapter 11), which tells the switch which interface it’s connected to on the management switch. If the switch is connected to interface e5 on the management switch, it configures itself as student-05. You can see this script on my GitHub page. The ability to have the switch run a script in order to configure itself is a fabulous capability, but remember that if you don’t like to code, you can use ztpserver or CloudVision (Chapter 15) to accomplish something similar.
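The naming half of that trick is easy to show on its own. This Python sketch assumes the neighbor's port ID has already been read from LLDP. On the switch, it would come from show lldp neighbor. The function name, the accepted port formats, and the prefix are all hypothetical. This is not the script referenced above.

```python
import re

# Accepts port IDs such as "Ethernet5", "Et5", or "e5" (case-insensitive).
# The trailing number is the management switch port.
_PORT = re.compile(r"^(?:ethernet|et|e)(\d+)$", re.IGNORECASE)


def hostname_from_mgmt_port(port_id: str, prefix: str = "student") -> str:
    """Map the management switch port we're plugged into to a hostname,
    zero-padded to two digits (Ethernet5 -> student-05)."""
    match = _PORT.match(port_id.strip())
    if match is None:
        raise ValueError(f"unrecognized port ID: {port_id!r}")
    return f"{prefix}-{int(match.group(1)):02d}"
```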