Deep Learning on a Cluster of 100 Edge Devices
En route to replacing the cloud for all AI training. A three-part series on setting up your own cluster of edge devices (1/3).
This is the first post in a three-part series describing how we built a cluster of 100 edge devices in order to train deep learning and machine learning models without ever using the cloud, while achieving close to perfect accuracies.
These 100 devices are intended to replicate real-world edge devices such as self-checkout POS terminals, cameras, connected cars and medical devices.
This first post covers the aspects we considered while building our edge-device cluster: hardware, network limitations, power supply requirements, the physical construction of the cluster and more. The second post will address cluster deployment and ongoing management. The third will present the challenges we encountered, and how we overcame them, while training a deep learning model on different types of edge hardware.
The Idea and The Need
With the proliferation of IoT devices and the spread of connected devices into our daily lives (from cars to self-checkout stands), the quantity of data collected globally is increasing exponentially. Coupled with growing privacy concerns and the security risks associated with access to this humongous volume of data, this has led to a new generation of algorithms for training deep learning models. These new training algorithms, collectively referred to as Distributed Edge Training, enable edge devices belonging to the same owner to collaboratively train a shared model while keeping the data on the devices themselves, thereby decoupling the ability to train a model from the need to store the data in a datacenter. You can read more about the need and thought process here.
Building an Edgify Infrastructure
In order to study the actual effect of training deep learning models on edge devices deployed in real-life scenarios, we set off at Edgify to build our own infrastructure of 100 edge devices.
Our insights, experience and the challenges we faced while building this massive infrastructure, along with the training of a deep learning model on top of it, are the focus of this three-part series.
Where to Begin?
With little experience in these kinds of projects, Edgify allocated three months to build a massive cluster of computers that would eventually be used to train deep learning models — all while emulating real-world devices and network connectivity.
We began by looking online at similar projects in order to compose a list of the topics to be tackled. Unfortunately, our research did not identify anyone who had built a similar project (which is one of the reasons we are sharing this experience). The only project we found was a small-scale one that included only a few edge devices. At that point, we were faced with planning and mapping out the project ourselves. We started by compiling the following list of the main items to consider when attempting such a project:
Obtaining the Hardware:
- Edge Devices
- Cluster Rack
Setting Up and Connecting the Hardware:
- Power Supply
- Network (Internal and External)
Obtaining the Hardware
Luckily, with the growing hype around deep learning, several hardware platforms, such as Raspberry Pi and Rock64, can now run deep learning models directly on the device. To keep pace with the trend, companies such as Intel (with its NUC series) and NVIDIA followed suit and began manufacturing their own single-board computers (SBCs), which simplified the hardware acquisition task for us.
Because our main goal was to test how our distributed edge training framework performs (in terms of CPU, memory and network) on a large variety of devices (read more on distributed edge training), we divided our cluster into four types of Intel NUC devices — Intel Celeron, Intel i3, Intel i5 and Intel i7 — all with a minimum of 8GB RAM and 120GB SSD storage.
After selecting our edge devices, we focused on how to group them. For us, this meant finding a hardware solution that would hold and connect all the SBCs. To size it, we took the height and width of each device and added the space needed between devices, plus the space needed above them for airflow and cables. The result was a 48U server cabinet (600 × 1,200 × 2,321 mm) with 14 shelves, each holding 8 SBCs (44U in total). The remaining 4U was reserved for network components.
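The sizing above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in this post (100 devices, 8 per shelf, 14 shelves, 44U of shelf space, 4U of network space, a 48U cabinet); the function names are illustrative and not part of any tooling we used.

```python
import math

# Figures taken from the text above.
DEVICES = 100
DEVICES_PER_SHELF = 8
SHELVES = 14
SHELF_SPACE_U = 44
NETWORK_SPACE_U = 4
CABINET_U = 48


def min_shelves(devices: int, per_shelf: int) -> int:
    """Smallest number of shelves that can hold every device."""
    return math.ceil(devices / per_shelf)


def spare_slots(devices: int, per_shelf: int, shelves: int) -> int:
    """Device positions left empty once every device is placed."""
    return shelves * per_shelf - devices


def fits_cabinet(shelf_u: int, network_u: int, cabinet_u: int) -> bool:
    """True if shelves plus network gear fit in the cabinet height."""
    return shelf_u + network_u <= cabinet_u


print("minimum shelves:", min_shelves(DEVICES, DEVICES_PER_SHELF))
print("spare slots:", spare_slots(DEVICES, DEVICES_PER_SHELF, SHELVES))
print("fits cabinet:", fits_cabinet(SHELF_SPACE_U, NETWORK_SPACE_U, CABINET_U))
```

With 14 shelves instead of the bare minimum, a few positions stay free, which leaves room for replacing or adding devices later.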
Finally, we needed a master (server) to control and manage the entire cluster, so that we would not have to repeat the same task 100 times for each update. To this end, we reused a Dell server that we had in the office (Dell PowerEdge T430) with an Intel Xeon chip (E5 2600 v4), 32GB of DDR4 RAM and a 10Gbps network card. Cluster management will be discussed in more detail in post 2 of this series. The full hardware list was as follows:
- 1x Dell PowerEdge — T430
- 30x Intel NUC Celeron
- 30x Intel NUC i3
- 30x Intel NUC i5
- 10x Intel NUC i7
- 100x Memory — 8GB DDR4
- 100x Storage SSD — 120 GB
- 1x Rack/Cabinet — ADP-802–751248 Premium Cabinet
- 14x Shelves for Rack
- 1x UPS Top V10 KRM Online 10KVA/8000W Single-phase UPS Rack 19
- 3x Switch JL355A Aruba 2540–48G 4SFP+ Switch
- 1x Router/Firewall FortiGate 80E — Hardware plus 24x7 FortiCare and FortiGuard Unified (UTM) Protection
- 130x Cables Cat 6a 3M
Setting up and Connecting the Hardware
A month after finalizing our hardware plan, we received all the components, and the real work began.
Although our SBCs are designed to consume relatively little power, we knew that the sheer number of devices in our plan would consume a substantial amount of electricity. To know exactly how much power we needed, we took each device’s maximum power requirement (based on its spec) and added them all together. To that, we added the server plus a small margin for spare capacity. The resulting current draw was approximately 20 A for the entire setup, so we built a dedicated 32 A power connection for the cluster. We also added an online 10 kVA / 9 kW three-phase UPS to protect the cluster from power surges and to provide half an hour of power in the event of an outage, enough for a graceful shutdown of all devices (you always want to overprotect a project like this).
We paid an electrician $150 to construct the power supply and the infrastructure for the air conditioning. The work took one day.
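The estimate described above (sum each device's maximum draw, add the server, add a margin, convert to current) can be sketched as follows. This is an illustration, not the calculation we actually ran: the per-device wattages, the 10% margin, the 230 V single-phase supply and the 80% breaker utilisation limit are all placeholder assumptions, and the real values should be read off each spec sheet and the local electrical code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceGroup:
    name: str
    count: int
    max_watts: float  # per-device maximum from the spec sheet


def total_watts(groups, margin=0.10):
    """Sum the maximum draw of every device, then add a spare margin."""
    base = sum(g.count * g.max_watts for g in groups)
    return base * (1.0 + margin)


def to_amps(watts, volts):
    """Approximate single-phase current, assuming a power factor of 1."""
    return watts / volts


def breaker_is_sufficient(load_amps, breaker_amps, max_utilisation=0.8):
    """Check the load against a breaker, keeping headroom (assumed 80% rule)."""
    return load_amps <= breaker_amps * max_utilisation


# HYPOTHETICAL wattages for illustration only; these are not measured or
# quoted figures for the devices in our cluster.
cluster = [
    DeviceGroup("NUC Celeron", 30, 20.0),
    DeviceGroup("NUC i3", 30, 30.0),
    DeviceGroup("NUC i5", 30, 40.0),
    DeviceGroup("NUC i7", 10, 60.0),
    DeviceGroup("Server", 1, 500.0),
]

ASSUMED_VOLTS = 230.0  # assumption: single-phase mains voltage

load = to_amps(total_watts(cluster), ASSUMED_VOLTS)
print(f"estimated load: {load:.1f} A")
print("32 A breaker sufficient:", breaker_is_sufficient(load, 32.0))
```

Keeping the margin and the breaker headroom as explicit parameters makes it easy to see how sensitive the result is to each assumption.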
That, as it turned out, was the easy part.
In our case, cooling proved to be an extremely important factor. 100 devices operating so close to each other produce a tremendous amount of heat, especially when all of them are training AI models simultaneously. During our planning stage, we had calculated the height of each shelf to allow sufficient space for airflow. However, we could not run the office air conditioner 24/7, so we had to purchase a new air conditioner with its own separate compressor to handle the cooling of the cluster.
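As a rough guide to how much heat such a cluster produces, virtually all of the electrical power drawn by computers ends up as heat in the room, so the electrical load doubles as a first estimate of the cooling load. The sketch below converts a load into BTU/h, the unit in which air conditioners are commonly rated. The 230 V supply voltage in the example is an assumption, and the estimate ignores other heat sources such as the UPS, lighting and people.

```python
BTU_PER_HOUR_PER_WATT = 3.412  # 1 W of dissipated power is about 3.412 BTU/h


def heat_load_btu_per_hour(electrical_watts: float) -> float:
    """Cooling load if all electrical power is dissipated as heat."""
    return electrical_watts * BTU_PER_HOUR_PER_WATT


def heat_load_from_current(amps: float, volts: float) -> float:
    """Same estimate starting from a current draw (power factor assumed to be 1)."""
    return heat_load_btu_per_hour(amps * volts)


# Example: the roughly 20 A draw mentioned earlier, at an ASSUMED 230 V supply.
print(f"{heat_load_from_current(20.0, 230.0):.0f} BTU/h")
```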
Constructing the Rack
Finally, we were ready to construct the rack. Putting together a cluster rack of this size is complicated. Just considering the number of wires for both the network and power can be a daunting task. We had hoped for some inspiration from previous such jobs, but we could not find any reference to a project of this size.
When determining the location of this cluster, we focused on two considerations:
- Firstly, no one wants to sit next to 100 working computers.
- Secondly, the air conditioner required a cooling tube and a compressor, and a room without a window would complicate their installation even further.
The only available room at our office was a 9-square-meter room that was originally a playroom. A relatively small, closed space with no windows, it was ideal for cooling efficiency but required more complicated infrastructure: because of its distance from a window, we had to add a water drain pump for the air conditioner. It took two weeks to convert the room into a server room (rack room). To cool the room, we purchased a 2-horsepower AC, which cost around $700.
Constructing the Network
Constructing the network was fairly straightforward.
To connect all of the cluster’s devices, we added three switches (JL355A Aruba 2540–48G 4SFP+) to the cluster (50 connections each), with 10Gbps links between the three switches and a 1Gbps link between each switch and each device.
To connect the cluster to the Internet, we connected it to a router and firewall (FortiGate 80E) via a 1Gbps Ethernet cable; the router was in turn connected to our ISP over a 100Mbps symmetrical Internet connection.
To connect the server to the switch, we used a single 10Gbps connection.
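The link speeds above imply some simple capacity figures that are worth keeping in mind when planning training traffic. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes devices are spread evenly across the switches, that each switch offers 48 device-facing ports (inferred from the "48G" in the model name), and that all devices transmit at full speed at the same time, which is a worst case.

```python
import math


def devices_per_switch(devices: int, switches: int) -> int:
    """Largest number of devices on any switch with an even spread."""
    return math.ceil(devices / switches)


def ports_sufficient(devices: int, switches: int, ports_per_switch: int) -> bool:
    """True if an even spread fits within each switch's device-facing ports."""
    return devices_per_switch(devices, switches) <= ports_per_switch


def oversubscription(device_count: int, device_gbps: float, uplink_gbps: float) -> float:
    """Ratio of worst-case aggregate device traffic to the uplink capacity."""
    return device_count * device_gbps / uplink_gbps


def fair_share_mbps(link_mbps: float, device_count: int) -> float:
    """Per-device bandwidth if a shared link is split evenly."""
    return link_mbps / device_count


print("devices per switch:", devices_per_switch(100, 3))
print("ports sufficient:", ports_sufficient(100, 3, 48))  # 48 ports is an assumption
print("server link oversubscription:", oversubscription(100, 1.0, 10.0))
print("Internet share per device (Mbps):", fair_share_mbps(100.0, 100))
```

The internal network has far more capacity than the Internet link, which matches the goal of keeping training traffic inside the cluster.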
The Result and What’s to Come…
In the second post of this series, we will cover the software and architecture designed and built by our very talented DevOps team to automatically manage the cluster, from its initial boot to ongoing, day-to-day maintenance and performance monitoring. We will also discuss how to handle, manage and monitor 100 edge devices without having to install software on each one of them separately.
Edgify.ai has been researching distributed edge training for four years. We are building a platform (framework) that enables the training and deployment of machine learning models directly on edge devices, such as smartphones, IoT devices, connected cars, healthcare equipment, smart dishwashers and more. We are committed to revolutionising the privacy, information security, latency and costs associated with AI.