Authors: Annette Grueber, Tom Jaschinski and Tobias Winkler

Introduction

At the time of writing, digital services provide key functionality to businesses and everyday life. With progressing digitization, the reliance on digital services has been growing rapidly. This is shown not only by the growing number of interconnected devices that communicate with each other but also by the impact of unavailable services during an incident: On the 4th of October 2021, Meta (formerly Facebook) and its services (e.g. Instagram, WhatsApp, Facebook Messenger) were unavailable for up to seven hours. The outage resulted in a substantial financial loss for the company and connected businesses. [1]

This outage shows that modern solutions must be designed resiliently so that services remain available during incidents. There are multiple approaches to developing highly available and reliable services, which can be applied at various levels of a system’s architecture and design. Some programming languages are specifically designed to meet these challenges. The functional programming language Erlang provides built-in functionality for developing such resilient services.

Therefore, this article presents the exemplary design and development of a communication service based on the Internet Relay Chat (IRC) protocol in Erlang to investigate its availability features.

Background

This chapter deals with the basics of the programming language Erlang. The subchapters give an overview of functional programming and features specific to Erlang.

Functional Programming

Functional programming is a programming paradigm. Various characteristics specify what constitutes functional programming:

  • Pure functions: Pure functions are deterministic functions which always produce the same output value for identical input values. Their result depends only on their arguments and not on any state outside the function, and they do not modify such state. Hence, there are no side effects. [2]

  • Immutability: Immutability refers to the fact that data cannot be changed. Once a variable has been assigned a value, it can no longer be reassigned. An adjusted value can therefore only be stored by introducing a new variable. Since classical loops in imperative programming languages require updating variables, e.g. by incrementing them, there are no loop statements in functional programming. Iterating over data requires recursive function calls. [2]

  • Referential Transparency: The purity of functions together with the immutability of variables results in referential transparency. This means that a function call can be replaced by its result without changing the behavior of the program, so a result that has been computed once can be reused for the same input values. [2]

To illustrate the characteristics above, the calculation of the factorial of a number n is shown in the following figure. The left side shows the algorithm for a classical imperative approach using a loop. The right side represents the algorithm in a functional approach using recursive function calls.

 

Figure 1: Imperative Programming vs. Functional Programming
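The functional side of this comparison can be written in Erlang roughly as follows. This is a minimal sketch: the module name factorial_sketch is an assumption for illustration, and the code is not necessarily identical to the one shown in the figure.

```erlang
-module(factorial_sketch).
-export([factorial/1]).

%% Base case: 0! is defined as 1.
factorial(0) ->
    1;
%% Recursive case: n! = n * (n - 1)!. The guard restricts the
%% function to positive integers so the recursion always terminates.
factorial(N) when is_integer(N), N > 0 ->
    N * factorial(N - 1).
```

No variable is ever reassigned here. Each recursive call binds a fresh N, which is how the function gets by without a loop counter.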

 

Erlang

Erlang is a functional programming language initially developed by Ericsson. In the following, its native features and the Open Telecom Platform (OTP) library are presented.

Native Features

Erlang provides native features such as high-level instructions for implementing highly available services. Examples are process supervisors, hot swap functionality, inter-process communication principles, and a framework called Open Telecom Platform (OTP). OTP is a library that eases the development of stable and parallel applications with Erlang. The main components of OTP are so-called behaviors, which separate the specific business logic of an application from generic code that provides general capabilities such as standardized inter-process communication. Examples are generic servers and supervisors, which are dealt with in the following sections. [3]

Processes

One of the most important characteristics of Erlang is its processes, which are based on the actor model. The model describes parallel activities as actors that communicate with each other via message exchange and do not share memory. The tasks of an actor are:

  • Receive and process messages

  • Send messages to other actors

  • Start further actors

  • Change the local state of the actor

In Erlang, an actor is called a process. [4] The following illustration presents the communication of two processes A and B in Erlang.

 

Figure 2: Process Communication Example

 

The left section of the illustration shows the sending of a message ‘hello’ from process A to process B. A must include its own process identifier (Pid_A) in the message so that B knows where the message is coming from. In addition, A needs the process identifier of B (Pid_B) to address the message to the correct process. For this purpose, process A calls the spawn function and hands over the following parameters: the module name of process B, the main function of process B and the corresponding function arguments. The spawn function starts process B and returns its process identifier (pid). With this pid, A can use the exclamation mark operator to send the message {self(), hello} to process B. The self() function returns the pid of the calling process, in this case process A.

On the right-hand side, the figure shows how receiving the message is implemented in process B, in this case by using a receive block. Within a receive block, the left side of each arrow is the pattern that a received message is expected to match. If the message matches the pattern, the action on the right side of the arrow is carried out.
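The exchange described above can be sketched in a single module. The module name process_b, the reply atom hello_back and the stop message are assumptions added for illustration and are not part of the original figure.

```erlang
-module(process_b).
-export([start/0, loop/0]).

%% Called by process A: spawns process B and returns its pid (Pid_B).
start() ->
    spawn(?MODULE, loop, []).

%% Main function of process B: waits for messages and pattern-matches them.
loop() ->
    receive
        {From, hello} ->
            %% The sender's pid arrived inside the message, so B can reply.
            From ! {self(), hello_back},
            loop();
        stop ->
            %% Leaving the loop ends the process.
            ok
    end.
```

Process A, for example the Erlang shell, would call process_b:start() to obtain Pid_B, send Pid_B ! {self(), hello} and then wait for {Pid_B, hello_back} in its own receive block.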

Erlang processes are managed by Erlang’s own runtime environment rather than by the operating system, and each process runs isolated from the others. Creating such a process requires only little memory (a few kilobytes) and is therefore computationally cheap. [3]

Supervisor

A supervisor is a process that starts, stops, and monitors other subordinate processes such as other supervisors or worker processes. A worker process is a process that is monitored while executing the application’s business logic but does not monitor processes itself. Supervisors can thus be used to build a hierarchical process structure to construct fault-tolerant applications. [5]
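A minimal supervisor with one worker could look like the following sketch. The module name demo_sup, the registered name demo_worker, the restart limits and the worker's messages are all assumptions for illustration; a production worker would normally be implemented with an OTP behavior such as gen_server instead of a bare spawn_link.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_loop/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Supervisor callback: defines the restart strategy and the children.
init([]) ->
    %% one_for_one: only the crashed child is restarted.
    %% At most 5 restarts within 10 seconds are tolerated.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Worker = #{id => demo_worker,
               start => {?MODULE, start_worker, []},
               restart => permanent,
               shutdown => 1000,
               type => worker},
    {ok, {SupFlags, [Worker]}}.

%% Start function of the worker: must create a process linked to the supervisor.
start_worker() ->
    Pid = spawn_link(?MODULE, worker_loop, []),
    register(demo_worker, Pid),
    {ok, Pid}.

%% The worker answers ping messages and can be crashed on purpose.
worker_loop() ->
    receive
        {ping, From} ->
            From ! pong,
            worker_loop();
        crash ->
            exit(simulated_failure)
    end.
```

Sending crash to demo_worker terminates the worker. The supervisor notices the exit and starts a fresh worker under the same registered name.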

Hot Swap

Hot Swap is a term describing the capability of a system to exchange components during runtime. This mechanism increases the system’s availability by enabling it to keep running during maintenance, resulting in little or, ideally, no downtime. Hot swapping is applicable to both hardware and software components. Swapping code during runtime is useful in highly available service environments, e.g. telecommunication networks and mission-critical systems.

Erlang natively supports hot swapping source code modules by utilizing an integrated component called Erlang Code Server that stores and manages module execution and versioning. [6] Hot swapping is based on two types of Erlang function calls:

  • Local function calls: Local function calls describe function calls from one module to a function within the same module, e.g. foo().

  • Fully qualified function calls: Fully qualified function calls describe external function calls from one module to a specified module, e.g. module_a:foo(). [7]
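The difference between the two call types can be sketched with a small server loop. The module name hot_counter and its messages are assumptions for illustration.

```erlang
-module(hot_counter).
-export([start/0, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

%% loop/1 must be exported so that it can be called fully qualified.
loop(Count) ->
    receive
        increment ->
            %% Local call: the process stays in the code version it is running.
            loop(Count + 1);
        {get, From} ->
            From ! {count, Count},
            loop(Count);
        upgrade ->
            %% Fully qualified call: continues in the newest loaded version
            %% of this module and carries the current state along.
            ?MODULE:loop(Count)
    end.
```

After a new version of the module has been compiled and loaded, for example with c(hot_counter) in the Erlang shell, sending upgrade to the process lets it continue in the new version while keeping its counter value.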

A simplistic approach to hot swapping an exemplary module in Erlang is presented step by step in the following:

  1. An Erlang module A in its version 1 is loaded and run in the code server.  

    Figure 3: Hot Swap – Module ‘A’ Version 1 executed

     

  2. The operator of the system decides to update the module A during runtime and therefore loads a version 2 into the code server. [7]  

    Figure 4: Hot Swap – Module ‘A’ Version 1 and 2 loaded

     

  3. After loading version 2 into the code server, version 1 is still executed until the running process performs a fully qualified function call to a function within module A. [7]  

    Figure 5: Hot Swap – Module ‘A’ version 1 and 2 executed

     

  4. If there are parallel processes running module A, both code versions can be executed at the same time: One process keeps running the old version 1 of module A as long as it only performs local function calls. A second process already executes the new version 2 because it has performed a fully qualified function call after version 2 was loaded into the code server. [7]

    Figure 6: Hot Swap – Module ‘A’ Version 2 executed

     

  5. Once version 1 is no longer executed by any process, the code server can remove the old module. [7]

    Figure 7: Hot Swap – Module ‘A’ Version 1 removed

     

This example only considered hot swapping for a single Erlang module. In real-world Erlang systems that span multiple computing nodes, hot swapping becomes more difficult due to the following issues:

  • Multiple Erlang code modules

  • Parallel processing

  • Complex module dependencies

  • Downgrade requirements in case of a failed version update

  • Determination of a suitable scope of the modules to be exchanged

Erlang OTP meets these challenges by providing a high-level API for hot swaps based on a configuration file that describes how to upgrade and downgrade between versions of an application. This file is called an ‘appup’ file. [8] From a collection of ‘appup’ files, a so-called ‘relup’ file can be generated, which is then used during the hot swap. [9]
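An ‘appup’ file is a single Erlang term that lists upgrade and downgrade instructions per version. The following fragment is a sketch for the module_a example above; the file name my_app.appup and the version strings "1" and "2" are assumptions for illustration.

```erlang
%% my_app.appup (hypothetical application name and version strings)
{"2",
 %% Upgrade instructions: how to get from version "1" to version "2".
 [{"1", [{load_module, module_a}]}],
 %% Downgrade instructions: how to get from version "2" back to version "1".
 [{"1", [{load_module, module_a}]}]}.
```

The load_module instruction is sufficient when only the code of a module changes. Modules whose processes hold state in a changed format need more elaborate instructions.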

Mnesia

Mnesia is a database system that was designed and developed specifically for distributed and highly scalable Erlang systems. It supports the storage of arbitrary Erlang data structures such as tuples and lists. For this reason, unlike many other databases, Mnesia has the advantage that no conversion to other data types is required. [10]
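The following sketch shows how an Erlang record can be stored in and read from Mnesia without any conversion. The module name mnesia_demo, the record fields and the sample values are assumptions for illustration; the table is kept in RAM only, so no schema has to be created on disk.

```erlang
-module(mnesia_demo).
-export([run/0]).

%% The record definition doubles as the table layout.
-record(user, {nickname, username}).

run() ->
    ok = mnesia:start(),
    %% The first record field (nickname) becomes the primary key.
    {atomic, ok} = mnesia:create_table(user,
        [{attributes, record_info(fields, user)},
         {ram_copies, [node()]}]),
    %% Write the record inside a transaction.
    {atomic, ok} = mnesia:transaction(fun() ->
        mnesia:write(#user{nickname = "alice", username = "Alice"})
    end),
    %% Read it back by its key; the result is a list of matching records.
    {atomic, [#user{username = Name}]} = mnesia:transaction(fun() ->
        mnesia:read(user, "alice")
    end),
    Name.
```

Calling run/0 a second time on the same node would fail at create_table because the table already exists; a real application would create its tables once during setup.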

Internet Relay Chat

Internet Relay Chat (IRC) was specified in RFC 1459 in 1993. The protocol describes a simple architecture in which IRC clients can send text messages to each other via IRC servers. A client always connects to a server instance. Servers can connect to other IRC servers to form an IRC network that relays the transmitted messages. A client can send messages directly to other clients as well as to channels. A channel consists of multiple users who receive every message published to the channel. Channels can be administered by a client with appropriate user rights. Messages transmitted within an IRC network are not stored on the IRC servers. Therefore, messages can only be sent to users who are online. The protocol is usually used with TCP/IP based communication. [11]
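On the wire, an IRC message is a single text line consisting of a command followed by space-separated parameters, where a last parameter introduced by a colon may contain spaces. The following parser is a simplified sketch under stated assumptions: the module name irc_parse is hypothetical, the trailing CR-LF is expected to be stripped already, and the optional prefix at the start of a line is not handled.

```erlang
-module(irc_parse).
-export([parse/1]).

%% Parses one IRC line (without CR-LF) into {Command, Params}.
parse(Line) when is_binary(Line) ->
    %% Everything after the first " :" is the trailing parameter.
    case binary:split(Line, <<" :">>) of
        [Head, Trailing] ->
            [Command | Middle] = split_words(Head),
            {Command, Middle ++ [Trailing]};
        [Head] ->
            [Command | Middle] = split_words(Head),
            {Command, Middle}
    end.

%% Splits on spaces and drops empty parts caused by repeated spaces.
split_words(Bin) ->
    binary:split(Bin, <<" ">>, [global, trim_all]).
```

For example, the line PRIVMSG #erlang :hello world is split into the command PRIVMSG with the parameters #erlang and hello world.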

Design

This section describes the design for an IRC server implemented in Erlang.

Software Architecture

The following software architecture is developed to meet the requirements stated in RFC 1459. To design the IRC server, the software components are categorized into these component classes:

  • Supervisor components: The supervisors provide management capabilities for the running business processes that execute the business logic.

  • Erlang business processes: The Erlang business processes are running the business logic of the IRC server.

  • Mnesia database tables: Mnesia tables are the components that enable data persistence within the system.

  • Utility components: Utility components provide reusable functions to the modules running in the business process.

  • External components: External components are systems that interact with the designed system via defined APIs.

The resulting software component architecture is displayed in Figure 8.

Figure 8: IRC Server Component Diagram

 

As presented in the component diagram, the IRC server consists of three supervisors in a hierarchical structure. A ‘MainSupervisor’ component monitors two sub-supervisors: an ‘ApiSupervisor’ for managing the ‘IrcApi’ process and a ‘MessageHandlerSupervisor’ that monitors the processes running the main business logic (the ‘MessageHandler’ and ‘CommandExecutor’ components). This supervisor structure resembles a centralized control approach in which the ‘MainSupervisor’ can start and kill the whole IRC server instance while delegating business process monitoring to dedicated supervisors. This structure enables the sub-supervisors ‘ApiSupervisor’ and ‘MessageHandlerSupervisor’ to have different process restart policies, which increases the independence of the supervised processes and results in higher service reliability and robustness in case of errors.
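The hierarchy described above can be sketched with a single callback module that plays all three supervisor roles. The module name irc_sup_sketch, the role atoms, the chosen restart strategies and the restart limits are assumptions for illustration, since the concrete policies of the implementation are not stated here. The worker processes are left out.

```erlang
-module(irc_sup_sketch).
-behaviour(supervisor).
-export([start_link/0, start_link/1, init/1]).

%% Starts the top-level supervisor ('MainSupervisor').
start_link() ->
    start_link(main).

start_link(Role) ->
    supervisor:start_link(?MODULE, Role).

%% 'MainSupervisor': supervises the two sub-supervisors.
init(main) ->
    Flags = #{strategy => one_for_one, intensity => 3, period => 60},
    {ok, {Flags, [sub_supervisor(api), sub_supervisor(message_handler)]}};
%% 'ApiSupervisor': hypothetical policy, restart only the crashed child.
init(api) ->
    {ok, {#{strategy => one_for_one, intensity => 10, period => 60}, []}};
%% 'MessageHandlerSupervisor': hypothetical policy, also restart the
%% children that were started after the crashed one.
init(message_handler) ->
    {ok, {#{strategy => rest_for_one, intensity => 5, period => 60}, []}}.

%% Child specification for a supervisor that is itself supervised.
sub_supervisor(Role) ->
    #{id => Role,
      start => {?MODULE, start_link, [Role]},
      restart => permanent,
      shutdown => infinity,
      type => supervisor,
      modules => [?MODULE]}.
```

Because every supervisor carries its own flags, a crash loop below one sub-supervisor exhausts only that supervisor's restart budget before the failure is escalated to the top level.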

Furthermore, to strengthen the decoupling of the processes, message-driven inter-process communication is used. The ‘IrcApi’, the ‘MessageHandler’ and the ‘CommandExecutor’ components pass information to each other via asynchronous messages. As a result, all processes can exist and operate independently, and since no process blocks while waiting for a reply, the risk of process deadlocks is reduced.

A utility component ‘DataAccessHandler’ for accessing the data persistence layer (Mnesia tables) provides an internal, use-case-oriented high-level API for managing the IRC data. Its functions combine the CRUD operations provided by Mnesia into higher-level operations, e.g. adding a new user to the IRC server.

External components can access the server via a TCP connection. Since both IRC clients and other IRC servers use similar messages, a unified ‘IrcApi’ is used for both external component types. The IP address and the port number of a received TCP/IP communication identify the message sender (e.g. user client or external IRC server instance), which leads to different command execution procedures.
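How a TCP endpoint can determine the sender of a message is sketched below with the standard gen_tcp and inet modules. The module name irc_api_sketch is an assumption, and the sketch is heavily simplified: it handles one connection at a time, reads a single line and then closes the connection.

```erlang
-module(irc_api_sketch).
-export([start/1]).

%% Opens a listening socket in a separate process and returns the port
%% that is actually used. Port 0 lets the operating system pick a free port.
start(Port) ->
    Parent = self(),
    Pid = spawn(fun() ->
        {ok, Listen} = gen_tcp:listen(Port, [binary, {packet, line},
                                             {active, false}, {reuseaddr, true}]),
        {ok, BoundPort} = inet:port(Listen),
        Parent ! {listening, self(), BoundPort},
        accept_loop(Listen)
    end),
    receive
        {listening, Pid, ActualPort} -> {ok, ActualPort}
    end.

accept_loop(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    %% The peer's IP address and port identify the sender of the messages.
    {ok, {Ip, PeerPort}} = inet:peername(Socket),
    %% With {packet, line}, recv returns exactly one line of input.
    {ok, Line} = gen_tcp:recv(Socket, 0),
    io:format("~p:~p sent ~s", [Ip, PeerPort, Line]),
    gen_tcp:close(Socket),
    accept_loop(Listen).
```

A real server would hand each accepted socket to its own process and keep the connection open; the address and port pair returned by inet:peername/1 could then serve as the lookup key for the connected client or server.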

Data Model

Figure 9 shows the data model of the designed IRC server. All data is stored in the Mnesia database presented earlier. The following five Mnesia tables are required to implement an IRC server according to the RFC 1459 standard.

  • users: Each client connected to an IRC network is stored in this table. As defined in the IRC standard, the nickname of every user is unique and serves as a primary key to identify clients. The ‘socket_id’ attribute within the user table is used by the server instance to send IRC messages over TCP and distinguish the connected clients from each other.
  • user_modes: Clients can have certain modes which can be specified in the ‘user_modes’ table.
  • channels: A channel has a unique name for identification and can contain a channel description indicating the purpose of it. All users participating in a channel are linked to the channel information in this table.
  • channel_modes: This table contains all settings of a channel.
  • servers: This table is used to store information about other servers of the IRC network.
Figure 9: IRC Data Model

Evaluation

A load test is performed to analyze how the designed system behaves under a constantly increasing load.

Tsung

The test is based on the distributed load test tool Tsung, which is protocol-independent and currently supports common network protocols, e.g. HTTP, SOAP and TCP. The tool is developed in Erlang, presumably due to inherent advantages such as performance, scalability, and fault tolerance. These properties support one of the most important features of Tsung: simulating many simultaneous users from a single computer. [12]

Load Test Resources

The load test was performed in a VirtualBox virtual machine with the following resources:

Figure 10: Load Test Resources

Load Test Configurations

A load test in Tsung is configured via an XML file. The load progression is made up of seven arrival phases, whose durations add up to the total length of the test run: [12]

<load>
  <arrivalphase phase="1" duration="60" unit="second">
    <users arrivalrate="10" unit="second"/>
  </arrivalphase>
  <arrivalphase phase="2" duration="60" unit="second">
    <users arrivalrate="5" unit="second"/>
  </arrivalphase>
  <arrivalphase phase="3" duration="60" unit="second">
    <users arrivalrate="4" unit="second"/>
  </arrivalphase>
  <arrivalphase phase="4" duration="60" unit="second">
    <users arrivalrate="3" unit="second"/>
  </arrivalphase>
  <arrivalphase phase="5" duration="60" unit="second">
    <users arrivalrate="5" unit="second"/>
  </arrivalphase>
  <arrivalphase phase="6" duration="60" unit="second">
    <users arrivalrate="1.2" unit="second"/>
  </arrivalphase>
  <arrivalphase phase="7" duration="60" unit="second">
    <users arrivalrate="1" unit="second"/>
  </arrivalphase>
</load>

Figure 11: Tsung Configuration File – Load Section

 

The first phase of the test run has a duration of 60 seconds and creates 10 new users per second. Hence a total of 600 users are created in this phase. In the course of the test, fewer and fewer users are created per second until the test reaches phase 4. In phase 5 there is a small increase in user connections and then the number of new users drops. That way the rise and fall of user volume is simulated.

The total length of the configured load test (420 seconds) is the sum of the phase durations. However, as presented in the next subsection, the recorded results extend beyond this duration. New users are added until the very end of the last phase, and the server still needs time to process the requests of these clients and to close their connections. This is why Tsung continues to collect results after the 420 seconds have elapsed.
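The figures of the load section can be recomputed with a few lines of Erlang. The module name load_phases is an assumption for illustration; the phase values are taken from the configuration above. The result of expected_users/0 is a nominal value derived from the configured rates, and the number of users actually created in a run may differ from it.

```erlang
-module(load_phases).
-export([total_duration/0, expected_users/0]).

%% {DurationInSeconds, ArrivalRatePerSecond} for the seven arrival phases.
phases() ->
    [{60, 10}, {60, 5}, {60, 4}, {60, 3}, {60, 5}, {60, 1.2}, {60, 1}].

%% Sum of all phase durations in seconds.
total_duration() ->
    lists:sum([Duration || {Duration, _Rate} <- phases()]).

%% Nominal number of created users: duration times arrival rate per phase.
expected_users() ->
    round(lists:sum([Duration * Rate || {Duration, Rate} <- phases()])).
```

With the configured values, total_duration/0 yields 420 seconds and expected_users/0 yields 1752 users, of which 600 belong to the first phase.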

The requests sent by each user client to the server are defined in a session as presented within the following XML snippet:

<session type="ts_raw">
  <!-- Declaration and initialization of nickname, username and channelname -->
  <!-- Defined Requests -->
  <!-- Nick and User Request -->
  <!-- Join Request -->
  <!-- Send message Requests -->
  <request subst="true">
    <raw data="PRIVMSG #%%_channelname%% :nachricht" ack="local"/>
  </request>
  <request subst="true">
    <raw data="PRIVMSG #%%_nickname%% :nachricht" ack="local"/>
  </request>
  <!-- Quit Request -->
</session>

Figure 12: Tsung Configuration File – Session Section

 

The type ‘ts_raw’ in Tsung enables sending arbitrary traffic to TCP/UDP servers. Therefore, proprietary or uncommon network protocols transmitted over TCP/UDP can also be tested with Tsung. Within the ‘session’ tag, the variables nickname, username and channelname are initialized with random strings. This creates users with different nicknames and usernames as well as different channels, since nicknames and channel names are unique in IRC. After that, the requests to be executed per created user are listed:

  • User Request: Specify properties such as username, hostname etc. of a new user

  • Nick Request: Give user a nickname or change it

  • Join Request: User joins a channel

  • Privmsg Request: Send message to a channel

  • Privmsg Request: Send message to another user

  • Quit Request: Close connection between user and server

The next subsection presents the results of the load test configured in this subsection.

Load Test Results

As displayed in Figure 13, many users are connected to the server at the same time over the total length of the test.

Figure 13: IRC Load Test – Simultaneous IRC Users Over Time

 

As presented in the line plot, the maximum number of concurrent users at the end of the test is 1683.

Figure 14: IRC Load Test – Test Result

 

The plot on the right of Figure 14 presents the number of requests (red) and connections (green) per second. A ‘connection’ refers to the established link between a client and the server, and a ‘request’ is a message sent over this link from the client to the server. The request rate reaches its maximum of 44.6 requests per second early in the test run and then decreases over the duration of the test, as defined in the test configuration. The connection rate shows the same pattern: more test users connect at first and fewer over time. During the test run, a total of 6732 requests were sent. After all users have been added, the request count and connection count do not drop abruptly but gradually. This is caused by the server’s processing time after the defined test runtime of 420 seconds has expired.

The same applies to the plot on the left-hand side. It shows the mean connection time (green line) and the mean duration of a request (red line) in milliseconds (ms). The ‘connection time’ specifies how long it takes to establish the connection between a client and the server, and the ‘request time’ indicates how long the server needs to respond to a request from a client. Over the test run, the mean request time measured in intervals of ten seconds ranges from 2.05 ms to 3.02 ms. The mean connection time ranges from 0.488 ms to 1.86 ms.

Synopsis

Applying the functional programming paradigm to a specific problem differs considerably from imperative programming styles. Functional programming languages require developers to take different approaches to implementing algorithms, for example recursion instead of loops. Furthermore, data handling is kept simpler. There are no classes, so object-oriented approaches cannot be used. Erlang as a functional programming language offers additional features to ease the use of functional programming approaches. The supervisor in Erlang is a high-level component that makes supervised processes robust against unexpected errors. Hot swaps enable changing code during runtime but require time-intensive preparation to ensure service availability. Standard IO in Erlang has proven to be problematic, since a process waiting for terminal input is blocked completely and cannot be terminated correctly by any other process. Since Erlang systems usually run multiple processes, IO could lead to unexpected problems such as temporary deadlocks and crashes.

The IRC server currently implements basic functions: connecting, joining and managing channels, and chatting are possible. With the help of supervisors and the simple use of parallel processes, the IRC server is very stable with respect to load and reliability. In particular, it is hardly possible to bring down the complete IRC server through misuse.

In conclusion, Erlang is mainly used in distributed systems, where reliability and scalability are important. This means it is well suited for communication systems such as the developed IRC server. As far as documentation is concerned, there are fewer sources about Erlang tools than about more prominent programming languages (e.g. Python, Java, JavaScript).

References

[1] Wikipedia, 2021 Facebook Outage. [Online]. Available: https://en.wikipedia.org/wiki/2021_Facebook_outage (accessed: Nov. 29 2021).

[2] TK, “An Introduction to the basic principles of Functional Programming,” freeCodeCamp.org, 15 Nov., 2018. https://www.freecodecamp.org/news/an-introduction-to-the-basic-principles-of-functional-programming-a2c2a15c84/ (accessed: Nov. 10 2021).

[3] Ericsson, System Architecture User’s Guide: Introduction. [Online]. Available: https://www.erlang.org/doc/system_architecture_intro/sys_arch_intro.html (accessed: Nov. 9 2021).

[4] M. Grotz, Elixir und Erlang: Nebenläufigkeit ganz einfach. [Online]. Available: https://www.informatik-aktuell.de/entwicklung/programmiersprachen/elixir-und-erlang-nebenlaeufigkeit-ganz-einfach.html (accessed: Nov. 9 2021).

[5] Ericsson, Erlang Reference Manual: STDLIB. [Online]. Available: https://www.erlang.org/doc/man/supervisor.html (accessed: Nov. 9 2021).

[6] Ericsson, Erlang Kernel Reference Manual: Code. [Online]. Available: https://www.erlang.org/doc/man/code.html (accessed: Nov. 6 2021).

[7] Ericsson, Erlang Reference Manual: Code Replacement. [Online]. Available: https://www.erlang.org/doc/reference_manual/code_loading.html#code-replacement (accessed: Nov. 6 2021).

[8] Ericsson, System Architecture Support Libraries: Appup. [Online]. Available: https://www.erlang.org/doc/man/appup.html (accessed: Nov. 6 2021).

[9] Ericsson, Erlang Reference Manual: System Architecture Support Libraries (SASL). [Online]. Available: https://www.erlang.org/doc/man/relup.html (accessed: Nov. 6 2021).

[10] Ericsson, Erlang Reference Manual: Mnesia. [Online]. Available: https://www.erlang.org/doc/man/mnesia.html (accessed: Nov. 9 2021).

[11] J. Oikarinen and D. Reed, Internet Relay Chat Protocol, RFC 1459. [Online]. Available: https://datatracker.ietf.org/doc/html/rfc1459 (accessed: Dec. 16 2021).

[12] Tsung 1.7.0 documentation: 1. Introduction. [Online]. Available: http://tsung.erlang-projects.org/user_manual/introduction.html (accessed: Nov. 9 2021).