Network Programming and Automation
Advanced Network Middleware
To develop advanced networks, multiple basic architectural issues must be resolved, especially with regard to precisely matching the requirements of services and applications with network resources. Providing sophisticated capabilities for matching these requirements and resources is an increasingly complex challenge. Such interlinking tasks must be accomplished through a mid-level set of integrated processes, capabilities, and technologies, commonly termed "middleware." A broad range of advanced network middleware is being developed to support many new services and applications and to make networks significantly higher-performance and higher-capacity, as well as more reliable, adaptive, persistent, manageable, scalable, customizable, and intelligent.
Such middleware includes many new types of network processes, systems, and technologies, including those that provide for access control, for advanced reservations of required network resources, and for guarantees that network performance will match the resources requested by the application.
For example, such middleware allows networks to discover highly distributed resources, allocate them dynamically and intelligently so that they precisely meet application requirements, integrate them, and make them available to multiple applications, in some cases simultaneously. Advanced middleware can also dynamically adjust to multiple changes in application resource requirements, even as the network resources themselves are dynamically changing. These capabilities are especially important for large-scale, high-performance applications that are compute-, storage-, and network-intensive. When such capabilities are integrated into large-scale distributed systems, they can be part of a research ecosystem that controls many types of resources, including scientific instrumentation, sensors, storage systems, data repositories, scientific analytic systems, high-performance computational systems, and visualization facilities.
Much early research and development in this area was conducted by communities developing distributed terascale and petascale HPC and Grid systems. Many initial concepts were developed through standardization forums such as the Open Grid Forum, particularly its Grid High-Performance Networking Research Group. (Ref: F. Travostino, J. Mambretti, and G. Karmous-Edwards, Eds., Grid Networks: Enabling Grids with Advanced Communication Technology, Wiley, 2006.)
Currently, these initiatives center on Software Defined Networking (SDN), including through the P4 programming language ("Protocol Independent, Target Independent, Field Reconfigurable"). Another emerging topic is network automation, e.g., developing high-fidelity views of network traffic, providing sophisticated analytics of that traffic, and employing AI/ML/DL techniques to generate responses.
Software Defined Networking (SDN), Software Defined Exchanges (SDXs), Software Defined Infrastructure (SDI)
SDN separates a network's control plane from its data plane, moving control logic into a software controller. Consequently, the switch can function as a forwarding device containing inherent rules, and the controller can change those rules as required, both for individual switches and for multiple highly distributed switches. The centralized controller acts as an all-knowing system with complete visibility across all switches in the network, enabling fine-grained decision-making and optimization when managing flows. This SDN approach can be expanded through P4, Programming Protocol-Independent Packet Processors (a language, not a protocol), an open network programming language focused on the data plane: it allows the programming of network switches, network interface controllers (NICs), and other network devices. Traditionally, switch functionality has been closely integrated with ASICs; P4 makes it possible to develop and compile code implemented on a switch (physical or virtual) to change its functionality. Given its protocol independence, P4 is a flexible choice for a quantum network, where many protocols are being explored. These techniques have been incorporated into the StarLight Software Defined Exchange (SDX) and its resource extensions, Software Defined Infrastructure (SDI).
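As a rough illustration of this separation (not any particular controller's API), the sketch below models a switch as a match-action table that holds no policy of its own, with a centralized controller pushing forwarding rules into every switch it manages; all class and rule names are invented for illustration.

```python
# Minimal illustration of the SDN split between forwarding devices and a
# controller. Names are hypothetical; real deployments would use a protocol
# such as OpenFlow or P4Runtime rather than in-process calls.

class Switch:
    """A forwarding device: applies match-action rules, holds no policy logic."""

    def __init__(self, name):
        self.name = name
        self.table = {}  # match field (destination prefix) -> action (output port)

    def install_rule(self, dst_prefix, out_port):
        self.table[dst_prefix] = out_port

    def forward(self, dst_ip):
        # Longest-prefix match over installed rules; no match means drop
        # (or punt to the controller).
        for prefix in sorted(self.table, key=len, reverse=True):
            if dst_ip.startswith(prefix):
                return self.table[prefix]
        return None

class Controller:
    """Centralized policy: sees every switch and programs their tables."""

    def __init__(self, switches):
        self.switches = switches

    def push_policy(self, dst_prefix, routes):
        # routes maps switch name -> output port for this destination.
        for sw in self.switches:
            if sw.name in routes:
                sw.install_rule(dst_prefix, routes[sw.name])

s1, s2 = Switch("s1"), Switch("s2")
ctrl = Controller([s1, s2])
ctrl.push_policy("10.0.1.", {"s1": 3, "s2": 1})  # steer 10.0.1.0/24
print(s1.forward("10.0.1.7"))  # -> 3
```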
P4: Protocol Independent, Target Independent, Field Reconfigurable
iCAIR is engaged in multiple research projects using the P4 network programming language ("Protocol Independent, Target Independent, Field Reconfigurable"), which enables many new capabilities for programmable networks, including capabilities supporting data-intensive science services. A particularly important P4 capability is In-band Network Telemetry (INT), which enables high-fidelity network flow visibility. To develop the capabilities of P4, an international consortium of network research institutions, including iCAIR, is collaborating to operate an International P4 Testbed. This testbed provides a highly distributed network research and development environment to support advanced empirical experiments at a global scale, including on 100 Gbps paths. The testbed also provides access to a P4Runtime implementation.
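To make the INT idea concrete, the following sketch parses a stack of per-hop telemetry records from a packet payload. The 16-byte record layout (switch ID, ingress/egress timestamps, queue depth) is a simplified assumption for illustration, not the exact format defined by the INT specification.

```python
# Sketch of consuming In-band Network Telemetry (INT) metadata. The per-hop
# record layout below is a simplified assumption, not the INT spec's format.
import struct

HOP_FMT = "!IIII"                # 4 x uint32, network byte order
HOP_LEN = struct.calcsize(HOP_FMT)

def parse_int_stack(payload: bytes, hop_count: int):
    """Return per-hop telemetry dicts from an INT metadata stack."""
    hops = []
    for i in range(hop_count):
        chunk = payload[i * HOP_LEN:(i + 1) * HOP_LEN]
        switch_id, t_in, t_out, qdepth = struct.unpack(HOP_FMT, chunk)
        hops.append({
            "switch_id": switch_id,
            "latency_ns": t_out - t_in,   # per-hop residence time
            "queue_depth": qdepth,
        })
    return hops
```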
The current P4Runtime specification includes a multi-controller design, implemented as a primary (master) and secondary (standby) model. The testbed can be sliced to support multiple P4 use-case scenarios. Other research projects on this testbed explore highly granular telemetry insight into data flows, using capabilities for marking, tracking, and analyzing individual packets to obtain high-fidelity views of real-time traffic flows, even for high-capacity end-to-end (E2E) flows. Several P4 research projects are exploring mechanisms that use P4 to enable enhanced network control planes for generalized network operations. The international P4 consortium is exploring options to add resources to the testbed, including NICs that can support compiled P4 code.
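The primary/secondary roles are decided by arbitration: each controller opens a stream to the target and announces an election ID, and the controller presenting the highest ID for a device becomes primary. The sketch below, assuming the p4.v1 P4Runtime Python protobuf/gRPC bindings and placeholder target and device values, shows the general shape of that exchange.

```python
# Hedged sketch of P4Runtime primary/secondary arbitration. Assumes the
# p4.v1 P4Runtime Python bindings are installed; the target address and
# device_id are placeholders.
import grpc
from p4.v1 import p4runtime_pb2, p4runtime_pb2_grpc

def request_primary(target="switch1:9559", device_id=1, election_id_low=10):
    channel = grpc.insecure_channel(target)
    stub = p4runtime_pb2_grpc.P4RuntimeStub(channel)

    req = p4runtime_pb2.StreamMessageRequest()
    req.arbitration.device_id = device_id
    req.arbitration.election_id.low = election_id_low  # highest ID wins

    # In a real controller the stream stays open for its lifetime; here we
    # just send the arbitration message and read the target's verdict.
    responses = stub.StreamChannel(iter([req]))
    for resp in responses:
        if resp.HasField("arbitration"):
            # Status code 0 (OK) means this controller is now the primary;
            # a secondary receives a non-OK status but keeps its stream.
            return resp.arbitration.status.code == 0
```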
Network Entropy Platform
Recent successes of deployed "research platforms" have demonstrated that such infrastructure and services are key enablers of large-scale data-intensive science. These platforms are based on an architecture that combines orchestration techniques (e.g., Kubernetes), low management overhead, and tenant-oriented applications. This approach develops services focused on meeting the requirements of research science communities, especially data-intensive science.
High-performance international WAN networking is a high-priority concern for major research platform usage scenarios. P4 and programmable network targets have recently become major enablers for these services. Consequently, research testbeds, such as the International P4 Experimental Networks (iP4EN) testbed, have become important resources for exploring potential contributions of techniques for programmable data planes to high-performance networking for science. To support different monitoring and anomaly detection approaches, iCAIR is currently prototyping and demonstrating a Network Entropy Platform for the Global Research Platform (GRP) and related Research Platforms.
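The platform's name reflects a standard monitoring technique: the Shannon entropy of header-field distributions (for example, source addresses) shifts sharply during scans, DDoS events, and other anomalies. The sketch below is a generic illustration of that idea, not the platform's actual implementation; the field choice and example window are invented.

```python
# Generic sketch of entropy-based traffic monitoring: compute the normalized
# Shannon entropy of a header field's value distribution per time window.
# A sudden drop can indicate a single dominant talker; a spike can indicate
# spoofed scanning. Not the Network Entropy Platform's actual code.
import math
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy of the value distribution, scaled to [0, 1]."""
    counts = Counter(values)
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    max_h = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return h / max_h

# One window of observed source IPs, dominated by a single heavy sender.
window = ["10.0.0.5"] * 950 + ["10.0.0.%d" % i for i in range(50)]
print(round(normalized_entropy(window), 3))  # low value -> skewed traffic
```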
Software-defined network for End-to-end Networked Science at Exascale (SENSE)
iCAIR is participating in the Software-defined network for End-to-end Networked Science at Exascale (SENSE) initiative, a multi-resource, multi-domain orchestration system that provides an integrated set of network and end-system services.
SENSE includes mechanisms to integrate resources beyond the network, such as compute, storage, and Data Transfer Nodes (DTNs), into this automated provisioning environment. Key elements of the SENSE architecture, situated between the SDN layer and science program application agents, include an ontological model of the sites and networks, Site and Network Resource Managers, and an Orchestrator. SENSE is closely related to the AutoGOLE initiative, and SENSE components are being integrated into AutoGOLE services.
SENSE is developing smart network services to accelerate scientific discovery at a time when 'big data' is increasingly driven by exascale computing, cloud computing, machine learning, and AI. SENSE provides a comprehensive approach to requesting and provisioning end-to-end network services across domains, combining infrastructure deployed across multiple labs, campuses, and WANs with a focus on usability, performance, and resilience. Its services include:
- intent-based, interactive, real-time application interfaces providing intuitive access to intelligent SDN services for Virtual Organization (VO) services and managers;
- policy-guided end-to-end orchestration of network resources, coordinated with the science programs' systems, to enable real-time orchestration of computing and storage resources;
- auto-provisioning of network devices and Data Transfer Nodes (DTNs);
- real-time network measurement, analytics, and feedback, providing the foundation for full-lifecycle status, problem resolution, resilience, and coordination between the SENSE intelligent network services and the science programs' system services;
- priority QoS for SENSE-enabled flows;
- multi-point and point-to-point services.
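As a rough sketch of the intent-based interface style listed above, a client declares what it needs (endpoints, bandwidth, schedule) and leaves the "how" to the orchestrator. The endpoint URL, JSON fields, and hostnames below are hypothetical placeholders, not the actual SENSE orchestrator API.

```python
# Hypothetical sketch of an intent-based provisioning request. The URL and
# payload schema are placeholders for illustration only; consult the SENSE
# orchestrator's actual API documentation for real use.
import json
import urllib.request

intent = {
    "service": "point-to-point",          # declarative: what, not how
    "endpoints": ["dtn1.site-a.example", "dtn2.site-b.example"],
    "bandwidth_gbps": 40,
    "start": "2024-05-01T00:00:00Z",
    "duration_hours": 6,
}

req = urllib.request.Request(
    "https://sense-orchestrator.example/api/intents",  # placeholder endpoint
    data=json.dumps(intent).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```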
A related project using SENSE involves the Rucio/File Transfer Service (FTS)/XRootD data management and movement system, the key infrastructure used by the LHC experiments and more than 30 other programs in the Open Science Grid. This project explores SENSE's interoperation with Rucio/FTS/XRootD to enable a new set of services for domain science workflows. The new features include the ability for science workflows to define priority levels for data movement operations through a Data Movement Manager (DMM), which translates Rucio-generated priorities into SENSE requests and provisioning operations. Additional features include full-lifecycle monitoring, evaluation, and adjustment of the associated network services.
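A minimal sketch of the kind of translation a DMM performs appears below: a workflow-level priority is mapped onto a network service request. The priority tiers, QoS class names, and request fields are illustrative assumptions, not the DMM's actual schema.

```python
# Illustrative sketch of DMM-style priority translation: map a Rucio rule's
# priority level onto a network provisioning request. Tier boundaries and
# field names are assumptions for illustration.
PRIORITY_TIERS = [
    (8, {"qos": "guaranteed", "min_gbps": 80}),       # urgent reprocessing
    (4, {"qos": "soft-guaranteed", "min_gbps": 20}),
    (0, {"qos": "best-effort", "min_gbps": 0}),
]

def to_network_request(src_rse, dst_rse, rucio_priority):
    """Translate a Rucio data-movement priority into a provisioning request."""
    for floor, params in PRIORITY_TIERS:
        if rucio_priority >= floor:
            return {"src": src_rse, "dst": dst_rse, **params}

print(to_network_request("SITE_A_DISK", "SITE_B_DISK", rucio_priority=5))
# -> {'src': 'SITE_A_DISK', 'dst': 'SITE_B_DISK', 'qos': 'soft-guaranteed', 'min_gbps': 20}
```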
UCSD Monitoring and Real Time Analytics Service for Passive and Active Measurements (IGROK)
iCAIR is participating in the UCSD IGROK project, which is developing a state-of-the-art Elastic Stack cluster implemented on the UCSD PRISM Science DMZ connected to CENIC CalREN-HPR, Pacific Wave (which extends to the StarLight Exchange), and FABRIC. MMA nodes deployed at selected Points of Presence (PoPs), including at StarLight, send data to IGROK.
UCSD has developed and deployed monitoring and traffic-analysis tools for performance data collected by passive and active measurements. IGROK operates in CPU/RAM-bypass mode as a storage/analysis system, forming a near-Terabit-connected NVMe Elasticsearch cluster. Currently, each 2U chassis holds an 8 TB NVMe card capable of 15 GB/s, for 56 TB total across 7 chassis; each node can deliver up to 100 Gb/s, or 700 Gb/s for the cluster. Nodes also have OCP-3 slots for ConnectX-6 200G NICs, which would double per-node network capacity once matched with additional NVMe devices: three 8 TB cards in each of the seven nodes would provide 168 TB of NVMe at 1.4 Tb/s, with the proper switches to feed CENIC's CalREN-HPR and FABRIC.
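The aggregate figures follow directly from the per-node numbers quoted above; a quick check:

```python
# Sanity check of the cluster figures quoted above, from per-node numbers.
nodes = 7

# Current build: one 8 TB NVMe card and 100 Gb/s of network per node.
print(nodes * 8)      # 56  -> TB of NVMe across the cluster
print(nodes * 100)    # 700 -> Gb/s aggregate network capacity

# Upgraded build: three 8 TB cards and a 200 Gb/s ConnectX-6 NIC per node.
print(nodes * 3 * 8)        # 168 -> TB of NVMe
print(nodes * 200 / 1000)   # 1.4 -> Tb/s aggregate network capacity
```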
Router for Academia, Research and Education (RARE)
iCAIR is participating in the GEANT RARE (Router for Academia, Research, and Education) initiative, which is developing a free and open-source routing platform used to create a network operating system (NOS) on commodity hardware (white box switches). RARE uses FreeRtr as its control-plane software and is thus often referred to as RARE/FreeRtr. RARE incorporates techniques that control the data plane by managing entries in Match Action Unit (MAU) tables: every routed interface is placed in a virtual routing table, and every bridged interface is placed in a bridge table.
This approach provides one control plane and several data planes: control-plane computation results are exported to DPDK or hardware switches using data plane programming languages such as P4 (Programming Protocol-independent Packet Processors). RARE can also be used in conjunction with Network Management as a Service (NMaaS), a platform for network management that provides a portfolio of network management and monitoring applications, a per-user secured network monitoring infrastructure, and Dockerised images deployed through a Kubernetes cluster.
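A toy model of that one-control-plane/several-data-planes split appears below: a single routing table is computed once and exported to multiple backends, each of which programs its own entries. The classes and entry formats are invented for illustration and do not reflect FreeRtr's internals.

```python
# Toy model of RARE's split: one control plane computes routes; several
# data-plane backends (e.g., a DPDK process, a P4 hardware switch) each
# receive the same computed entries. All classes here are illustrative.

class ControlPlane:
    def __init__(self):
        self.rib = {}          # prefix -> next hop, computed by routing protocols
        self.backends = []     # attached data planes

    def attach(self, backend):
        self.backends.append(backend)

    def learn_route(self, prefix, next_hop):
        self.rib[prefix] = next_hop
        # Export the computation result to every data plane.
        for be in self.backends:
            be.install(prefix, next_hop)

class DpdkBackend:
    def install(self, prefix, next_hop):
        print(f"[dpdk] fib add {prefix} via {next_hop}")

class P4SwitchBackend:
    def install(self, prefix, next_hop):
        # On real hardware this would write a Match Action Unit table entry.
        print(f"[p4] table_add ipv4_lpm forward {prefix} => {next_hop}")

cp = ControlPlane()
cp.attach(DpdkBackend())
cp.attach(P4SwitchBackend())
cp.learn_route("203.0.113.0/24", "198.51.100.1")
```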