Large-scale applications face a broad range of complex challenges. Algorithms must be extremely scalable to support the massive system size. Robustness, fault-tolerance and security must be fundamental concerns because of the unreliable and often hostile Internet environment. The inherent complexity of Internet-scale applications requires self-management and autonomous adaptation to remove this burden from system administrators. Finally, data management issues in such systems can be immensely difficult due to inconsistent data, heterogeneity and dynamic system evolution.
In our work, we see a number of promising approaches to address these challenges. New types of middleware systems help handle the complexity of building Internet-wide applications by providing high-level abstractions to programmers. For example, publish/subscribe techniques can be used to build novel middleware architectures for Internet-wide information systems. Peer-to-peer algorithms have revolutionised the way we design large-scale distributed applications. They can lead to systems that are intrinsically scalable and resistant to network and node failures. Overlay networks allow us to quickly prototype new networking services and deploy them on top of existing networks. In our projects, we experiment with such novel ideas to build large-scale distributed applications and evaluate our ideas with real-world Internet deployments.
Large-Scale Stream Processing
There is an ongoing paradigm shift away from the mere point retrieval of web content to the dynamic processing of large amounts of real-time stream data. This stream data comes from geographically distributed sources that are connected to the Internet, such as sensor network deployments, scientific instruments, network monitors, and web feeds. Distributed applications in areas as diverse as financial processing, e-science, security, and environmental protection aim to harness this data.
As a result, large-scale stream-processing applications need to query, process, and deliver real-time data from many distributed data sources at a global scale. Due to the vast amount of data, centralised processing is infeasible and applications must typically make use of shared Internet resources for filtering and aggregation. The challenge is to develop new techniques for large-scale stream-processing applications that enables them to cope with scale, failure, and resource limitations, while providing an efficient and reliable service to users on the Internet. This requires research into dynamic query optimisation, adaptive fault tolerance, and resource discovery and management, all being done at an Internet scale.
Peer-to-Peer Systems
Peer-to-peer systems have revolutionised the way how we build large-scale distributed applications. Peer-to-peer algorithms can lead to system designs that are intrinsically scalable and resistant to network and node failures. In the form of overlay networks, they allow us to prototype new networking services and quickly deploy them on top of existing networks. They also provide us with a set of high-level abstractions such as distributed hash tables that simplify the task of engineering novel applications.
However, there is still much need for new types of adaptive overlay networks that are tailored to different classes of applications. Also, we require a better understanding of the dynamic behaviour of peer-to-peer systems on the Internet. For this, we can take advantage of large-scale networking test-beds such as PlanetLab and Emulab to carry out experiments with distributed applications at an unprecedented scale.
Event-Based Systems
To handle the complexity of scale in today's Internet-wide applications, we require new types of middleware that emphasise scalability, expressiveness, and robustness. Publish/subscribe techniques can be used to build novel middleware architectures for Internet-wide information systems. One challenge is to design scalable and robust content-based routing algorithms that deliver information to all interested parties in a large-scale distributed application. We also need higher-level abstractions that make it easier for applications to manage low-level events at a high rate without being overwhelmed.