TU Delft
 
Alexandru IOSUP
Ph.D. Thesis / Dissertation
Parallel and Distributed Systems


 
A Framework for the Study of
Grid Inter-Operation Mechanisms
 
(as of 20 January, 2009) by Dr. Alexandru Iosup
 
A. Iosup, A Framework for the Study of Grid Inter-Operation Mechanisms, Ph.D. Thesis
(alternative source #1: TU Delft Repository)
Author: Alexandru Iosup
Title: A Framework for the Study of Grid Inter-Operation Mechanisms
(click title to download. warning: 5MB PDF file)
ISBN/EAN: 978-90-9023297-3
Copyright (c) 2008 by Alexandru Iosup. All rights reserved.
Keywords: grid inter-operation, Delegated MatchMaking, Grid Workloads Archive, GrenchMark, DGSim, trace-based simulation, grid computing.
 
Table of Contents
1. Raising the Curtains
2. A Basic Grid Model
3. The Grid Workloads Archive
4. A Comprehensive Model for Multi-Cluster Grids
5. The GrenchMark Testing Framework
6. The Delft Grid Simulation Framework
7. Alternatives for Grid Inter-Operation
8. Inter-Operating Grids through Delegated MatchMaking
9. Lowering (But Not Closing) the Curtains
Bibliography
A. Validation of the Random Numbers Use
Summary
Samenvatting
Curriculum Vitae





Committee
the thesis committee
Rector Magnificus of Technische Universiteit Delft, the Netherlands, chairman (voorzitter)
Prof.dr. Henk J. Sips, Technische Universiteit Delft, the Netherlands, supervisor (promotor)
Assoc.Prof.dr. Dick H.J. Epema, Technische Universiteit Delft, the Netherlands, supervisor (copromotor)
Prof.dr. Henri E. Bal, Vrije Universiteit Amsterdam, the Netherlands
Prof.dr. Arie van Deursen, Technische Universiteit Delft, the Netherlands
Prof.dr. Arjan J.C. van Gemund, Technische Universiteit Delft, the Netherlands
Prof.dr. Thomas Fahringer, Universität Innsbruck, Austria
Prof.dr. Miron Livny, University of Wisconsin-Madison, USA
Prof.dr. Nicolae Tapus, Universitatea Politehnica Bucuresti, Romania



Propositions
the thesis
Having grids inter-operate leads to better performance than having the same grids operate independently.
A good solution for the problem of grid inter-operation is Delegated MatchMaking.
The framework for the study of grid inter-operation mechanisms proposed in this thesis allows the study and comparison of grid inter-operation architectures and mechanisms.
The following three grid computing myths are just that: grid computing is mostly used for parallel computing, common grid middleware is getting more reliable, and centralized approaches can support production grid workloads.


Summary
a summary of the thesis content
A Framework for the Study of Grid Inter-Operation Mechanisms
 

The study of the history of computing infrastructures reveals an integration trend. For example, the explosive growth of the Internet in the 1990s was the result of an integration process started in the 1960s with the emerging networks of computers. By using the Internet, millions of users were capable of accessing information anytime and anywhere, much like other daily utilities such as water, electricity, and telephone. However, an important category of users remained under-served: the users with large computational and storage requirements, e.g., the scientists, the companies that focus on data analysis, and the governmental departments that manage the interaction between the state and the population (such as census, tax, and public health). Thus, in the mid-1990s, the vision of the Grid as a universal computing utility was formulated. The main benefits promised by the Grid are similar to those of other integration efforts: extended and optimized service of the integrated network, and significant reductions of maintenance and operation costs through sharing and better scheduling.

While the universal Grid has yet to be developed, large-scale distributed computing infrastructures that provide their users with seamless and secure access to computing resources, individually called Grid parts or simply grids, have been built throughout the world -- in different countries, for different sciences, and both for production work and for computer-science research. At the same time, the main technological alternatives to grids, that is, supercomputers and large clusters, have evolved into much larger, scalable, and reliable systems. Thus, the integration of existing grids into larger infrastructures and finally into The Grid is key in keeping the grid vision attractive for its potential users.

The integration of grids raises a double challenge: the first is related to the efficient scaling of a distributed computing system, the second to the operation of a system across different ownership and administrative domains. Thus, many of the traditional approaches for inter-operating computer systems, such as those based on completely centralized or purely decentralized system architectures, are eliminated from the start. To mark the distinction between the typical problem of integrating smaller components into a larger system and the double challenge of grid integration, we call the latter the problem of grid inter-operation. In this thesis we approach the problem of grid inter-operation with two main objectives: to design a comprehensive framework for the study of grid inter-operation mechanisms, and to provide an initial but good solution for this problem.

Our framework provides both the theoretical support and the tools for finding new and improved solutions for this problem. The tools are assembled into a research toolbox for the study of grid inter-operation mechanisms. This research toolbox addresses two problems that have hampered the grid community in the past decade: the lack of knowledge about the workloads and resources of real grids, and the lack of tools for grid simulation and performance evaluation in real environments. Research using unrealistic characteristics or characteristics that are specific to other types of environments is limited in scope and applicability, and may even miss the problems that are specific to grids. Thus, real data and realistic models of grid workloads and resources are critical for designing efficient and scalable architectures. Using tools that have not been adapted to the requirements of grids, whether for simulation or for performance evaluation in real environments, leads to slower progress and to results that are difficult to compare. Thus, tools adapted to grids and aimed at producing results that can be shared with other researchers are needed.

The content of this thesis is split into four logical parts: the introduction, a toolbox for grid inter-operation research, a method for grid inter-operation, and the conclusion.

We begin the thesis with an introduction to the problem of grid inter-operation that focuses on the challenges of grid inter-operation addressed by this thesis. In Chapter 1 we also present an overview of the framework for the study of grid inter-operation mechanisms introduced in this thesis. In Chapter 2 we introduce a basic model for grid inter-operation. This model, required to understand the remainder of the thesis, defines the components of a grid system, the types of applications that can be found in a grid, the system users, and the grid job execution model.

The toolbox for grid inter-operation research is presented in Chapters 3, 4, 5, and 6, which we describe in turn. In Chapter 3 we present the Grid Workloads Archive (GWA). We design the GWA with a focus on building a grid workload data repository, and on establishing a community center around the archived data. One of the important design achievements is the formulation of a grid workload format for storing job-level information that can be extended for higher-level information such as co-allocated jobs or resource reservations. We develop a comprehensive set of tools for collecting, processing, and using grid workloads. To make the GWA accessible to non-expert users, we devise a mechanism for automated trace ranking and selection. So far, the GWA contains traces from nine well-known grid environments, with a total content of more than 2,000 users submitting more than 7 million jobs over a period of over 13 operational years, and with working environments spanning over 130 sites comprising 10,000 resources.
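
As a rough illustration of what job-level workload information can look like, the sketch below defines a hypothetical job record and computes a few summary statistics of the kind quoted above (number of jobs, number of users, trace span). The field names (`job_id`, `user`, `submit_time`, `run_time`, `num_procs`) and the `summarize` helper are assumptions made for this example only; they are not the actual GWA workload format or the GWA tools.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical job-level record; field names are illustrative only."""
    job_id: int
    user: str
    submit_time: float  # seconds since the start of the trace
    run_time: float     # seconds
    num_procs: int


def summarize(jobs: Iterable[JobRecord]) -> dict:
    """Compute simple summary statistics over a trace of job records."""
    jobs = list(jobs)
    if not jobs:
        return {"jobs": 0, "users": 0, "span_seconds": 0.0, "cpu_seconds": 0.0}
    first = min(j.submit_time for j in jobs)
    # The trace ends when the last job finishes (submit time plus run time).
    last = max(j.submit_time + j.run_time for j in jobs)
    return {
        "jobs": len(jobs),
        "users": len({j.user for j in jobs}),
        "span_seconds": last - first,
        "cpu_seconds": sum(j.run_time * j.num_procs for j in jobs),
    }


# A tiny made-up trace, used only to exercise the helper.
trace = [
    JobRecord(1, "u1", 0.0, 100.0, 1),
    JobRecord(2, "u2", 50.0, 200.0, 4),
    JobRecord(3, "u1", 300.0, 10.0, 2),
]
summary = summarize(trace)
```

A record of this shape could in principle be extended with optional fields for higher-level information such as co-allocation or reservations, which is the kind of extensibility the paragraph above describes; how the GWA format actually does this is not shown here.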

In Chapter 4 we describe the extension of the basic model for grid environments into a comprehensive model for (multi-)grids. By analyzing real data such as long-term system traces of real grids, we find that grid resources exhibit a highly dynamic availability both over the course of single days and over whole years. We also find that grid workloads are very different from the workloads of other related systems such as parallel production environments and distributed web servers. Based on the results of this analysis, we design and validate a comprehensive model for grid resource dynamics and evolution, and for grid workloads that include parallel jobs and/or bags-of-tasks.

In Chapter 5 we introduce the GrenchMark testing framework. The main focus of this framework is on testing large-scale distributed computing systems with synthetically generated yet realistic workloads. We test and validate our reference implementation of the GrenchMark framework, and show that GrenchMark has been successful in testing real multi-cluster grids and pools of resources. The experimental results show that a grid testing tool focusing on realistic workloads can indeed be used to assess important characteristics of real systems that are otherwise not available, such as scalability limits, overheads, and reliability.

To conclude the presentation of our grid research toolbox, in Chapter 6 we introduce the DGSim grid simulation framework. The main focus of this framework is on facilitating repeated simulations of multi-cluster and multi-grid environments under realistic workloads. We test and validate our reference implementation of the DGSim framework, and show that DGSim has been successful as the simulation tool for several design space exploration studies of grid settings that are larger than the previous state-of-the-art.

The method for grid inter-operation and a solution for the grid inter-operation problem are presented in Chapters 7 and 8, which we describe in turn. In Chapter 7 we study the existing alternatives for grid inter-operation, and introduce a novel architecture for grid inter-operation. We classify real grid systems according to their architectural and operational components. The practical limitations of the centralized grid inter-operation approaches are evaluated in a real environment. These two preliminary steps allow us to assess the grid inter-operation ability of existing grid resource management systems; we find that this ability is limited. Thus, we introduce a novel architecture for grid inter-operation with a better potential of fulfilling the requirements of grid inter-operation. The architecture is a hybrid between hierarchical and purely decentralized architectures. The set of architectures investigated here provides a comprehensive architectural space for the problem of grid inter-operation.

In Chapter 8 we introduce a novel approach for grid inter-operation, Delegated MatchMaking. Our approach, which couples the hybrid architecture introduced in the previous chapter with a novel inter-operation mechanism, is compared with five alternatives through trace-based simulations, and is found to deliver the best performance, especially when the system is heavily loaded. While many other mechanisms can be designed in the future, our experiments prove that the Delegated MatchMaking approach already is a good solution for the problem of grid inter-operation. Our experiments also demonstrate that the inter-operation of existing grids can lead to significant performance gains in comparison with letting them operate independently.

At the end of this thesis, Chapter 9 summarizes our main achievements and presents future directions for this work. The direct use of the framework for the study of grid inter-operation mechanisms holds good promise for future research. In particular, "How many clusters are best?" and other related questions about the system structure can find answers under this framework, leading to important contributions to automating system provisioning and administration. With extensions, our framework can be used to investigate important classes of resource management problems, such as mechanisms and incentives for more system decentralization, scheduling for specific classes of applications or scheduling under less strict information availability assumptions, and guarantees for Quality-of-Service for commercial workloads. We have already taken initial steps in several of these directions.



Curriculum Vitae
cv (thesis format)

Alexandru Iosup was born on the 12th of June 1980 in Bucharest, Romania. He received in 2003 a B.Sc./Eng. degree from the Politehnica University of Bucharest (UPB), and in 2004 an M.Sc. degree from the same university. He went on to become in 2004 a PhD student with the Faculty of Electrical Engineering, Mathematics, and Computer Science at Delft University of Technology (TU Delft), as a member of the Parallel and Distributed Systems group. In the summer of 2003 and in the spring of 2004 Alexandru Iosup was a visiting researcher with the grid group of Dr. Stephane Vialle at Supelec, Metz, France. In the fall of 2006 he was a visiting researcher with the Condor group of Dr. Miron Livny at the University of Wisconsin-Madison, USA. He was also a visiting researcher for shorter periods of time with Dr. Ramin Yahyapour (University of Dortmund, Germany), with Dr. Thomas Fahringer (University of Innsbruck, Austria), and with Dr. Nicolae Tapus (UPB, Romania).

Alexandru Iosup's research interests are in the areas of parallel and distributed computing, with a focus on grids, clouds, and peer-to-peer systems and their application to e-Science and commercial workloads. He is the founder of the Grid Workloads Archive, the largest archive for workload traces of grid computing environments. He is a member of the team that performed the largest BitTorrent measurements and analysis to date. His work on grids was awarded the third place at the ACM Student Research Competition 2007, was nominated for a best-paper award at the ACM SuperComputing Conference 2007, and was invited for the special journal issue with the best papers of the EuroPar Conference 2008. He was the co-recipient of the IEEE P2P 2006 best-paper award for work on peer-to-peer systems. He received a Werner von Siemens Award in 2004 for his M.Sc. project.

Alexandru Iosup worked between 2001 and 2004 as a part-time software engineer, work that led to the creation of award-winning shareware games, and of solutions for the on-line monitoring of power transformers. Between 2002 and 2008 he was involved in various teaching activities, including the supervision of M.Sc. students. Throughout this period he has been an active promoter of games as an educational medium for engineering and technical material; as a result, several courses at both UPB and TU Delft now use educational material based on games. Alexandru Iosup was involved in the research community as conference program committee member, peer-reviewer, and scientific event organizer.

Alexandru Iosup is co-author of over 50 publications. By November 2008, his work had attracted over 200 citations (Google Scholar), with an h-index of 9 (Harzing's Publish-or-Perish). A complete list of Alexandru Iosup's publications can be found at: http://www.st.ewi.tudelft.nl/~iosup/.



 

                                                                                                                                                                                                                                             

Last modified: Fri, 1 December, 2008 8:26 PM
The newest version of this page can be found at: http://www.pds.ewi.tudelft.nl/~iosup/aiosup_phd_thesis.html
Copyright © 2008 Alexandru Iosup. All Rights Reserved.