GrenchMark: A Framework for Analyzing, Testing, and Comparing Grids
| Introduction |
 |
 |
| why and how is this work relevant? |
 |
Today, production grids bring together (tens of) thousands
of resources. Infrastructures such as CERN’s LCG, NorduGrid,
TeraGrid, the Open Science Grid, etc., offer similar
or better throughputs when compared with large-scale parallel
production environments [1]. However, the dynamicity, the
heterogeneity, or simply the sheer scale of today’s grids expose
problems in grid performance, reliability, and functionality.
Early testing in real grids reveals lower performance than
expected from simulations [2], failure rates from 10% up
to 45% [3], [4] (see footnote 1), and functionality problems of around 1 in
every 3 tests for widely installed grid services [6]. Thus, an
important research question arises: How to gain insights into
the performance and the reliability of large-scale distributed
computing systems such as grids? In this work we attempt to
address this question with the GRENCHMARK framework for
analyzing, testing, and comparing grids.
Footnote 1: The failure rate in today’s grids is much higher than that of contemporary
large-scale parallel production installations [5].
|
| The GrenchMark Framework |
 |
 |
| short presentation |
 |
|
GRENCHMARK is a comprehensive framework for realistic,
repeatable, and comparable testing of large-scale distributed
computing systems. We describe below only a few important
design aspects. To support realistic testing, we focus on
mechanisms to generate and process grid workloads. The
Workload Modeler and the Workload Generator (components
2 and 3 in Figure 1) are responsible for realistic workload
generation. We make use of databases of workloads and
workload models, and we provide mechanisms for workload
(selective) truncation and scaling. We also make use of a
database of real applications. To facilitate repeatable testing,
we store test provenance data, and we are able to replay tests.
The Test Manager and the Workload Submitter (components
1 and 4 in Figure 1) are responsible for coordinating the
repeatable testing and workload submission, respectively. Note
that the Workload Submitter receives feedback from the tested
environment, allowing tests in which the submission depends
on dynamic information (e.g., testing with grid workflows, or
testing with service-level agreements). To enable comparable
testing results, we provide a framework for testing, including
metrics that take into account the system size and other
environment specifics. The Data Manager (component 5 in
Figure 1) is responsible for storing and for analyzing the
testing and other (e.g., provenance) data. Note that typical
additional data are resource availability information (grids are
dynamic but fully-monitorable environments) and the logs of
different grid middleware. |
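The workload truncation and scaling mentioned above can be illustrated with a minimal sketch. It is not the GrenchMark implementation: the `Job` record, the `truncate` and `scale_load` helpers, and the choice to scale load by compressing inter-arrival times are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Job:
    """Hypothetical trace record; field names are illustrative only."""
    job_id: int
    submit_time: float  # seconds since the start of the trace
    runtime: float      # seconds
    cpus: int


def truncate(workload: List[Job], start: float, end: float,
             keep: Callable[[Job], bool] = lambda job: True) -> List[Job]:
    """Keep jobs submitted in [start, end) that also satisfy `keep`
    (the selective part), shifting submit times so the slice starts at 0."""
    return [replace(job, submit_time=job.submit_time - start)
            for job in workload
            if start <= job.submit_time < end and keep(job)]


def scale_load(workload: List[Job], factor: float) -> List[Job]:
    """Assumed scaling rule: divide submit times by `factor`, so factor > 1
    makes jobs arrive faster. Runtimes are left unchanged."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [replace(job, submit_time=job.submit_time / factor)
            for job in workload]


trace = [Job(1, 0.0, 60.0, 1), Job(2, 100.0, 30.0, 2),
         Job(3, 250.0, 10.0, 1), Job(4, 400.0, 5.0, 4)]
window = truncate(trace, 100.0, 400.0)   # jobs 2 and 3
faster = scale_load(window, 2.0)         # same jobs, arriving twice as fast
small = truncate(trace, 0.0, 500.0, keep=lambda job: job.cpus == 1)
```

Because each helper returns a new list of immutable records, the original trace stays available, which fits the goal of replaying a test later.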
| Results with the GrenchMark Reference Implementation |
 |
 |
| Selected results, to show what GrenchMark can do |
 |
The GRENCHMARK reference implementation has been
used for various testing scenarios in: grids [7], [3], [8], peer-to-peer
file-sharing [9], and heterogeneous resource management
(i.e., based on Condor [10]). Overall, we have run more than
250,000 test jobs in the last 18 months, in over 25 fully automated
testing scenarios. Below we show and briefly comment on
a sample of the results of the Condor tests, performed
over 2 weeks on 600 processors of the Condor pool at
U.Wisc.-Madison.
Figure 2 depicts the throughput and the goodput of the
system for 100 consecutive runs of 1000 jobs each. The
user obtains a high rate of goodput even in a production
environment: over 0.5 CPU-years of goodput in two days.
Condor is fair with respect to resource consumption, and the
throughput and goodput rates are halved after the user exceeds
his quota (at the beginning of 31 Aug 2006).
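As a rough illustration of how such curves could be derived from job records, the sketch below bins finished jobs into 4-hour sampling intervals. The definitions are assumptions, since the text does not spell them out: throughput is taken as jobs finished per hour, and goodput as the CPU time consumed by jobs that completed successfully. The `FinishedJob` record and the `sample` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

SAMPLING_INTERVAL = 4 * 3600  # seconds; the 4-hour interval used in Figure 2


@dataclass(frozen=True)
class FinishedJob:
    """Hypothetical job record; field names are illustrative only."""
    end_time: float   # seconds since the start of the test
    cpu_time: float   # CPU-seconds consumed by the job
    succeeded: bool


def sample(jobs: List[FinishedJob],
           interval: float = SAMPLING_INTERVAL) -> List[Tuple[float, float]]:
    """Return one (throughput, cumulative_goodput) pair per interval.

    throughput: jobs finished in the interval, per hour (assumed definition).
    cumulative_goodput: CPU-seconds of successful jobs finished so far
    (assumed definition).
    """
    if not jobs:
        return []
    n_bins = int(max(job.end_time for job in jobs) // interval) + 1
    finished = [0] * n_bins
    good = [0.0] * n_bins
    for job in jobs:
        b = int(job.end_time // interval)
        finished[b] += 1
        if job.succeeded:
            good[b] += job.cpu_time
    samples = []
    total_good = 0.0
    hours = interval / 3600
    for b in range(n_bins):
        total_good += good[b]
        samples.append((finished[b] / hours, total_good))
    return samples


jobs = [FinishedJob(3600.0, 1800.0, True),
        FinishedJob(7200.0, 900.0, False),      # failed: counts for throughput only
        FinishedJob(5 * 3600.0, 3600.0, True)]
samples = sample(jobs)
```

Under these definitions, a halved throughput shows up as smaller first components, and a halved goodput rate as a flatter slope in the second component.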
Figure 3 shows the job wait time properties of 24 consecutive
runs of 1000 jobs each. With the exception of mean
and median job wait time, different runs exhibit different
distribution properties. Since the range of the job wait time
values for each run is large, some jobs have a much lower
wait time than others. Figure 4 shows which jobs are thus
favored by Condor. The jobs arriving first exhibit high wait
time variability. Overall, there is a trend for jobs arriving later
to wait more than the jobs arriving earlier. There appears to
be no correlation between the average wait time and the run
index, i.e., later runs do not have a higher average waiting
time.
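The per-run statistics discussed above (the box-and-whiskers values plus the mean) and the check for a relation between run index and average wait time could be computed along the following lines. This is a sketch under assumptions: the data are made up, `summarize_run` and `pearson` are hypothetical helpers, and returning 0.0 when one series is constant is a convention chosen here.

```python
import math
import statistics
from typing import Dict, List


def summarize_run(wait_times: List[float]) -> Dict[str, float]:
    """Five-number summary plus mean for one run's job wait times."""
    q1, median, q3 = statistics.quantiles(wait_times, n=4, method="inclusive")
    return {
        "min": min(wait_times),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(wait_times),
        "q3": q3,
        "max": max(wait_times),
    }


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant
    (a convention assumed for this sketch)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Made-up wait times (seconds): same mean and median, different ranges.
runs = {
    1: [10.0, 20.0, 30.0, 40.0, 50.0],
    2: [5.0, 5.0, 30.0, 40.0, 70.0],
    3: [1.0, 2.0, 30.0, 31.0, 86.0],
}
summaries = {index: summarize_run(times) for index, times in runs.items()}
run_indices = [float(index) for index in runs]
mean_waits = [summaries[index]["mean"] for index in runs]
r = pearson(run_indices, mean_waits)
```

A coefficient near zero for `r` would be consistent with the observation that later runs do not have a higher average wait time.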

Figure 1. The GrenchMark framework design. |

Figure 2. The evolution of goodput and throughput over time. The sampling
interval is 4 hours. After 31 Aug 2006, the throughput is halved, and the
cumulative goodput increases at half the rate from before. |

Figure 3. The job wait time distribution for 24 consecutive test runs. Each
distribution is depicted as a box-and-whiskers set with additional points for
the median and the mean. Different runs exhibit similar mean and median,
but very different value ranges. |

Figure 4. The job wait time for three test runs, selected for low (Run 15),
average (Run 6), and high (Run 19) wait time. Overall, there is a trend for
jobs arriving later to wait more than the jobs arriving earlier. |
|
| Related Work |
 |
 |
| are there others? |
 |
Following results from the parallel systems community,
several grid performance evaluation and benchmarking approaches
focus on tests using micro-benchmarks, microkernels,
and application benchmarks [11], [12]. For the few
other grid performance evaluation tools, the main focus is
either distributed deployment and testing [13], or executing
ad-hoc functionality tests [14], [15]. GRENCHMARK focuses
more on the testing process, with additional types of workload
data sources, richer workload generation, and more detailed
analysis.
|
| Acknowledgements |
 |
 |
| there is no 'me' in research, only team work |
 |
This work was carried out in the context of the Virtual
Laboratory for e-Science project (www.vl-e.nl), which
is supported by a BSIK grant from the Dutch Ministry of
Education, Culture and Science (OC&W), and which is part
of the ICT innovation program of the Dutch Ministry of
Economic Affairs (EZ). We further thank Miron Livny and
the Condor team at U.Wisc.-Madison for providing the testing
environment used for part of this work. We also want to thank
the people who have contributed (directly or indirectly) to this work over the years:
Dr. Dick Epema, Dr. Nicolae Tapus, Dr. Catalin Dumitrescu,
Catalin Cirstoiu, Mugurel Andreica, and Corina Stratan.
|
| Publications, conferences, talks |
 |
 |
| validating our work... |
 |
|
|
A. Iosup, D.H.J. Epema,
GrenchMark: a Framework for Testing
Large-Scale Distributed Computing Systems,
(submitted).
Info: the journal presentation of GrenchMark, with over 25 use cases, replaying traces from the Grid Workloads Archive
and from the Parallel Workloads Archive, and comprehensive extensions over the GrenchMark presented in the CCGrid 2006 publication.
|
 |
|
|
A. Iosup,
GrenchMark: a Framework for Testing
Large-Scale Distributed Computing Systems,
In the ACM/IEEE SuperComputing Conference on High Performance Networking and Computing (SC'07), Posters/ACM Student Research Competition.
Third place in the ACM SRC/Graduate Student competition.
Info: poster presenting GrenchMark.
|
|
|
|
M. Andreica, N. Tapus, A. Iosup,
D.H.J. Epema, C. Dumitrescu, I. Raicu, I. Foster, M. Ripeanu,
Towards ServMark, an Architecture for Testing Grids,
CoreGRID Technical Report TR-0062, Nov 29, 2006.
Info: grid computing, performance evaluation, testing real environments
|
|
|
|
A. Iosup, D.H.J. Epema,
GrenchMark: A Framework for Analyzing, Testing, and Comparing Grids,
In the 6th IEEE/ACM Int'l Symposium on Cluster Computing and the Grid (CCGrid'06) (accepted, 25%). An extended version can be found as Technical Report TU Delft/PDS/2005-007, ISBN 1387-2109.
Info: using GrenchMark with simple and composite Grid jobs, replaying traces from the Parallel Workloads
Archive, and 10 use cases for analyzing, testing, and comparing common grid settings.
|
|
|
| References |
 |
 |
| these studies have enabled us to work on this project |
 |
[1] A. Iosup, D. H. J. Epema, C. Franke, A. Papaspyrou, L. Schley, B. Song,
and R. Yahyapour, “On grid performance evaluation using synthetic
workloads,” in JSSPP, ser. LNCS, vol. 4376, 2006, pp. 232–255.
[2] A. Iosup, C. Dumitrescu, D. H. Epema, H. Li, and L. Wolters, “How are
real grids used? The analysis of four grid traces and its implications,”
in GRID. IEEE Computer Society, 2006, pp. 262–269.
[3] A. Iosup and D. H. J. Epema, “GrenchMark: A framework for analyzing,
testing, and comparing grids,” in CCGRID. IEEE Computer Society,
2006, pp. 313–320.
[4] O. Khalili et al., “Measuring the performance and reliability of production
computational grids,” in GRID. IEEE Computer Society, 2006.
[5] B. Schroeder and G. A. Gibson, “A large-scale study of failures in
high-performance computing systems,” in DSN. IEEE Computer Society,
2006, pp. 249–258.
[6] A. Iosup, D. Epema, P. Couvares, A. Karp, and M. Livny, “Build-and-test
workloads for grid middleware: Problem, analysis, and applications,”
in CCGRID. IEEE Computer Society, 2007, pp. 205–213.
[7] H. H. Mohamed and D. H. J. Epema, “Experiences with the KOALA
co-allocating scheduler in multiclusters,” in CCGRID. IEEE Computer
Society, 2005, pp. 784–791.
[8] O. Sonmez, H. Mohamed, and D. Epema, “Communication-aware job
placement policies for the KOALA grid scheduler,” in e-Science. IEEE
Computer Society, 2006, pp. 79–86.
[9] J. Roozenburg, “Secure decentralized swarm discovery in Tribler,” Master’s
thesis, Delft University of Technology, Delft, NL, Nov. 2006.
[10] D. Thain, T. Tannenbaum, and M. Livny, “Distributed computing in
practice: the Condor experience,” Concurrency - Practice and Experience,
vol. 17, no. 2-4, pp. 323–356, 2005.
[11] R. F. Van Der Wijngaart and M. Frumkin, “NAS grid benchmarks version
1.0,” NASA, Technical Report NAS-002-005, 2002. [Online]. Available:
http://www.nas.nasa.gov/News/Techreports/2002/PDF/nas-02-005.pdf
[12] G. Tsouloupas and M. D. Dikaiakos, “GridBench: A workbench for grid
benchmarking,” in EGC, ser. LNCS, vol. 3470, 2005, pp. 211–225.
[13] I. Raicu, C. Dumitrescu, M. Ripeanu, and I. T. Foster, “The design,
performance, and use of DiPerF: An automated distributed performance
evaluation framework,” J. Grid Comput., vol. 4, no. 3, pp. 287–309,
2006.
[14] G. Chun, H. Dail, H. Casanova, and A. Snavely, “Benchmark probes
for grid assessment,” in IPDPS. IEEE Computer Society, 2004.
[15] S. Smallen, C. Olschanowsky, K. Ericson, P. Beckman, and J. M.
Schopf, “The Inca test harness and reporting framework,” in SC. IEEE
Computer Society, 2004, p. 55.