Proactive Failure Avoidance, Recovery and Maintenance
(PFARM)

DSN Workshop at Sheraton Hong Kong
June 27th, 2011

Miroslaw Malek
Humboldt University,
Berlin, Germany

Felix Salfner
Humboldt University,
Berlin, Germany

Kishor S. Trivedi
Duke University,
Durham, USA

Domenico Cotroneo
University of Naples, Italy

Tadashi Dohi
Hiroshima University, Japan

Michael Grottke
University of Erlangen-Nuremberg, Germany

Michael R. Lyu
Chinese University of Hong Kong, China

Simin Nadjm-Tehrani
Linköping University, Sweden

Takashi Nanya
Canon, Japan

Allen P. Nikora
Jet Propulsion Labs, USA

András Pataricza
Budapest University of Technology and Economics, Hungary

Manfred Reitenspiess
Fujitsu, Munich, Germany

Lisa Spainhower
IBM, USA

Neeraj Suri
Technical University Darmstadt, Germany

Kalyan Vaidyanathan
Oracle, USA

Aad van Moorsel
Newcastle University, Great Britain




FOCUS OF PFARM

Over the last decade, research on dependable computing has shifted from reactive towards proactive methods. In classical fault tolerance, a system reacts to errors or component failures to prevent them from turning into system failures, and maintenance follows fixed, time-based plans. However, ever-increasing system complexity, the use of commercial off-the-shelf components, virtualization, ongoing system patches and updates, and growing system dynamicity have made such approaches difficult to apply. A new area of dependability research has therefore emerged. It focuses on proactive approaches that act before a problem arises in order to increase time-to-failure and/or reduce time-to-repair. These techniques frequently rely on anticipating upcoming problems from runtime monitoring. Industry and academia use several terms for such techniques, each emphasizing different aspects, including self-* computing, autonomic computing, proactive fault management, trustworthy computing, software rejuvenation, and preventive/proactive maintenance.

The goal of this workshop is to increase collaboration among researchers from various communities around the world who work on PFARM. We want to provide a stimulating and fruitful forum for discussing ideas, exchanging experiences, and finding new answers to the overall challenge of improving the dependability of contemporary computing and communication systems by an order of magnitude or more.
We are interested in submissions from both industry and academia. Topics include, but are not limited to:

  • Runtime dependability assessment and evaluation (reliability, availability, etc.)
  • Runtime monitoring for online fault detection and diagnosis, including the processing of monitoring data
  • Prediction methods to anticipate failures, resource exhaustion, or other critical situations in complex systems, distributed systems, and adaptive or peer-to-peer networks
  • Predictive diagnosis, fault localization, and root-cause analysis
  • Online recovery, updates and upgrades, non-intrusive hardware installation and software deployment
  • Proactive maintenance strategies (short-term as well as long-term)
  • Optimal decision algorithms and policies for managing and scheduling proactive actions
  • Downtime minimization or avoidance mechanisms such as preventive failover, state clean-up, proactive reconfiguration, failure-prevention-driven load balancing, prediction-driven restarts, rejuvenation, adaptive checkpointing, or other prediction-driven enhancements of traditional repair methods
  • Proactive fault management and maintenance techniques such as monitoring-based replacement, configuration and management of computer systems and components
  • Dependability evaluation, including models to assess the impact on metrics such as availability, reliability, security, performability, and survivability, and on user-oriented metrics such as service availability, downtime, quality of service, and quality of experience
  • Case studies, applications, experiments, and experience reports

PROGRAM

13:30 - 13:45 Welcome and Introduction to the PFARM Game
13:45 - 14:10 Practical Online Failure Prediction for Blue Gene/P: Period-based vs Event-driven
Li Yu, Ziming Zheng, Zhiling Lan and Susan Coghlan
14:10 - 14:35 Detecting Resource Leaks through Dynamical Mining of Resource Usage Patterns
Huxing Zhang, Gang Wu, Kingsum Chow, Zhidong Yu and Xuezhi Xing
14:35 - 15:00 DynaPlan: Resource Placement for Application-Level Clustering
Rick Harper, Kyung Ryu, David Frank, Lisa Spainhower, Ravi Shankar and Tom Weaver
15:00 - 15:30 Break
15:30 - 16:15 Invited talk: Proactivity = Observation + Analysis + Knowledge extraction + Action planning?
András Pataricza, Budapest University of Technology and Economics (BME)
16:15 - 17:00 PFARM Game Results and Winner Comments
18:00 - 21:00 Welcome Reception

PFARM GAME

Since the goal of the PFARM workshop is to bring together researchers in the area, we proposed a simple way to get to know other participants and their interests: playing a game. We therefore invented the PFARM game, an online game that took place during the PFARM workshop on June 27, 2011. The goal of the game was to identify the top challenges in PFARM research.
The results of the PFARM game can be accessed here.
As part of the final social event, the winners received a small prize to encourage their work on the top challenges in PFARM research.

IMPORTANT DATES

Submission deadline:  March 15, 2011
Author notification:  April 15, 2011
Camera ready version:  May 1, 2011
Workshop:  June 27, 2011

FURTHER INFORMATION

Workshop location and registration: DSN 2011 web page

For further information or questions, please contact pfarm2011@informatik.hu-berlin.de


Last updated June 9th, 2011