CGW'07 and EUCHINAGrid

		WORKSHOP
Kraków, Poland, October 15 - 17, 2007		Kraków, Poland, October 16 - 18, 2007

Abstracts

Norbert Attig

Simulation Laboratories: a Novel Community-Oriented Research and Support Structure

Abstract:

The dramatic growth in available computing power, including memory, data storage, and data transfer capabilities, has revolutionised many aspects of science. In many areas, numerical simulations are the essential method for achieving novel, high-quality results. Politicians and decision makers in Europe are becoming increasingly convinced that the installation of a sustainable pan-European HPC infrastructure is indispensable to foster this development and to keep up with countries like the U.S. and Japan. In September 2006, the European Strategy Forum on Research Infrastructures (ESFRI) published a European Roadmap for Research Infrastructures, which for the first time includes the foundation of a European High-Performance Computing (HPC) Service. The key idea of this service is the establishment of a sustainable supercomputer infrastructure for the computational sciences. The HPC in Europe Taskforce (HET) was engaged in preparing proposals for the definition of such an infrastructure. HET recommended integrating HPC resources into an HPC ecosystem and structuring these resources in the shape of a performance pyramid. At the top of this pyramid is a small number of leadership-class high-end systems, mainly funded through national sources but with additional European support. The middle layer of the pyramid consists of a number of national and regional supercomputers; these are still powerful systems, able to run the load below the petaflops level. The bottom of the pyramid consists of local compute servers that should enable the development of a strong competence base of computational scientists. HET advocated a European Research Grid that will connect the systems at all levels. It became apparent that only a common consortium of all interested European partners would be able to implement this new research infrastructure.
Following the recommendation of HET, such a consortium has been established and named Partnership for Advanced Computing in Europe (PACE).

While the focus of PACE lies on setting up an integrated, sustainable HPC infrastructure, including the procurement of supercomputer hardware of the highest performance class, the complementary question of software development and productive usage of current and future high-end systems has not yet been addressed. It must be a joint endeavour of supercomputing centres and the scientific and engineering research communities, in analogy to the US-DoE SciDAC programme. The required tight integration and optimal utilisation of future petascale supercomputers and distributed scientific expertise in Europe bring new challenges to supercomputing which must be tackled adequately. Substantial efforts have to be made to enable computational science communities to solve problems with high scientific impact through efficient use of high-end computational resources. A new, community-oriented structure, the so-called Simulation Laboratories, is proposed and will be installed by the Jülich Supercomputing Centre (JSC) to meet these requirements. Simulation Laboratories have been conceived in analogy to large-scale experiments (e.g. the Hubble telescope). The expertise provided by a Simulation Laboratory to a research community goes far beyond the support of a traditional expert advisor, because the Simulation Laboratory is completely integrated in its community, has an excellent overview of community-related methods and algorithms, serves its community and is well staffed. Additionally, JSC proposes Cross-Sectional Groups that will intensify the development of HPC methods and algorithms, programming tools and petabyte data repositories. Petaflop computers need petaflop software and scalable algorithms integrated into the HPC ecosystem. The Simulation Laboratories and the Cross-Sectional Groups will complement each other and together will establish a first-class, science-based partnership between disciplinary communities and supercomputing centres.

Piotr Bala

Chemomentum

Abstract:

Chemomentum is an EU project that provides Grid-based solutions for workflow-centric, complex applications such as risk assessment, toxicity prediction and drug design. It focuses on tools for dealing with data and knowledge in an efficient and reliable manner. The Chemomentum tools are based on the most recent version of the UNICORE middleware, which is a Grid Service infrastructure compliant with the Open Grid Services Architecture (OGSA). Chemomentum provides an easy-to-use Grid system that focuses on the end users, allowing them to use powerful tools in an efficient and transparent way. Intuitive, task-oriented user interfaces allow the users to focus on their work, keeping any Grid-related complexity hidden.

The Chemomentum tutorial allows users to get familiar with the recent version of the UNICORE middleware as well as with the additional software components.

The tutorial is intended for Grid users as well as developers and allows hands-on examination of the Chemomentum tools.

Instructors will be provided by ICM (Poland) and FZJ (Germany).

Marian Bubak

Introduction to the ViroLab Virtual Laboratory

Abstract:

In the ViroLab project [1] we are developing a virtual laboratory (VLvl) [2] to facilitate molecular dynamics simulations, medical knowledge discovery and decision support for HIV treatment [3]. Three groups of users have been identified: clinicians using decision support systems for drug ranking, experiment developers who plan complex biomedical simulations, and experiment users who apply prepared experiments.

VLvl provides mechanisms for user-friendly experiment creation and execution in an integrated development/execution environment, the possibility of reusing existing libraries and tools, gathering and exposing provenance information, integration of geographically distributed data resources, access to WS, WSRF, MOCCA components and jobs, as well as secure access to data and applications. VLvl integrates biomedical information related to viruses (proteins and mutations), patients (viral load) and literature (drug resistance); it enables users to plan and run experiments transparently on distributed resources.

An experiment is a kind of processing which may involve acquiring input data from distributed resources, running remote operations on this data, and storing results in a dedicated space [4]. Grid operations are executed with a set of entities called grid objects which provide a virtualization of resources [5]. A provenance module enables tracking, storing and querying information about experiments using ontology-based tools.

Sample applications include: from virus genotype to drug resistance interpretation; querying historical and provenance information about experiments; assisting a virologist with the Drug Resistance System based on the Retrogram; and simple data mining with classification [2].

A detailed description of the VLvl structure and functionality is presented in CGW'07 posters 2-9.

Acknowledgements. The VLvl is being developed at the Institute of Computer Science and CYFRONET AGH, Gridwise Technologies, Universiteit van Amsterdam, and HLRS Stuttgart. It is supported by the EU IST-027446 project.

References
[1] ViroLab - EU IST STREP Project 027446; www.virolab.org
[2] ViroLab Virtual Laboratory, http://virolab.cyfronet.pl
[3] P.M.A. Sloot, I. Altintas, M. Bubak, Ch.A. Boucher; From Molecule to Man: Decision Support in Individualized E-Health, IEEE Computer, 39 (11), 40-46, 2006
[4] T. Gubala, M. Bubak; GridSpace - Semantic Programming Environment for the Grid, PPAM'2005, LNCS 3911, 172-179, 2006
[5] M. Malawski, M. Bubak, M. Placek, D. Kurzyniec, V. Sunderam; Experiments with Distributed Component Computing Across Grid Boundaries, Proc. HPC-GECO / CompFrame Workshop - HPDC'2006, Paris, 2006

Koen Deforche

Estimating the Fitness Landscape Experienced by HIV under Treatment Selective Pressure

Abstract:

Human Immunodeficiency Virus (HIV) is the retrovirus that causes Acquired Immunodeficiency Syndrome (AIDS), and current treatment combines three or more antiviral drugs to block HIV replication. Because HIV is characterized by a large and actively replicating intra-host population, mutations that confer drug resistance may evolve quickly under suboptimal treatment conditions, leading to treatment failure, which is frequently associated with the emergence of drug resistance. To aid the clinician in selecting a treatment regimen to which the virus is still susceptible in the presence of resistance mutations, genotypic drug resistance testing is recommended. However, interpreting the effect on drug resistance of the mutations found remains an ongoing challenge.

We present a technique to estimate a fitness function of HIV under drug selective pressure that is compatible with the evolution observed during treatment in cross-sectional sequence data. The model combines theoretical knowledge of population genetics and molecular evolution with clinical data on observed resistance evolution. The fitness function incorporates interactions learned using Bayesian Network learning, and its parameters are estimated so that evolution simulated in a treatment-naive population using the fitness function matches observed evolution. We present various results using data for the protease inhibitor nelfinavir and the nucleoside analogue reverse transcriptase inhibitors zidovudine and lamivudine, and show how the fitness function may be used to predict evolution, the individualized genetic barrier to resistance and the development of resistance at treatment failure, and to improve prediction of treatment response.

Carole Goble

Grid 3.0: Services, Semantics and Society

Abstract:

The trend in recent years in distributed systems has been to open up: expose interfaces and content; simplify access; and support creativity through the reuse and combination of already available components, services and content. The drive is for loose coupling, greater agility and rapid application development, for more timely and relevant solutions for users, by users.

Creating and enabling components and content to be reusable is tough, especially when the suppliers are not the consumers. Moreover, there is a gap between (a) the infrastructure and resource provider and (b) the application developer and user. In the Grid we have done a good job enabling VOs of resource providers. What about supporting VOs of users? Or easing the development of applications?

In this talk we will explore areas that we believe the Grid community should address to enable the next generation of Grids: (a) the systematic and automated management of metadata (as found in the Semantic Grid, such as our Semantic-OGSA platform); and (b) patterns of lightweight programming and user participation (as found in Web 2.0). By critically applying Web principles to our work on the myGrid / Taverna, myExperiment and S-OGSA projects, we have better served our users' needs. These principles are insightful guidelines that could help the Grid community bridge the gap between Grid technical platforms and the user applications that use them.

Tomasz Gubala

Demo of ViroLab Virtual Laboratory

Abstract:

The demonstration of the ViroLab Virtual Laboratory [1] aims to show the current functionality of the system. It will consist of several parts, differing both with respect to the module of the system being presented and the application domain. This presentation is planned as a live demo: everything will happen in real time and the presenter will perform all actions live. However, due to possible external issues (e.g. internet connection failure) we may be forced to fall back to pre-recorded movies.

The demonstration starts with a short introduction to the main aspects of the laboratory. The core concepts are explained or reintroduced: the notion of in-silico experiments, the experiment pipeline, experiment substrates and results, as well as an overview of the tools that will be presented in the following steps.

The demonstration will then move on to experiment execution. The experiment to be shown will operate on genotype information of a certain HIV virus and will perform nucleotide sequence analysis. The presenter will show how to use the Experiment Management Interface in order to execute an experiment. Afterwards, some modifications will be introduced to the experiment in order to extend its functionality with additional steps, including drug resistance interpretation. This part will conclude with a demonstration of the extended experiment, showing the differences between both experiments.

This will in turn be followed by a demonstration of the virtual laboratory provenance system. The system is responsible for tracking and recording history and provenance data, and this information may be retrieved through a dedicated querying tool. The presenter will briefly explain the basic notions regarding the provenance system and will then show how to operate the Query Translation Tool to issue queries to the system. Queries will pertain to the virology and HIV therapy domains.

Given enough time, the demonstration will feature two further stages, devoted to different application domains. One will show an experiment that performs data mining on medical data using some parts of the Weka data mining library, while another will consist of an experiment that uses the EGEE computational testbed to run a molecular dynamics computation (using the NAMD package) for a short peptide in water. The result will be presented using the VMD visualization tool.

This set of demonstrations will be wrapped up by a short summary.

References
[1] ViroLab Virtual Laboratory; virolab.cyfronet.pl

Pawel Jarosz, Stanislaw Kulczycki, Pawel Plaszczak, Kuba Rozkwitalski

Security framework for Virtual Laboratory in Virolab

Abstract:

The ViroLab project aims to develop a virtual laboratory for infectious diseases. With the increasing significance of genetic information, treatment and research methods and, finally, the collected data, the need to improve collaboration across institutions emerges. The ViroLab Virtual Laboratory aims to provide infrastructure for data access, experimental execution and collaboration support.

In order to develop a reliable security framework that meets the project requirements the following assumptions were made:

a Virtual Organisation is formed from the combination of resources and users;
all participants in a Virtual Organisation trust each other;
there is no central database of users: the system should use the existing user databases of hospitals, universities and scientific institutes;
a policy is the set of attributes needed to access a specific resource;
each service/data provider creates its own policy to ensure that only people with the proper attributes can access it;
users are only allowed to use a resource when their attributes match the policy.
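
The attribute-matching assumption listed above can be illustrated with a minimal sketch. It is not the ViroLab or Shibboleth implementation; the `Policy` class, the `is_authorised` function and the attribute names are hypothetical and serve only to show the idea that a provider-defined policy is a set of required attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Attributes a provider requires for one resource (hypothetical model)."""
    resource: str
    required: frozenset  # set of (attribute name, attribute value) pairs


def is_authorised(user_attributes: dict, policy: Policy) -> bool:
    """Grant access only if the user holds every attribute the policy requires.

    `user_attributes` maps an attribute name to the list of values asserted
    by the user's home institution (assumed format, for illustration only).
    """
    held = {
        (name, value)
        for name, values in user_attributes.items()
        for value in values
    }
    return policy.required <= held


# Example: a provider protects a sequence database with two required attributes.
policy = Policy(
    resource="sequence-db",
    required=frozenset({("role", "virologist"), ("institution", "hospital-a")}),
)
user = {"role": ["virologist", "clinician"], "institution": ["hospital-a"]}
print(is_authorised(user, policy))
```

In this sketch the decision depends only on attributes supplied by the user's home institution, so the resource provider needs no user database of its own.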

Possible solutions for developing a security framework for Virtual Organisations were analyzed in detail during the first phase of the project. The solutions under consideration included PERMIS, CAS and Shibboleth. It was concluded that Shibboleth would be the most suitable solution because it fulfilled the following objectives:

interoperability and standards compliance (enabling easy integration of these components with software developed in other Work Packages);
support from a community of users and developers gathered around the project, which is very important in case of any problems;
an open-source license (most of the solutions considered require at least slight modifications in order to fulfil ViroLab requirements, and such modifications are often not allowed or not even possible in many commercial solutions).

In the security layer there are two levels at which authorization and authentication take place. The first level, responsible for user authentication and authorization, is built using Shibboleth, while the second, responsible for secure communication between grid nodes, uses the Grid Security Infrastructure (GSI).

To provide Virtual Organization capabilities a new and innovative approach was taken. It uses proven solutions like Shibboleth and SAML, but in a new and modified way that aims to fulfil the specific ViroLab requirements. A Dynamic Virtual Organization is established by using attributes to identify user privileges, which the resource providers can use for authorisation purposes. The framework will be developed to secure all resources accessible in the Virtual Laboratory (e.g. Web Services, SVN).

Acknowledgements. This work was supported by the EU project ViroLab IST-027446 and the Ministry of Science and Higher Education.

References
[1] ViroLab - EU IST STREP Project 027446; www.virolab.org
[2] Security Assertion Markup Language (SAML) V2.0 Technical Overview, http://www.oasis-open.org/committees/security/docs/
[3] Shibboleth-Architecture DRAFT v05, http://draft-internet2-shibboleth-architecture-05.html

Jacek Kitowski

PL-Grid - Polish Grid Initiative

Abstract:

In response to the needs of Polish scientists and to ongoing grid activities in other European countries and all over the world, an agreement concerning the creation of the Polish Grid (PL-Grid) Consortium was signed in January 2007. The Consortium is made up of the five largest Polish supercomputing and networking centres. The Consortium prepared the PL-Grid Project proposal, to be financed by national funds over a three-year period (2008-2010). The main aim of the PL-Grid Project is to create and develop a stable Polish Grid infrastructure, fully compatible and interoperable with European and worldwide Grids. The Project should provide scientific communities in Poland with Grid services, enabling realization of the e-Science model in various scientific fields. In the talk, the motivation and aims of the project are outlined, together with the structure and description of planned activities.

Dieter Kranzlmüller

The European Grid Initiative - Status and Overview

Abstract:

The European Grid Initiative (EGI) represents an effort to establish a sustainable grid infrastructure. Driven by the needs and requirements of the research community, it is expected to enable the next leap in research infrastructures, thereby supporting collaborative scientific discoveries. The National Grid Initiatives (NGIs), which operate the grid infrastructures in each country, form the main foundation of EGI. EGI will link existing NGIs and actively support the setup and initiation of new NGIs. In the process of establishing EGI, it is of major importance to incorporate links to our non-European partners around the world. This presentation covers the current ideas of EGI and the planning for the next steps in the setup process. It should be the basis for discussing collaboration with NGIs on other continents towards a global sustainable grid infrastructure.

Breanndán Ó Nualláin, Kaustubh R. Patil, and Peter M.A. Sloot

Distributed Decision Support in Virolab

Abstract:

HIV drug-resistance interpretation systems are used routinely throughout the world in a clinical setting. More knowledge is rapidly becoming available upon which clinical decisions could be based. This knowledge, information, data and evidence from many sources are combined within a Decision Support System (DSS) to provide coherent judgements on drug susceptibility. At the core of the DSS is an HIV drug-resistance interpretation system incorporating knowledge from the principal systems (Stanford HIVdb, Rega, ANRS, Virolab) in use throughout the world.

We describe an improved rule-based language which has adequate expressiveness and enjoys a fully-specified, formal semantics, allowing for automated reasoning over rule sets. Among the questions which can be addressed are:

Ambiguity: Is the rule set internally ambiguous? Does it allow more than one interpretation?
Completeness: Does the rule set have complete coverage?
Consistency: Are there rules in the set which make contradictory predictions?
Redundancy: Do some rules of a rule set subsume others?
Dissonance: How do rule sets differ in their predictions?
Predictive power: Can one rule set make more specific predictions than another or can it make predictions in cases where the other is silent?

The formal language which we present has a well-defined semantics that will allow for making judgements of the above kinds using reasoning that is either completely automated or at least semi-automated.
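
As a rough illustration of what automated reasoning over a rule set can look like, the following minimal sketch checks completeness and consistency by exhaustive enumeration. It does not use the language presented in the talk or ASI; the rule representation (required mutations, excluded mutations, predicted level), the function names and the placeholder mutation names `m1` and `m2` are all assumptions made for this example.

```python
from itertools import combinations


def matches(rule, genotype):
    """A rule fires if all required mutations are present and no excluded one is."""
    required, excluded, _prediction = rule
    return required <= genotype and not (excluded & genotype)


def all_genotypes(mutations):
    """Enumerate every subset of the mutation universe (feasible only for small sets)."""
    items = sorted(mutations)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def analyse(rules, mutations):
    """Return genotypes no rule covers and genotypes with contradictory predictions."""
    uncovered, conflicting = [], []
    for genotype in all_genotypes(mutations):
        predictions = {rule[2] for rule in rules if matches(rule, genotype)}
        if not predictions:
            uncovered.append(genotype)      # completeness violation
        elif len(predictions) > 1:
            conflicting.append(genotype)    # consistency violation
    return uncovered, conflicting


# Two toy rules over placeholder mutations; the predictions are illustrative only.
rules = [
    (frozenset({"m1"}), frozenset(), "resistant"),
    (frozenset({"m2"}), frozenset(), "susceptible"),
]
uncovered, conflicting = analyse(rules, {"m1", "m2"})
print(uncovered)    # genotypes for which the rule set is silent
print(conflicting)  # genotypes for which the rules disagree
```

Exhaustive enumeration grows exponentially with the number of mutations, which is one reason a formal semantics that supports symbolic reasoning is attractive for realistic rule sets.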

Furthermore, recent findings have revealed the need to express multiplicative effects of certain mutations on drugs. The state-of-the-art language for specifying HIV drug interpretation rules, ASI, is in its present form limited to linear combinations of effects.

In future work we will use Bayesian hierarchical modelling to make predictive distributions in the presence of uncertainty. The full chain of analysis will combine Bayesian hierarchical modelling with probabilistic decision analysis based on utility attribution and/or multi-objective optimisation of such quantities as cost, chance and duration of survival or quality-adjusted life years.

This work was done within the Virolab project using the Virolab Virtual Laboratory experimental environment (see www.virolab.org). For more details see: P.M.A. Sloot, A. Tirado-Ramos, I. Altintas, M.T. Bubak and C.A. Boucher: From Molecule to Man: Decision Support in Individualized E-Health, IEEE Computer (cover feature), vol. 39, no. 11, pp. 40-46, November 2006.

Peter Sloot

Virolab: Modeling HIV from molecule to epidemiology

Abstract:

With infectious diseases frequently dominating news headlines, public health and pharmaceutical industry professionals, policy makers, and infectious disease researchers increasingly need to understand the transmission patterns of infectious diseases in order to interpret and critically evaluate drug resistance, epidemiological data, and the findings of mathematical modeling studies.

In recent years our understanding of infectious diseases has been greatly increased through mathematical modeling. Insights from this increasingly important, exciting field are now informing policy-making at the highest levels and playing a growing role in research. The transmissible nature of infectious diseases makes them fundamentally different from non-infectious diseases, so techniques from 'classical' epidemiology are often invalid and hence lead to incorrect conclusions, not least in health-economic analysis.

In this lecture I will introduce VIROLAB, a virtual decision laboratory for infectious diseases in which we aim to model HIV infection from the molecular level all the way up to the epidemiological level, and address recent results obtained with it.

Most notably, I will present particle-based simulation results that help us assess the relevance of the HIV mutation rate with respect to cell entry, and complex-network-based simulations that help us understand the demographics of the disease.

P.M.A. Sloot, S.V. Ivanov, A.V. Boukhanovsky, D. van de Vijver and C. Boucher

Stochastic Simulation of HIV Population Dynamics through Complex Network Modeling

Abstract:

We propose a new way to model HIV infection spreading through the use of dynamic complex networks. The heterogeneous population of HIV exposure groups is described through a unique network degree probability distribution. The time evolution of the network nodes is modeled by a Markov process and gives insight into HIV disease progression. The results are validated against historical data of AIDS cases in the United States as recorded by the Centers for Disease Control. We find a remarkably good correspondence between the number of simulated and registered HIV cases, indicating that our approach to modeling the dynamics of HIV spreading through a sexual network is valid and opens up completely new ways of reasoning about various medication scenarios.

This work was done within the Virolab project using the Virolab Virtual Laboratory experimental environment (see www.virolab.org). For more details see: P.M.A. Sloot, S.V. Ivanov, A.V. Boukhanovsky, D. van de Vijver and C. Boucher: Stochastic Simulation of HIV Population Dynamics through Complex Network Modeling, Int. J. of Computer Mathematics (in press 2007).
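
The general idea of a Markov process running on the nodes of a contact network can be sketched as follows. This is a toy illustration and not the authors' model: the three node states, the degree choices, the transition probabilities and all function names are assumptions chosen only to keep the example small and self-contained.

```python
import random
from collections import Counter

STATES = ("susceptible", "infected", "aids")  # hypothetical node states


def make_network(degrees, rng):
    """Pair edge stubs at random (configuration-model style); self-loops are dropped."""
    stubs = [node for node, degree in enumerate(degrees) for _ in range(degree)]
    rng.shuffle(stubs)
    neighbours = {node: set() for node in range(len(degrees))}
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def step(state, neighbours, p_transmit, p_progress, rng):
    """One Markov step: the next state depends only on the current network state."""
    new_state = dict(state)
    for node, current in state.items():
        if current == "susceptible":
            contacts = sum(1 for m in neighbours[node] if state[m] != "susceptible")
            # Probability of at least one transmission over all infected contacts.
            if rng.random() < 1 - (1 - p_transmit) ** contacts:
                new_state[node] = "infected"
        elif current == "infected" and rng.random() < p_progress:
            new_state[node] = "aids"
    return new_state


def simulate(n, steps, seed=0, p_transmit=0.05, p_progress=0.02):
    """Run the toy model and return the per-step count of nodes in each state."""
    rng = random.Random(seed)
    # Heterogeneous degrees: most nodes have few contacts, a few have many.
    degrees = [rng.choice((1, 1, 1, 2, 2, 5)) for _ in range(n)]
    neighbours = make_network(degrees, rng)
    state = {node: "susceptible" for node in range(n)}
    state[0] = "infected"
    history = []
    for _ in range(steps):
        state = step(state, neighbours, p_transmit, p_progress, rng)
        counts = Counter(state.values())
        history.append({s: counts.get(s, 0) for s in STATES})
    return history


for counts in simulate(n=200, steps=5):
    print(counts)
```

In such a sketch, the simulated case counts per step are the quantity one would compare against registered case data.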

Hai Zhuge

The Web Resource Space Model: A New Frontier

Abstract:

The Web Resource Space Model is a Web semantic data model that can effectively organize and manage versatile Web resources regardless of their forms and locations by establishing an appropriate classification and semantic links. Integrating the Web Resource Space Model, database model and Semantic Web standards like OWL could form a powerful semantic platform for the future interconnection environment. This talk introduces the main content of this model.

Organizers:

ACC Cyfronet AGH

Institute of Nuclear Physics Polish Academy of Sciences

IFJ PAN

Institute of Computer Science
AGH

Dept. of Bioinf. & Telemedicine,
Medical College, UJ