Italy: Big Data and Science Networks. Interview with Dr. Claudio Grandi (INFN)

Written by Ruth Theus Baldassarre, Embassy of Switzerland in Italy

On October 1, 2015, the Swiss National Science Foundation (SNSF) launched the National Research Programme NRP 75 “Big Data”, which focuses on the technical and societal issues raised by big data. The five-year programme, funded with CHF 25 million, will support research on computing (data analysis, algorithms, cryptology), data management and security, and infrastructure. In this context, the experience of the Italian Academic & Research Network GARR[1] is worth highlighting.

Computing networks

The objective of the GARR network is to provide high-bandwidth connectivity and advanced services not only within the Italian academic community but also internationally. The GARR network was established in the late 1980s. In 2002 the Consortium GARR[2], a non-profit organisation, was established with the support of the Italian Ministry of Education and Scientific Research. All academic and major scientific organisations in Italy are linked to the GARR network. GARR is also a member of many international research networking organisations, such as the Trans-European Research and Education Networking Association (TERENA), the Réseaux IP Européens (RIPE), Delivery of Advanced Network Technology to Europe (DANTE) and the Gigabit European Academic Network (GÉANT).

Following worldwide technological trends, GARR is evolving continuously. The GARR-X multi-service network started in 2009, and the GARR-X Progress project has been under way since 2013. This project aims to overcome the digital divide that still exists between Italian regions, adding more than 2,500 km of optical fiber backbone links and a further 1,100 km of fiber access to the existing infrastructure. In addition, the project achieved an important breakthrough in international research connectivity: a 100 Gbps interconnection between GARR-X and GÉANT, the pan-European multi-gigabit backbone.

[Image: GARR-X evolution]

GARR-X Progress Services

The GARR network serves more than 2 million end users. Organisations of national and international relevance (around 500 user sites) use the GARR network: universities, observatories, laboratories, scientific research and health care institutes, research organisations, conservatories, academies of performing arts, museums and libraries.

The GARR Network provides not only the infrastructure of a high-bandwidth fiber-optic backbone, including the last mile access link, but also important services:

  • Network Management (GARR NOC)
    • The GARR Network Operations Centre is responsible for operating and maintaining the network infrastructure.
  • Grid and Cloud (IDP IN THE CLOUD)
    • The service automates the implementation of identity management services, combining cloud and configuration management.
  • Network Security (GARR CERT)
    • GARR CERT takes preventive measures against security incidents and helps users resolve security incidents and network attacks efficiently.
  • Vulnerability Assessment (GARR SCARR)
  • Registration of domains (GARR NIC)
  • Allocation of IP Addresses (GARR LIR)
  • Digital Identity (IDEM GARR AAI)
  • Wi-Fi and Mobility (eduroam)
  • Digital Certificates
  • Videoconferencing (GARR VCONF)
  • Software Mirror (GARR MIRROR, efficiency in software distribution)

*****

Interview with Dr. Claudio Grandi, Italian National Institute for Nuclear Physics

The following interview with the Italian National Institute for Nuclear Physics (INFN), one of the most important users of the GARR network, provides an in-depth look at specific applications of the network and their development. The discussion also highlights the bilateral cooperation between INFN and Switzerland.

Dr. Claudio Grandi, chairman of the INFN Commission for Computing and Networks, kindly answered the questions posed by Ruth Theus Baldassarre, STC at the Embassy of Switzerland in Rome.

Q: To begin with, could you tell us briefly about the history of INFN in connection with the GARR project?

A: INFN’s need to rely on computers to analyse its experimental results dates back to the 1960s. Initially, the computers owned by INFN and those available at the regional academic computing centres (CINECA, CILEA, CSATA, etc.) were isolated. Production connections between remote systems have been available since 1979, when minicomputers located at several INFN sites were interconnected via dial-up links. In the following years INFNet was created, based on the then-novel DECnet technology. The next steps, coordinated by INFN-CNAF in Bologna, were the connection to CINECA and the other regional academic centres and, in the 1980s, to international laboratories such as CERN in Geneva, FNAL in the USA and DESY in Hamburg.

GARR (Gruppo per l’Armonizzazione delle Reti della Ricerca, Group for Research Network Harmonization) was founded in 1988 to coordinate the creation of a national backbone dedicated to research institutes and universities, and in 1989 an infrastructure based on TDM technology with a capacity of 2 Mbps was made available. In 2002 the GARR group became an association, the Consortium GARR, initially comprising the universities (through CRUI), CNR, ENEA and INFN, and supported by the Ministry of University and Scientific and Technological Research (MURST). Since then GARR has ensured the operation, extension and development of the Italian research network, also providing connections to other international research networks and to commercial networks. The main stages of the GARR network have been GARR-2 (~1995), GARR-B (1998), GARR-G (2002) and GARR-X (2012). GARR-X Progress (2013) received dedicated funds to improve connectivity in southern Italy, including secondary schools. In the coming years GARR intends to upgrade the GARR-X network to the same technology as GARR-X Progress, providing connections at multiples of 100 Gbps to the centres that require them.

INFN has played a leading role in the creation and evolution of GARR. In fact, most of GARR’s technical management staff come from INFN.

Q: At present, what are the INFN’s main services or activities with regard to the GARR Network?

A: INFN is currently the largest user of the GARR network, accounting for over 60% of total GARR traffic. In practice, INFN has driven the evolution of the GARR infrastructure.

Over the past decades INFN has built a hierarchy of computing centres to meet the needs of its experiments, in particular the experiments at the CERN Large Hadron Collider (LHC) in Geneva. The INFN infrastructure includes a first-level centre (Tier-1) at INFN-CNAF in Bologna; nine second-level centres (Tier-2) in Torino, Milano, Laboratori Nazionali di Legnaro/Padua, Pisa, Roma, Laboratori Nazionali di Frascati, Naples, Bari and Catania; and several third-level centres (Tier-3) at most INFN sites and at the other INFN laboratories (Laboratori Nazionali del Sud and Laboratori Nazionali del Gran Sasso) serving the locally hosted experiments. The total CPU power officially pledged to INFN experiments is over 300 kHS06 (HEP-SPEC06 benchmark units, corresponding to more than 30,000 cores), together with more than 40 PB of disk storage and about 36 PB of tape storage. The Tier-1 accounts for approximately half of the total CPU and disk resources and all of the tape resources.
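The resource figures above can be cross-checked with a little arithmetic. The Python sketch below divides the pledged CPU power by the core count and estimates the Tier-1 share. It treats the quoted lower bounds (“over”, “more than”, “about”) as exact values and takes “approximately half” as exactly 50%, so the results are rough orders of magnitude rather than official figures; the function names are illustrative only.

```python
"""Rough cross-check of the INFN resource figures quoted in the interview."""

# Quoted figures, treated as exact although the text gives lower bounds.
TOTAL_CPU_HS06 = 300_000   # "over 300 kHS06" (HEP-SPEC06 benchmark units)
TOTAL_CORES = 30_000       # "more than 30,000 cores"
TOTAL_DISK_PB = 40         # "more than 40 PB of disk storage"
TOTAL_TAPE_PB = 36         # "about 36 PB of tape storage"
TIER1_FRACTION = 0.5       # "approximately half" of CPU and disk (assumed exactly 0.5)


def hs06_per_core(total_hs06: float, cores: int) -> float:
    """Average benchmark score per core implied by the two totals."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_hs06 / cores


def tier1_share() -> dict[str, float]:
    """Estimated Tier-1 resources: half of CPU and disk, all of the tape."""
    return {
        "cpu_khs06": TOTAL_CPU_HS06 * TIER1_FRACTION / 1000,
        "disk_pb": TOTAL_DISK_PB * TIER1_FRACTION,
        "tape_pb": float(TOTAL_TAPE_PB),
    }


if __name__ == "__main__":
    print(f"~{hs06_per_core(TOTAL_CPU_HS06, TOTAL_CORES):.0f} HS06 per core")
    print(tier1_share())
```

With these values the average is about 10 HS06 per core, and the Tier-1 would hold roughly 150 kHS06 of CPU, 20 PB of disk and 36 PB of tape.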

The centres offer their resources mainly through a software technology known as the Computing Grid, which hides the complexity of the underlying infrastructure from the end user and provides transparent access to experimental data and resources. A key requirement is that the nodes of the Computing Grid are well interconnected, to allow seamless data replication and fast access to data from the computing nodes. All INFN Tier-2 centres are connected at 10 Gbps or multiples thereof, and the CNAF Tier-1 is currently connected to the GARR backbone at 60 Gbps.
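To give a feel for why link capacity matters for data replication, the following sketch estimates the time needed to move a dataset over a link of a given speed. The 1 PB dataset size and the `efficiency` factor (the fraction of nominal bandwidth actually achieved) are hypothetical. The sketch uses decimal units (1 PB = 10^15 bytes) and ignores protocol overhead, contention and storage bottlenecks beyond that single factor.

```python
"""Idealised transfer time of a dataset over a network link."""


def transfer_time_hours(size_pb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `size_pb` decimal petabytes over a `link_gbps` link.

    `efficiency` is the (hypothetical) fraction of nominal bandwidth achieved.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_pb * 1e15 * 8                      # PB -> bytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Example: a hypothetical 1 PB dataset over Tier-2 and Tier-1 link speeds.
    for gbps in (10, 60):
        print(f"1 PB at {gbps} Gbps: {transfer_time_hours(1, gbps):.1f} h")
```

Under these assumptions, replicating 1 PB at full nominal bandwidth takes roughly 222 hours (about 9 days) at 10 Gbps and roughly 37 hours at 60 Gbps.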

GARR also manages IDEM, the Italian federation for authentication and authorization of universities and research institutes, which INFN uses.

Q: Who owns the INFN’s Big Data in the GARR Network, and who has access to it?

A: The data are owned by the collaborations that produced them. Access is granted to members of the collaborations according to the rules the collaborations define. Authentication and authorization for data access and resource usage are controlled by the Grid software. Grid authentication is based on X.509 certificates, and authorization is implemented through VOMS, a technology developed by INFN and others that gives collaborations fine-grained control over data and resource access. A worldwide federation has been built to establish mutual trust among all Grid users.
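VOMS attaches attributes to a user’s credentials in the form of FQANs (Fully Qualified Attribute Names) such as `/cms/it/Role=production/Capability=NULL`, which name a Virtual Organization, a group path and optionally a role. The following Python sketch illustrates how a group- and role-based decision of the kind described above could be made from such attributes. It is not the VOMS API: the policy table, the action names, the example group `/cms/it` and the functions `parse_fqan` and `is_authorized` are all hypothetical, and the sketch assumes the X.509 certificate chain and the VOMS attributes have already been validated.

```python
"""Hypothetical group/role authorization check based on VOMS-style FQANs."""
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fqan:
    vo: str                  # first path component, the Virtual Organization
    group: str               # full group path, e.g. "/cms/it"
    role: Optional[str]      # None when the role is absent or "NULL"


def parse_fqan(fqan: str) -> Fqan:
    """Parse an FQAN such as '/cms/it/Role=production/Capability=NULL'."""
    groups: list[str] = []
    role: Optional[str] = None
    for part in fqan.strip("/").split("/"):
        if not part:
            continue
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # capability field is ignored in this sketch
        else:
            groups.append(part)
    if not groups:
        raise ValueError(f"FQAN has no VO component: {fqan!r}")
    return Fqan(vo=groups[0], group="/" + "/".join(groups), role=role)


# Hypothetical policy: action -> list of (required group prefix, required role or None).
POLICY: dict[str, list[tuple[str, Optional[str]]]] = {
    "read:/cms/data": [("/cms", None)],           # any CMS member may read
    "write:/cms/data": [("/cms", "production")],  # only the production role may write
}


def is_authorized(fqans: list[str], action: str, policy=POLICY) -> bool:
    """True if any of the user's FQANs satisfies a rule for `action`."""
    rules = policy.get(action, [])  # unknown actions are denied
    for raw in fqans:
        attr = parse_fqan(raw)
        for group, role in rules:
            # A subgroup such as "/cms/it" counts as membership of "/cms".
            in_group = attr.group == group or attr.group.startswith(group + "/")
            if in_group and (role is None or attr.role == role):
                return True
    return False
```

For example, under this hypothetical policy a user whose credentials carry only `/cms/Role=NULL/Capability=NULL` can read but not write, while `/cms/it/Role=production/Capability=NULL` can do both, because the subgroup path `/cms/it` falls under `/cms`.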

Q: Is there a project for an Open Data policy of the INFN and/or the Italian Government?

A: Policies for data access are decided by the collaborations that produced the data. In principle, all LHC experiments have agreed to make their data public, except for the very low-level data (raw data), which are too complex to analyse without specific knowledge of the detector. There are, however, technical challenges to consider. Even processed data, in a format that remains useful for analysis, are complex and require specific software to be accessed and processed. Unfortunately, this software becomes obsolete as computer hardware and operating systems evolve. Several projects are trying to address this, for example through virtualization, and INFN researchers are involved in tackling the technical aspects of the problem.

INFN recently started a project to support open access to data and publications: it provides an Open Access Repository (OAR) compliant with de jure and de facto standards, and deploys services that assign PID/DOI identifiers to publications, data and software. OAR is registered as a data provider with the main open access directories (OAI, OpenAIRE, OpenDOAR, …). Access is granted through IDEM services to members of the Italian and international identity federations. INFN is building agreements with other Italian research institutes to federate repositories and share development efforts.

INFN represents Italy in the SCOAP3 initiative (http://scoap3.org/).

Q: INFN collaborates closely with CERN in Geneva: what are INFN’s main activities there? Does the Institute have other partners in Switzerland?

A: Many INFN experiments are based at CERN. In particular, the LHC experiments have driven the development and deployment of a computing infrastructure able to process the unprecedented amount of data they produce. CERN, together with the main funding agencies of the LHC experiments, created the Worldwide LHC Computing Grid (WLCG) to coordinate the activities on the computing infrastructure for the LHC. Alongside the WLCG activities, CERN and INFN have worked together in several European projects to develop the building blocks of the infrastructure, starting with the European DataGrid project (2001-2004), the three EGEE projects (2004-2010), the more recent EGI projects (since 2010) and many others.

Another important Swiss partner in several of these projects has been SWITCH, especially for security management. SWITCH is also a partner of GARR in coordinating the European National Research and Education Networks (NRENs) and in the GÉANT project.

Q: Over the next five years Switzerland will invest CHF 25 million in the National Research Programme “Big Data”. What are INFN’s expectations regarding future bilateral collaboration?

A: The quantity of data that High Energy Physics experiments will produce in the next decade is very large, even compared with what companies such as Google or Facebook produce. To give a sense of scale, LHC phase 2 will produce about 400 PB of raw data per year and the new astrophysics experiment SKA is expected to produce 660 PB per year, while the total amount of Google search data is about 100 PB per year and all Facebook uploads amount to about 180 PB per year. While Cloud Computing, the emerging solution for managing computing power, looks satisfactory for our needs, much remains to be done in managing storage resources. We need new collaborations that specifically address the distributed storage of large amounts of data, along the lines of INDIGO-DataCloud, which brings INFN, CERN and other partners together to adapt Cloud Computing to scientific needs. I think there is room for new collaborations in this field.
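The annual volumes quoted above can be turned into average data rates, which helps relate them to the 100 Gbps links mentioned earlier in the article. The sketch below assumes decimal petabytes and a 365.25-day year and computes a yearly average only; real transfers are bursty, so peak rates would be higher. The dictionary keys are simply labels for the interview’s figures.

```python
"""Average data rates implied by the annual volumes quoted in the interview."""

# Julian year in seconds (assumption: 365.25 days).
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Annual volumes in decimal petabytes, as quoted in the interview.
ANNUAL_VOLUME_PB = {
    "LHC phase 2 raw data": 400,
    "SKA": 660,
    "Google search": 100,
    "Facebook uploads": 180,
}


def average_rate_gbps(pb_per_year: float) -> float:
    """Sustained rate (Gbps) needed to move `pb_per_year` PB evenly over a year."""
    bits = pb_per_year * 1e15 * 8
    return bits / SECONDS_PER_YEAR / 1e9


def ratio(name: str, reference: str) -> float:
    """How many times larger the volume of `name` is than that of `reference`."""
    return ANNUAL_VOLUME_PB[name] / ANNUAL_VOLUME_PB[reference]


if __name__ == "__main__":
    for source, volume in ANNUAL_VOLUME_PB.items():
        print(f"{source:>22}: {volume:4d} PB/yr ~ {average_rate_gbps(volume):6.1f} Gbps")
    print(f"LHC phase 2 / Google search: {ratio('LHC phase 2 raw data', 'Google search'):.1f}x")
```

Under these assumptions, 400 PB per year averages to roughly 101 Gbps and 660 PB per year to roughly 167 Gbps, and the LHC phase 2 raw-data volume is about four times the quoted Google search volume.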

[1] GARR: Gruppo per l’Armonizzazione delle Reti della Ricerca (Italian academic and research telecommunication network).

The pictures in this paper are reproduced with the kind permission of the Consortium GARR.

[2] The founding members are the National Institute for Nuclear Physics (INFN); the National Research Council (CNR); the Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA); and the Conference of Italian University Rectors (Fondazione CRUI).

*****

Follow-up Global Statement 2014 “Digital Education and Economic Opportunities: The Humanities as a Digital Science” (Italy):

International Conference “The Great Beauty of Big Data: Frontiers in the Fruition of the Archival Heritage for Scholars and Society”

November 26, 2015 | 09:00 – 16:30

Rome, Archivio di Stato di Roma, S. Ivo alla Sapienza

Information

Ruth Theus Baldassarre, STC Embassy of Switzerland, Rome.
