

2014

L. Aniello, L. Querzoni, R. Baldoni
High Frequency Batch-oriented Computations over Large Sliding Time Windows

Future Generation Computer Systems (to appear), 2014

Abstract

"Today’s business workflows are very likely to include batch computations that periodically analyze subsets of data within specific time ranges to provide strategic information for stakeholders and other interested parties. The frequency of these batch computations provides an effective measure of the freshness of the data analytics available to decision makers. Nevertheless, the typical amounts of data to process in a batch are so large that a computation can take a very long time. Considering that a new batch usually starts when the previous one has completed, the frequency of such batches can thus be very low. In this paper we propose a model for batch processing based on overlapping sliding time windows that makes it possible to increase the frequency of batches. The model is well suited to scenarios (e.g., financial, security, etc.) characterized by large data volumes, observation windows in the order of hours (or days) and frequent updates (in the order of seconds). The model introduces multiple metrics whose aim is to reduce the latency between the end of a computation time window and the availability of results, thus increasing the frequency of the batches. These metrics specifically take into account the organization of input data to minimize its impact on such latency. The model is then instantiated on the well-known Hadoop platform, a batch processing engine based on the MapReduce paradigm, and a set of strategies for efficiently arranging input data is described and evaluated."

Downloads:bib - BibTeX reference
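
The overlapping sliding time windows mentioned in the abstract above can be pictured with a minimal sketch. It only enumerates window boundaries; it is not the paper's model or its Hadoop strategies, and the function name `sliding_windows` and the parameters `length` and `slide` are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta


def sliding_windows(start, end, length, slide):
    """Yield (window_start, window_end) pairs of fixed `length`.

    A new window closes every `slide`; when `slide` is shorter than
    `length`, consecutive windows overlap by `length - slide`.
    """
    window_end = start + length
    while window_end <= end:
        yield (window_end - length, window_end)
        window_end += slide


# Illustrative values only: 2-hour windows, one batch every 30 minutes.
windows = list(
    sliding_windows(
        start=datetime(2014, 1, 1, 0, 0),
        end=datetime(2014, 1, 1, 3, 0),
        length=timedelta(hours=2),
        slide=timedelta(minutes=30),
    )
)

for w_start, w_end in windows:
    print(w_start.time(), "->", w_end.time())
```

Each batch here covers a full window, so two consecutive batches share most of their input; how to organize that shared input efficiently is the question the paper addresses.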



F. Petroni, L. Querzoni
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Completion via Graph Partitioning

(To appear) In Proceedings of the 8th ACM Recommender Systems Conference (RecSys), 2014

Abstract

"Matrix completion latent factor models are known to be an effective method to build recommender systems. Currently, stochastic gradient descent (SGD) is considered one of the best latent factor-based algorithms for matrix completion. In this paper we discuss GASGD, a distributed asynchronous variant of SGD for large-scale matrix completion, that (i) leverages data partitioning schemes based on graph partitioning techniques, (ii) exploits specific characteristics of the input data and (iii) introduces an explicit parameter to tune synchronization frequency among the computing nodes. We empirically show how, thanks to these features, GASGD achieves a fast convergence rate while incurring a smaller communication cost with respect to current asynchronous distributed SGD implementations."

Downloads:bib - BibTeX reference
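
For readers unfamiliar with the baseline that the abstract above builds on, the following is a minimal sketch of plain, sequential SGD for matrix completion. It is not GASGD: there is no partitioning, asynchrony, or synchronization parameter. The function names, hyperparameters, and the toy ratings are all assumptions made for illustration.

```python
import math
import random


def sgd_matrix_completion(ratings, n_users, n_items, rank=2,
                          lr=0.05, reg=0.01, epochs=200, seed=0):
    """Fit user factors P and item factors Q so that P[u] . Q[i] ~ r."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit observed entries in random order
        for u, i, r in samples:
            err = r - sum(P[u][k] * Q[i][k] for k in range(rank))
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on the regularized squared error.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the observed entries."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pk * qk for pk, qk in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(ratings))


# Toy data: (user, item, rating) triples for a 3 x 3 matrix.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

before = rmse(ratings, *sgd_matrix_completion(ratings, 3, 3, epochs=0))
P, Q = sgd_matrix_completion(ratings, 3, 3)
after = rmse(ratings, P, Q)
print("RMSE before:", before, "after:", after)
```

Each update touches only one row of `P` and one row of `Q`, which is the property that distributed variants exploit when assigning ratings to different computing nodes.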



M. Caruso, M. Mecella, V. Forte, A. Cerocchi, L. Querzoni, R. Baldoni
Energy Management in Smart Spaces through the OPlatform

Proceedings of the 2nd International Workshop on Energy-Aware Systems, Communications and Security (EASyCoSe), 2014

Abstract

"Energy management, and in particular its optimization, is one of today's hot trends, both at the enterprise level (optimization of whole corporate/government buildings) and at the level of single citizens' homes. The current trend is to provide knowledge about the micro(scopic) energy consumption. In our work we developed a platform, named OPlatform, for smart environments, able to micro-account the energy consumption of devices at the level of each single power line, and which at the same time allows the actuation of devices, thus also being an energy-aware domotic solution. After presenting the system architecture, consisting of a distributed system based on several OMeters (specifically designed hardware devices) and an OBox (an embedded PC hosting the software system), we present a preliminary case study, in which the OPlatform has been adopted in a small office, in order to highlight the concrete possible savings."

Downloads:bib - BibTeX reference



R. Baldoni, A. Cerocchi, C. Ciccotelli, A. Donno, F. Lombardi, L. Montanari
Towards a non-Intrusive Recognition of Anomalous System Behavior in Data Centers

First International Workshop on Reliability and Security Aspects for Critical Infrastructure Protection (ReSA4CI 2014), 2014

Abstract

"In this paper we propose a monitoring system for a data center that is able to infer when the data center is entering an anomalous behavior by analyzing the power consumption at each server and the data center network traffic. The monitoring system is non-intrusive in the sense that there is no need to install software on the data center servers. The monitoring architecture embeds two Elman Recurrent Networks (RNNs) to predict the power consumed by each data center component starting from data center network traffic and vice versa. Results obtained over six months of experiments within a data center show that the architecture is able to distinguish anomalous system behaviors from normal ones by analyzing the error between the actual values of power consumption and network traffic and the ones inferred by the two RNNs."

Downloads:pdf - Paper
bib - BibTeX reference



R. Baldoni, F. d'Amore, M. Mecella, D. Ucci
A Software Architecture for Progressive Scanning of On-line Communities

Workshop Proceedings of the 34th International Conference on Distributed Computing Systems (ICDCSW), 2014

Abstract

"We consider a set of on-line communities (e.g., news, blogs, Google groups, Web sites, etc.). The content of a community is continuously updated by users and such updates can be seen by users of other communities. Thus, when creating an update, a user could be influenced by one or more updates, creating a semantic causal relationship among updates. Transitively, this makes it possible to trace how information flows across communities. The paper presents a software architecture that progressively scans a set of on-line communities in order to detect such semantic causal relationships. The architecture includes a crawler, a large scale storage, a distributed indexing system and a mining system. The paper mainly focuses on crawling and indexing."

Downloads:pdf - Paper
bib - BibTeX reference



T. Heinze, L. Aniello, L. Querzoni, Z. Jerzak
Tutorial: Cloud-based Data Stream Processing

Proceedings of the 8th ACM International Conference on Distributed Event Based Systems (DEBS), 2014

Abstract

"In this tutorial we present the results of recent research about the cloud enablement of data streaming systems. We illustrate, based on both industrial and academic prototypes, new emerging use cases and research trends. Specifically, we focus on novel approaches for (1) scalability and (2) fault tolerance in large scale distributed streaming systems. In general, new fault tolerance mechanisms strive to be more robust and at the same time introduce less overhead. Novel load balancing approaches focus on elastic scaling over hundreds of instances based on the data and query workload. Finally, we present open challenges for the next generation of cloud-based data stream processing engines."

Downloads:pdf - Paper
pdf - Presentation
bib - BibTeX reference



L. Aniello et al.
Big Data in Critical Infrastructures Security Monitoring: Challenges and Opportunities

Proceedings of the First International Workshop on Real-time Big Data Analytics for Critical Infrastructure Protection (BIG4CIP), 2014

Abstract

"Critical Infrastructures (CIs), such as smart power grids, transport systems, and financial infrastructures, are more and more vulnerable to cyber threats, due to the adoption of commodity computing facilities. Despite the use of several monitoring tools, recent attacks have proven that current defensive mechanisms for CIs are not effective enough against most advanced threats. In this paper we explore the idea of a framework leveraging multiple data sources to improve protection capabilities of CIs. Challenges and opportunities are discussed along three main research directions: i) use of distinct and heterogeneous data sources, ii) monitoring with adaptive granularity, and iii) attack modeling and runtime combination of multiple data analysis techniques."

Downloads:bib - BibTeX reference



F. Petroni, L. Querzoni, R. Beraldi, M. Paolucci
LCBM: Statistics-based Parallel Collaborative Filtering

Proceedings of the 17th International Conference on Business Information Systems (BIS), 2014

Abstract

"In the last ten years, recommendation systems evolved from novelties to powerful business tools, deeply changing the internet industry. Collaborative Filtering (CF) is today a widely adopted strategy to build recommendation engines. The most advanced CF techniques (i.e., those based on matrix factorization) provide high quality results, but may incur prohibitive computational costs when applied to very large data sets. In this paper we present Linear Classifier of Beta distributions Means (LCBM), a novel collaborative filtering algorithm for binary ratings that (i) is inherently parallelizable and (ii) provides results whose quality is on par with state-of-the-art solutions (iii) at a fraction of the computational cost."

Downloads:pdf - Paper
bib - BibTeX reference



L. Aniello, R. Baldoni, C. Ciccotelli, G. Di Luna, F. Frontali, L. Querzoni
The Overlay Scan Attack: Inferring Topologies of Distributed Pub/Sub Systems through Broker Saturation

Proceedings of the 8th ACM International Conference on Distributed Event Based Systems (DEBS), 2014

Abstract

"While pub/sub communication middleware has become mainstream in many application domains, little has been done to assess its weaknesses from a security standpoint. Complex attacks are usually planned by attackers by carefully analyzing the victim to identify those systems that, if successfully targeted, could provide the most effective result. In this paper we show that some pub/sub middleware are inherently vulnerable to a specific kind of preparatory attack, namely the Overlay Scan Attack, that a malicious user could exploit to infer the internal topology of a system, sensitive information that could be used to plan future attacks. The topology inference is performed by only using the standard primitives provided by the pub/sub middleware and assuming minimal knowledge of the target system. The practicality of this attack has been shown both in a simulated environment and through a test performed on a SIENA pub/sub deployment."

Downloads:pdf - Presentation
pdf - Paper
bib - BibTeX reference



L. Aniello, S. Bonomi, F. Lombardi, A. Zelli, R. Baldoni
An Architecture for Automatic Scaling of Replicated Services

To appear in the Proceedings of the 2nd International Conference on NETworked sYStems (NETYS), 2014

Abstract

"Replicated services that can scale dynamically can adapt to the request load. Choosing the right number of replicas is fundamental to avoid performance degradation when input spikes occur and to save resources when the load is low. Current mechanisms for automatic scaling are mostly based on fixed thresholds on CPU and memory usage, which are not sufficiently accurate and often entail late countermeasures. We propose Make Your Service Elastic (MYSE), an architecture for automatic scaling of generic replicated services based on queuing models for accurate response time estimation. Request and service time patterns are analyzed to learn and predict their distribution over time so as to allow for early scaling. A novel heuristic is proposed to avoid the flipping phenomenon. We carried out simulations that show promising results regarding the effectiveness of our approach."

Downloads:pdf - Paper
bib - BibTeX reference
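
The idea of sizing a replicated service from a queuing model, as described in the abstract above, can be sketched as follows. The abstract does not say which queuing model MYSE uses; an M/M/c queue (Poisson arrivals, exponential service times, `c` identical replicas) is assumed here purely for illustration, and all function names and numeric values are hypothetical.

```python
import math


def erlang_c(servers, offered_load):
    """Probability that an arriving request has to wait (Erlang C formula)."""
    if offered_load >= servers:
        return 1.0  # unstable queue: every request waits
    below = sum(offered_load ** k / math.factorial(k) for k in range(servers))
    tail = (offered_load ** servers / math.factorial(servers)
            * servers / (servers - offered_load))
    return tail / (below + tail)


def mean_response_time(servers, arrival_rate, service_rate):
    """Mean time in system (waiting plus service) for an M/M/c queue."""
    offered_load = arrival_rate / service_rate
    if offered_load >= servers:
        return math.inf
    waiting = (erlang_c(servers, offered_load)
               / (servers * service_rate - arrival_rate))
    return waiting + 1.0 / service_rate


def min_replicas(arrival_rate, service_rate, target, max_replicas=1000):
    """Smallest replica count whose estimated mean response time meets target."""
    for servers in range(1, max_replicas + 1):
        if mean_response_time(servers, arrival_rate, service_rate) <= target:
            return servers
    return None  # target not reachable within max_replicas


# Illustrative numbers: 10 requests/s, each replica serves 4 requests/s,
# and the mean response time should stay within 0.35 s.
print(min_replicas(arrival_rate=10.0, service_rate=4.0, target=0.35))
```

Unlike a fixed CPU threshold, an estimate of this kind can be recomputed from a predicted arrival rate, which is what makes scaling ahead of a spike possible.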



L. Aniello
Timely Processing of Big Data in Collaborative Large-Scale Distributed Systems

PhD thesis - Sapienza University of Rome

Abstract

"Today’s Big Data phenomenon, characterized by huge volumes of data produced at very high rates by heterogeneous and geographically dispersed sources, is fostering the employment of large-scale distributed systems in order to leverage parallelism, fault tolerance and locality awareness with the aim of delivering suitable performance. Among the several areas where Big Data is gaining increasing significance, the protection of Critical Infrastructure is one of the most strategic since it affects the stability and safety of entire countries. Intrusion detection mechanisms can benefit greatly from novel Big Data technologies because these make it possible to exploit much more information in order to sharpen the accuracy of threat discovery. A key aspect for further increasing the amount of data available for detection purposes is collaboration (meant as information sharing) among distinct actors that share the common goal of maximizing the chances of recognizing malicious activities earlier. Indeed, if an agreement can be found to share their data, they all have the possibility to improve their cyber defenses. The abstraction of Semantic Room (SR) allows interested parties to form trusted and contractually regulated federations, the Semantic Rooms, for the sake of secure information sharing and processing. Another crucial point for the effectiveness of cyber protection mechanisms is the timeliness of the detection, because the sooner a threat is identified, the faster proper countermeasures can be put in place so as to confine any damage. Within this context, the contributions reported in this thesis are threefold:

* As a case study to show how collaboration can enhance the efficacy of security tools, we developed a novel algorithm for the detection of stealthy port scans, named R-SYN (Ranked SYN port scan detection). We implemented it in three distinct technologies, all of them integrated within an SR-compliant architecture that allows for collaboration through information sharing: (i) in a centralized Complex Event Processing (CEP) engine (Esper), (ii) in a framework for distributed event processing (Storm) and (iii) in Agilis, a novel platform for batch-oriented processing which leverages the Hadoop framework and a RAM-based storage for fast data access. Regardless of the employed technology, all the evaluations have shown that increasing the number of participants (that is, increasing the amount of input data available) improves the detection accuracy. The experiments made clear that a distributed approach allows for lower detection latency and for keeping up with higher input throughput, compared with a centralized one.

* Distributing the computation over a set of physical nodes introduces the issue of improving the way available resources are assigned to the processing tasks to execute, with the aim of minimizing the time the computation takes to complete. We investigated this aspect in Storm by developing two distinct scheduling algorithms, both aimed at decreasing the average processing time of a single input event by decreasing the inter-node traffic. Experimental evaluations showed that these two algorithms can improve the performance by up to 30%.

* Computations in online processing platforms (like Esper and Storm) are run continuously, and the need to refine running computations or add new computations, together with the need to cope with the variability of the input, requires the possibility to adapt the resource allocation at runtime, which entails a set of additional problems. Among them, the most relevant concern how to cope with incoming data and processing state while the topology is being reconfigured, and the issue of temporarily reduced performance. To this end, we also explored the alternative approach of running the computation periodically on batches of input data: although it involves a performance penalty on the processing latency, it eliminates the great complexity of dynamic reconfigurations. We chose Hadoop as the batch-oriented processing framework and we developed some strategies specific to computations based on time windows, which are very likely to be used for pattern recognition purposes, as in the case of intrusion detection. Our evaluations provided a comparison of these strategies and made evident the kind of performance that this approach can provide."

Downloads:pdf - PhD Thesis
bib - BibTeX reference



G. Lodi, L. Aniello, G. Di Luna, R. Baldoni
An Event-based Platform for Collaborative Threats Detection and Monitoring

Information Systems, volume 39, pages 175-195, 2014

Abstract

"Organizations must protect their information systems from a variety of threats. Usually they employ isolated defenses such as firewalls, intrusion detection and fraud monitoring systems, without cooperating with the external world. Organizations belonging to the same markets (e.g., financial organizations, telco providers) typically suffer from the same cyber crimes. Sharing and correlating information could help them detect those crimes early and mitigate the damage. The paper introduces the Semantic Room (SR) abstraction, which enables the development of collaborative and contractually regulated event-based platforms, on top of the Internet, where data from different information systems are shared and correlated to detect and react in a timely manner to coordinated Internet-based security threats (e.g., port scans, botnets) and frauds. The paper describes the SR life cycle management and, to show the flexibility of the abstraction, it proposes the design, implementation and validation of two SRs. The first SR detects inter-domain port scan attacks, the second monitors frauds performed in Italy. In both cases, we use real data traces to demonstrate the effectiveness of our approach. In the first SR, high detection accuracy and small detection delays are achieved, whereas in the second, new fraud evidence and investigation instruments are provided to law enforcement agencies."

Downloads:pdf - Paper
bib - BibTeX reference



DISCLAIMER - The reports contained in this page are included by the contributing authors as a mechanism to ensure timely dissemination of scholarly/technical information on a non-commercial basis. Copyright and all rights therein are maintained by the authors, despite the fact they have offered this information electronically. It is understood that all individuals copying this information will adhere to the terms and constraints invoked by each author's copyright. Reports may not be copied for commercial redistribution, republication, or dissemination without the explicit permission of the authors. Sections of some of these reports have been published by IEEE, Springer-Verlag, Kluwer, etc., and are subject to copyright. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the publisher.