Business Intelligence for the Real Time Enterprise

August 27, 2012 - Istanbul, Turkey


Real-Time Reporting at Salesforce.
Donovan Schneider (Salesforce)
Salesforce runs millions of reports every day on its scalable multi-tenant cloud platform. This talk presents the reporting architecture, query processing and optimization strategies, and other considerations for real-time reporting.

Real-time Business Intelligence in the MIRABEL Smart Grid System.
Ulrike Fischer (TU Dresden), Dalia Kaulakienė (AAU), Mohamed Khalefa (AAU), Wolfgang Lehner (TU Dresden), Torben Pedersen (AAU), Laurynas Šikšnys (AAU), Christian Thomsen (AAU)
The so-called smart grid is emerging in the energy domain as a solution to provide a stable, efficient and sustainable energy supply that accommodates ever-growing amounts of renewable energy, such as wind and solar, in the energy production. Smart grid systems are highly distributed, manage large amounts of energy-related data, and must be able to react rapidly (but intelligently) when conditions change, leading to substantial real-time business intelligence challenges. This paper discusses these challenges and presents data management solutions in the European smart grid project MIRABEL. These solutions include real-time time series forecasting, real-time aggregation of the flexibilities in energy supply and demand, managing subscriptions for forecast and flexibility data, efficient storage of time series and flexibilities, and real-time analytical query processing spanning past and future (forecasted) data. Experimental studies show that the proposed solutions support important real-time business intelligence tasks in a smart grid system.

Data Mining in Life Sciences Using In-Memory DBMSs: A Case Study on SAP's In-Memory Computing Engine.
Joos-Hendrik Boese, Gennadi Rabinovitch, Matthias Steinbrecher, Miganoush Magarian, Massimiliano Marcon, Cafer Tosun, Vishal Sikka (SAP)
While column-oriented in-memory databases have been primarily designed to support fast OLAP queries and business intelligence applications, their analytical performance makes them a promising platform for data mining tasks found in life sciences. One such system is the HANA database, SAP's in-memory data management solution. In this contribution we show how HANA meets some inherent requirements of data mining in life sciences. Furthermore, we conducted a case study in the area of proteomics research. As part of this study we implemented a proteomics analysis pipeline in HANA. We also implemented a flexible data analysis toolbox that can be used by life sciences researchers to easily design and evaluate their analysis models.

The Vivification Problem in Real-Time Business Intelligence.
Patricia Arocena, Renee Miller, John Mylopoulos (University of Toronto)
In the new era of Business Intelligence (BI) technology, transforming massive amounts of data into high-quality business information is essential. To achieve this, two non-overlapping worlds need to be aligned: the Information Technology (IT) world, represented by an organization's operational data sources and the technologies that manage them (data warehouses, schemas, queries, ...), and the business world, portrayed by the business plans, strategies and goals that an organization aspires to fulfill. Alignment in this context means mapping business queries into BI queries, and interpreting the data retrieved from the BI query in business terms. We call the creation of this interpretation the vivification problem. The main thesis of this position paper is that solutions to the vivification problem should be based on a formal framework that explicates the assumptions and other ingredients (schemas, queries, etc.) that affect it, and that there should be a correctness condition specifying when a response to a business schema query is correct. The paper defines the vivification problem in detail and sketches approaches towards a solution.

On-Demand ETL Architecture for Right-Time BI.
Florian Waas (EMC/Greenplum)
In a typical BI infrastructure, data extracted from operational data sources is transformed and cleansed, and subsequently loaded into a data warehouse where it can be queried for reporting purposes. ETL, the process of extraction, transformation, and loading, is a periodic process that may involve an elaborate and rather established software ecosystem. Typically, the actual ETL process is executed on a nightly basis, i.e., a full day's worth of data is processed and loaded during off-hours. Depending on the resources available and the nature of the data and the reporting, ETL may also be performed more frequently, e.g., on an hourly basis. It is desirable to reduce this delay further and ideally provide reports and business insights in real time or near real time. However, this requires overcoming throughput bottlenecks and improving latency throughout the ETL process. Instead of attempting to incrementally improve this situation, we propose a radically different approach: leveraging a data warehouse's capability to directly import raw, unprocessed records, we defer the transformation and cleansing of data until it is needed by pending reports. At that time, the database's own processing mechanisms can be deployed to process the data on demand. In this talk, we look at the challenges that come with such an on-line approach, as well as the opportunities for optimization. In particular, we explore how to leverage a sophisticated hierarchy of materialized views to build a continuous load architecture.
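The deferred-transformation idea can be sketched with a toy example: raw records are bulk-loaded untransformed, and the cleansing logic is expressed as a view inside the database, so it runs only when a report actually touches the data. This is an illustrative sketch using SQLite, not Greenplum's implementation; the table, column, and record formats are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Continuous load path: raw, unparsed records go straight into the warehouse.
con.execute("CREATE TABLE raw_sales (line TEXT)")
con.executemany("INSERT INTO raw_sales VALUES (?)",
                [("2012-08-27|100",), ("2012-08-27|250",)])

# Deferred transformation, expressed as a view; a real system would
# materialize it and maintain it incrementally as new raw data arrives.
con.execute("""CREATE VIEW sales AS
               SELECT substr(line, 1, 10)                AS day,
                      CAST(substr(line, 12) AS INTEGER)  AS amount
               FROM raw_sales""")

# The transformation cost is paid here, at report time, not at load time.
total = con.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
# total == 350
```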

Instant-On Scientific Data Warehouses: Lazy ETL for Data-Intensive Research.
Yagiz Kargin, Holger Pirk, Milena Ivanova, Stefan Manegold, and Martin Kersten (CWI, Netherlands)
In the dawning era of data-intensive research, scientific discovery deploys data analysis techniques similar to those that drive business intelligence. As in classical Extract, Transform and Load (ETL) processes, data has to be loaded entirely from external data sources (repositories) into a scientific data warehouse before it can be analyzed. This process is both time- and resource-intensive and may be overkill if only a subset of the data is of interest to a particular user. To overcome this problem, we propose a novel technique to lower the costs for data loading: Lazy ETL. Data is extracted and loaded transparently on-the-fly, only for the required data items. Extensive experiments demonstrate a significant reduction of the time from source data availability to query answer compared to state-of-the-art solutions. In addition to reducing the costs for bootstrapping a scientific data warehouse, our approach also lessens the costs for loading new incoming data.
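A minimal sketch of the lazy-loading idea, with invented names (not the paper's actual interfaces): source loaders are registered up front, but each source is extracted and loaded only the first time a query touches it, so sources that are never queried cost nothing.

```python
class LazyWarehouse:
    """Illustrative sketch of Lazy ETL: register loaders eagerly,
    extract and load a source only on first access, then serve
    later queries from the already-loaded data."""

    def __init__(self, sources):
        self.sources = sources   # source name -> loader (extract + transform)
        self.loaded = {}         # source name -> materialized table

    def query(self, name):
        if name not in self.loaded:               # on-the-fly, per-item ETL
            self.loaded[name] = self.sources[name]()
        return self.loaded[name]                  # later queries: no load cost

# Only sources that are actually queried pay any loading cost.
load_count = {"n": 0}
def load_experiments():
    load_count["n"] += 1
    return [("run1", 0.92), ("run2", 0.88)]

wh = LazyWarehouse({"experiments": load_experiments,
                    "never_used": lambda: 1 / 0})  # never triggered
rows = wh.query("experiments")
rows = wh.query("experiments")   # cached: the loader ran exactly once
```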

Query Processing of Pre-Partitioned Data Using Sandwich Operators.
Stephan Baumann (TU Ilmenau), Peter Boncz (CWI, Netherlands), Kai-Uwe Sattler (TU Ilmenau)
In this paper we present the "Sandwich Operators", an elegant approach to exploiting pre-sorting or pre-grouping from clustered storage schemes in operators such as Aggregation/Grouping, HashJoin, and Sort of a database management system. Each of these operator types is "sandwiched" by two new operators, PartitionSplit and PartitionRestart. PartitionSplit splits the input relation into its smaller independent groups, on which the sandwiched operator is executed. After a group is processed, PartitionRestart is used to trigger the execution on the following group. Executing one of these operator types with the help of the Sandwich Operators introduces minimal overhead and does not penalize the performance of the sandwiched operator, as its implementation remains unchanged. On the contrary, we show that sandwiched execution of an operator results in lower memory consumption and faster execution time. PartitionSplit and PartitionRestart replace special implementations of partitioned versions of these operators. Sandwich Operators also turn blocking operators into streaming operators, resulting in faster response times for the first query results.
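The scheme can be illustrated with a minimal Python sketch, assuming the input arrives pre-clustered on the grouping key; the function names mirror the paper's operator names, but everything else is invented for illustration.

```python
from itertools import groupby

def partition_split(rows, key):
    """PartitionSplit: split a pre-clustered input into its independent
    groups. Because rows are already clustered on `key`, one linear pass
    suffices and no hashing or re-sorting of the full relation is needed."""
    for _, group in groupby(rows, key=key):
        yield list(group)

def sandwich(rows, key, operator):
    """Run the sandwiched `operator` unchanged on each group; moving to
    the next group plays the role of PartitionRestart."""
    for group in partition_split(rows, key):
        yield from operator(group)   # results stream out group by group

# Example: a sandwiched Sort only ever sorts one small group at a time,
# so memory is bounded by the largest group and the first results are
# emitted as soon as the first group finishes (streaming, not blocking).
rows = [("a", 3), ("a", 1), ("b", 2), ("b", 0)]   # clustered on column 0
out = list(sandwich(rows, key=lambda r: r[0],
                    operator=lambda g: sorted(g, key=lambda r: r[1])))
# out == [("a", 1), ("a", 3), ("b", 0), ("b", 2)]
```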

Towards multi-modal extraction and summarization of conversations.
Raymond Ng (University of British Columbia, Canada)
For many business intelligence applications, decision making depends critically on the information contained in all forms of "informal" text documents, such as emails, meeting summaries, attachments and web documents. For example, in a meeting, the topic of developing a new product was first raised. In subsequent follow-up emails, additional comments and discussions were added, which included links to web documents describing similar products in the market and user reviews of those products. A concise summary of this "conversation" is obviously valuable. However, existing technologies are inadequate in at least two fundamental ways. First, extracting "conversations" embedded in multi-genre documents is very challenging. Second, applying existing multi-document summarization techniques, which were designed mainly for formal documents, has proved to be highly ineffective on informal documents like emails. In this presentation, we give an overview of email summarization and meeting summarization methods. We introduce open problems that need to be solved for multi-modal extraction and summarization of conversations to become a reality, and to be conducted in real time. We also position our project within a pan-Canadian network on business intelligence.

Live Analytics Service Platform.
Meichun Hsu (HP Labs, USA)
This talk presents our vision of a platform for real-time analytics services that is being developed at HP Labs in close interaction with business units. The Live Analytics Service Platform leverages data management technology and fuses it with new paradigms for analytics and application development. We cover important considerations, challenges, functionalities and the architecture of the platform and illustrate its value in a few applications.

Strategic Management for Real-Time Business Intelligence.
Konstantinos Zoumpatianos, Themis Palpanas, John Mylopoulos (U.Trento, Italy)
Even though much research has been devoted to real-time data warehousing, most of it ignores the business concerns that underlie all uses of such data. The complete Business Intelligence (BI) problem begins with modeling and analysis of business objectives and specifications, followed by a systematic derivation of real-time BI queries on warehouse data. In this position paper, we motivate the need for the development of a complete real-time BI stack able to continuously evaluate and reason about strategic objectives. We argue that an integrated system, able to receive formal specifications of the organization's strategic objectives and to transform them into a set of queries that are continuously evaluated against the warehouse, offers significant benefits. In this context, we propose the development of a set of real-time query answering mechanisms able to identify warehouse segments with temporal patterns of special interest, as well as novel techniques for mining warehouse regions that represent expected or unexpected threats and opportunities. With such a vision in mind, we propose an architecture for such a framework, and discuss relevant challenges and research directions.

Pricing Approaches for Data Markets.
Alexander Löser (TU Berlin), Florian Stahl (U.of Münster), Alexander Muschalle (TU Berlin), Gottfried Vossen (U.of Münster)
Recently, multiple data vendors have begun to utilize the paradigm of cloud computing for trading raw data, associated analytical services, and analytic results as commodity goods. We observe that these vendors often move the functionality of a data warehouse to a cloud-based platform, on which they provide services for integrating and analyzing data from public and commercial data sources. We asked established vendors about their key challenges when deciding on pricing strategies in different market situations, and we discuss interesting associated research problems for the business intelligence community.