Sponsored by IBM, Microsoft, and Oracle


DanaC: Workshop on Data analytics in the Cloud

Data analytics has the potential to transform both scientific research and data-driven business decision making. By effectively analyzing huge volumes of data, scientific research can shift from hypothesis-driven to data-driven, with the formulation of scientific hypotheses aided by patterns discovered in vast quantities of data. For most technology companies operating at Web scale, analyzing customer data can provide insights into customer behavior and help answer critical business questions.

Cloud computing has emerged as a cost-effective and elastic computing paradigm. Cloud infrastructures scale to massive numbers of commodity computing nodes and provide adaptive provisioning without prohibitive initial investments. Data analytics has the potential to be a significant cloud application, and to constitute a large fraction of the workload of modern data centers. Designing the infrastructures and systems for data management in the new computing environments remains an open challenge.

 

Topics of Interest

Areas of particular interest for the workshop include (but are not limited to):


 

Program

Find the workshop proceedings here.


9:00 - 10:30 (Session 1)


10:30 - 10:45


Coffee break

10:45 - 12:15 (Session 2)


ScyPer: Elastic OLAP Throughput on Transactional Data. Tobias Mühlbauer (TUM), Wolf Rödiger (TUM), Angelika Reiser (TUM), Alfons Kemper (TUM), Thomas Neumann (TUM).

Scalable I/O-Bound Parallel Incremental Gradient Descent for Big Data Analytics in GLADE. Chengjie Qin (UC Merced), Florin Rusu (UC Merced).

Towards a Workload for Evolutionary Analytics. Jeff LeFevre (UC Santa Cruz), Jagan Sankaranarayanan (NEC Labs America), Hakan Hacigumus (NEC Labs America), Junichi Tatemura (NEC Labs America), Neoklis Polyzotis (UC Santa Cruz).

Don't Match Twice: Redundancy-free Similarity Computation with MapReduce. Lars Kolb (U. Leipzig), Andreas Thor (U. Leipzig), Erhard Rahm (U. Leipzig).

12:15 - 13:45


Lunch

13:45 - 15:15 (Session 3)


A Vision for Personalized Service Level Agreements in the Cloud. Jennifer Ortiz (U. Washington), Victor de Almeida (U. Washington), Magdalena Balazinska (U. Washington).

Multi-objective optimization of data flows in a multi-cloud environment. Efthymia Tsamoura (AUTH), Anastasios Gounaris (AUTH), Kostas Tsichlas (AUTH).

GPText: Greenplum Parallel Statistical Text Analysis Framework. Kun Li (U. Florida), Christan Grant (U. Florida), Daisy Zhe Wang (U. Florida), Sunny Khatri (EMC), George Chitouras (EMC).

Enabling Secure Query Processing in the Cloud using Fully Homomorphic Encryption. Murali Mani (University of Michigan, Flint).

A Case For Dynamic Memory Partitioning in Data Centers. Daniel Warneke (ICSI Berkeley), Christof Leng (ICSI Berkeley).

15:15 - 15:30


Break

15:30 - 17:00 (Session 4)


Panel discussion: "What will be the 'SQL' of 'Big Data NoSQL' systems?"

Daniel Abadi (Yale), Shivnath Babu (Duke), Fatma Ozcan (IBM Almaden), Jeffrey Ullman (Stanford), Till Westmann (Oracle), Jingren Zhou (Microsoft)

Moderator: Volker Markl (TU Berlin)

Big data analytics has given rise to a new class of data management systems, e.g., GraphLab, Spark, MapReduce (Hadoop), Asterix, Stratosphere, and others. These systems have introduced novel query and data analysis languages, all aiming to support data analysis applications that go beyond selection, aggregation, and relational queries, most notably machine learning algorithms, graph mining, text mining, and mathematical optimization. We currently see a Babylonian confusion of data programming languages, with no agreement on a common data model or query processing language. In particular, some parts of the community seem to be running in circles, with some protagonists of the NoSQL movement implementing subsets of SQL or XQuery on top of Hadoop (e.g., Pig, Hive, JAQL). Yet a standardized language could be a key factor for market growth and for the future mainstream success of these systems beyond niche solutions.

17:00 - ...


Social event at the Long Room (very close to the conference hotel)

 

Accepted Papers

  1. Lars Kolb (U. Leipzig), Andreas Thor (U. Leipzig), Erhard Rahm (U. Leipzig). Don't Match Twice: Redundancy-free Similarity Computation with MapReduce.
  2. Efthymia Tsamoura (AUTH), Anastasios Gounaris (AUTH), Kostas Tsichlas (AUTH). Multi-objective optimization of data flows in a multi-cloud environment.
  3. Tobias Mühlbauer (TUM), Wolf Rödiger (TUM), Angelika Reiser (TUM), Alfons Kemper (TUM), Thomas Neumann (TUM). ScyPer: Elastic OLAP Throughput on Transactional Data.
  4. Chengjie Qin (UC Merced), Florin Rusu (UC Merced). Scalable I/O-Bound Parallel Incremental Gradient Descent for Big Data Analytics in GLADE.
  5. Jennifer Ortiz (U. Washington), Victor de Almeida (U. Washington), Magdalena Balazinska (U. Washington). A Vision for Personalized Service Level Agreements in the Cloud.
  6. Jeff LeFevre (UC Santa Cruz), Jagan Sankaranarayanan (NEC Labs America), Hakan Hacigumus (NEC Labs America), Junichi Tatemura (NEC Labs America), Neoklis Polyzotis (UC Santa Cruz). Towards a Workload for Evolutionary Analytics.
  7. Kun Li (U. Florida), Christan Grant (U. Florida), Daisy Zhe Wang (U. Florida), Sunny Khatri (EMC), George Chitouras (EMC). GPText: Greenplum Parallel Statistical Text Analysis Framework.
  8. Murali Mani (University of Michigan, Flint). Enabling Secure Query Processing in the Cloud using Fully Homomorphic Encryption.
  9. Daniel Warneke (ICSI Berkeley), Christof Leng (ICSI Berkeley). A Case For Dynamic Memory Partitioning in Data Centers.


 

Paper Submission

All papers should be submitted in PDF and formatted using the double-column ACM format (templates are available here).

The workshop solicits:

All papers should clearly mark their type (research/vision/industrial) in the paper title.

Papers should be submitted using the conference management system: https://cmt.research.microsoft.com/DANAC2013


 

Important Dates

Submission deadline: April 5, 2013 (extended from March 29, 2013)
Notification of acceptance: April 26, 2013
Final papers due: May 17, 2013 (extended from May 10, 2013)
Workshop: June 23, 2013

 

Camera-ready Instructions

Please remove the research/vision/industrial qualifier from your paper title, unless it is part of the title itself.

Length: All submitted papers must be formatted according to the instructions below, and must be no more than 5 pages in length. This page limit includes all parts of the paper: title, abstract, body, bibliography, and appendices.

File type: Each paper is to be submitted as a single PDF file, formatted for 8.5" x 11" paper and no more than 5 MB in file size. (Larger files will be rejected by the submission site.)

Formatting: Papers must follow the ACM Proceedings Format, using one of the templates provided here for Word and LaTeX (version 2e). (For LaTeX, both Option 1 and Option 2 are acceptable.) The font size, margins, inter-column spacing, and line spacing in the templates must be kept unchanged.

Authors should apply ACM Computing Classification categories and terms. The templates provide space for this indexing and point authors to the Computing Classification Scheme.

The camera-ready version must also include a copyright statement at the bottom of the first page, left column. ACM will contact authors to complete a rights management form and will then provide the appropriate statement. Please contact the workshop chairs if you do not hear from ACM about the rights management form.

All fonts MUST be embedded within the PDF file. Any PDF deposited without embedded fonts will need to be corrected. To help with this process, ACM has created documentation on how to embed fonts; please download the ACM Digital Library optimal distiller settings file, ACM.joboptions. ACM cannot substitute fonts after the fact: embedding must be done in the source files before the PostScript or PDF is generated. Bitmapped fonts may not display legibly in all PDF readers on all platforms, although they will print correctly.

The camera-ready version (in PDF) should be submitted online through DanaC 2013's CMT paper submission site.

 

People

PC chairs:
Shivnath Babu
Kostas Tzoumas

Steering Committee:
Magdalena Balazinska
Michael J. Carey
Tim Kraska
Volker Markl

Program Committee:
Michael Armbrust (Google, USA)
Yanpei Chen (Cloudera, USA)
Vuk Ercegovac (IBM Almaden, USA)
Shenoda Guirguis (Intel, USA)
Hakan Hacigumus (NEC Labs, USA)
Donald Kossmann (ETH Zurich, Switzerland)
Jignesh Patel (University of Wisconsin – Madison, USA)
Christopher Re (University of Wisconsin – Madison, USA)
Russell Sears (Microsoft, USA)
Ion Stoica (UC Berkeley, USA)
Philipp Unterbrunner (Oracle Labs, USA)
Florian Waas (EMC, USA)

 

Previous workshops

DanaC 2012: www.danac.org/2012/