Creating Standards for Big Data
Introduction
May 15, 2014
At the November, 2013 JTC1 Plenary, JTC1 established a Study Group on Big Data and assigned Wo Chang of the USA as the Study Group Convenor. Following the JTC1 meeting, the INCITS Executive Board created a Big Data Ad Hoc to serve as the USA TAG to the JTC1 Study Group on Big Data. The EB assigned me, Keith Hare, to chair the Big Data Ad Hoc.
This report provides a quick introduction to Big Data and an overview of the current Big Data standards efforts.
What is Big Data
Big Data is often defined using three (or more) V’s: Volume, Velocity, and Variety.
- Volume
  - Analytics are being run on the whole dataset rather than just a small sample
  - The amount of data is larger than can be handled by current database technology (so, by definition, if we enhance current technology to handle the amount, “big data” will be bigger)
- Velocity
  - The rate of data growth and/or change exceeds what can be handled by current technologies
  - The data may need to be processed/analyzed as an incoming stream before (or instead of) it is stored on a disk
- Variety – Analytics are applied to any combination of:
  - Textual data
  - Numeric data
  - Geographical data
  - Images
  - Video
  - Whatever else
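As a rough illustration of the Velocity point above (analyzing data as an incoming stream instead of storing it first), here is a minimal Python sketch. It keeps a running count, mean, and variance using Welford's online algorithm, so each value is seen once and never retained. The names `RunningStats`, `summarize_stream`, and `sensor_feed` are hypothetical and exist only for this example; they do not come from any of the standards documents discussed in this report.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RunningStats:
    """Running count, mean, and variance (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        # Fold one new value into the summary without storing the value.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 if no values have been seen yet.
        return self._m2 / self.count if self.count else 0.0


def summarize_stream(readings: Iterable[float]) -> RunningStats:
    """Consume the stream once, keeping only a constant-size summary."""
    stats = RunningStats()
    for value in readings:
        stats.update(value)
    return stats


def sensor_feed() -> Iterator[float]:
    # Hypothetical stand-in for an incoming data stream (assumption:
    # a real feed would come from a socket, queue, or log tail).
    yield from (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)


if __name__ == "__main__":
    stats = summarize_stream(sensor_feed())
    print(stats.count, stats.mean, stats.variance)
```

The memory used is constant regardless of how many values arrive, which is the property that matters when the data cannot (or need not) be written to disk before analysis.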
The driving forces behind Big Data include:
- Economical to store very large amounts of data
- Economical to analyze the entire data set, not just a sample
- Ability to integrate analysis of disparate data sources
- Potential for deriving actionable results from the analysis
- Real-time use of results
These driving forces (and lots of hype) have triggered lots of effort in the technologies supporting Big Data.
Big Data Efforts
There are currently four overlapping but distinct Big Data efforts:
- SC32 Study Group on Next Generation Analytics
- NIST Big Data Public Working Group - Wo Chang
- ISO/IEC JTC1 Big Data Study Group - Wo Chang
- INCITS Ad Hoc on Big Data – Keith Hare
SC32 Study Group – Next Generation Analytics
The SC32 Study Group on Next Generation Analytics was established at the 201 SC32 Plenary in response to a resolution from the 2011 JTC1 Plenary. The SC32 study group is chaired by Keith Hare (me).
The most recent report is SC32 N 2388a “SC32 Study Group on Next Generation Analytics”.
Responding to 2011 JTC1 Resolution
The SC32 NGA study group made progress during the 2012 & 2013 SC32 plenaries, but has made little progress in between plenaries.
Parts of SC32 N 2388a have been incorporated in the NIST Big Data Technology Roadmap report.
NIST Big Data Public Working Group
In June, 2013, NIST instituted a Big Data Public Working Group (NIST BDPWG). It has been chaired by:
- Dr. Chaitan Baru, Distinguished Scientist and Associate Director Data Initiatives, San Diego Supercomputer Center
- Dr. Robert Marcus, Chief Technology Officer at ET-Strategies, Co-Chair of Big Data Working Group at Cloud Standards Customer Council
- Mr. Wo Chang, Digital Data Advisor for the NIST Information Technology Laboratory (ITL), Convenor of ISO/IEC JTC/1 SC29 WG11 (MPEG) Multimedia Preservation AHG
While Chaitan Baru and Bob Marcus have provided substantial input into the NIST BDPWG, Wo Chang has been the visible driving force in the effort.
The NIST BDPWG web site is:
http://bigdatawg.nist.gov/home.php
The web site is open to the public.
The NIST BDPWG has operated through weekly Web Conferences/Conference Calls. The efforts have focused on five areas:
- Definition and Taxonomy
- Requirements
- Security & Privacy
- Reference Architecture
- Technology Roadmap
The initial versions of the output documents from these five subgroups are available at:
http://bigdatawg.nist.gov/V1_output_docs.php
Revisions and expansions of these reports are currently in process.
ISO/IEC JTC1 Study Group on Big Data
The ISO/IEC JTC1 Study Group on Big Data (JTC1 SGBD) was created by Resolution 27 at the November, 2013 JTC1 Plenary at the request of the USA and other national bodies. The USA offered, and JTC1 accepted, Wo Chang as the convenor of the JTC1 SGBD.
The web site for this effort is:
http://jtc1bigdatasg.nist.gov/home.php
It is open to the public. Wo Chang is also uploading the documents to the JTC1 Livelink site.
Wo Chang has scheduled three workshops & meetings in 2014:
- USA – San Diego, California
  - Workshop: March 18 – 19
  - Study Group Meeting: March 20 – 21
- Europe – Amsterdam
  - Workshop: May 13 – 14
  - Study Group Meeting: May 15 – 16
- Asia – Beijing, China
  - Workshop: June 16 – 17
  - Study Group Meeting: June 18 – 19
Wo has also been working with Research Data Alliance (RDA, https://rd-alliance.org) to create reference/example implementations.
INCITS Ad Hoc on Big Data
Following the creation of the JTC1 SGBD, the INCITS Executive Board created the INCITS Big Data Ad Hoc (INCITS BDAH) to serve as the USA TAG to the JTC1 SGBD. The EB appointed Keith Hare (me) to serve as the chair of the INCITS BDAH.
The public web site for this effort is:
https://standards.incits.org/apps/group_public/workgroup.php?wg_abbrev=big-data
It is restricted to participants in the BDAH.
The INCITS BDAH currently has about 40 participants from 29 companies. Many of these participants (if not their companies) have not previously been involved in INCITS standards development groups.
Upcoming Meetings
At the March 18-21 JTC1 SGBD meeting in San Diego, we held an informal meeting of the INCITS BDAH where I talked in general about how we would operate. I also attempted to recruit all USA attendees who were not yet participating in the INCITS BDAH.
I am scheduling the following INCITS BDAH meetings using the INCITS web conference facility:
| INCITS Big Data Ad Hoc | JTC1 Study Group on Big Data |
| --- | --- |
| Thursday, May 1, 2-4 PM EDT | |
| Thursday, May 8, 2-4 PM EDT | |
| | May 13-16, 2014 – Amsterdam |
| Thursday, June 5, 2-4 PM EDT | |
| Thursday, June 12, 2-4 PM EDT | |
| | June 16-19, 2014 – Beijing |
I expect some additional conference calls but do not yet have likely dates.
Standardization Effort Outcomes
The March JTC1 SGBD meeting created an ad hoc group of editors to incorporate discussions into a preliminary draft report.
The structure of this report is based on an outline submitted by Korea (JTC 1 SGBD N0007 “KNB Proposal on the template of SGBD report to JTC 1 Plenary”). It includes concepts and definitions adopted from some of the NIST BDPWG output documents.
This draft of the report will serve as input to the May JTC1 SGBD meeting.
The final report to the 2014 JTC1 Plenary should be completed by September 30, 2014. It will contain recommendations for a variety of areas in which existing standards can be expanded and new standards created to support Big Data.