The Blog of operational excellence and energy management for industry



Data Lakes: mass storage is boosting energy data analysis

In the age of the Internet of Things (IoT), Big Data and the Cloud, low-cost aggregation and mass storage of enterprise data are both feasible and sensible. Creating a repository for raw data, known as a Data Lake, meets enterprises’ new-found need to organise, centralise, manage, use and analyse large amounts of data, while breaking down the siloed information systems in which enterprise data has traditionally been kept. To make the concept easier to grasp, James Dixon, an American specialist in Business Intelligence, likened the Data Lake in 2010 to “a large body of water in a more natural state” in which users can “dive in, or take samples”, in contrast to the Data Mart (1), a space for storing selected, structured data: “a store of bottled water – cleansed and packaged and structured for easy consumption”. While the image is evocative, the concept calls for further explanation.

 

1. Definition

2. Benefits

3. Limitations and risks

4. Application to energy performance

 

 

1. Definition

Sometimes seen as a new generation of the Data Warehouse, the Data Lake is a space in which large amounts of data, whatever their type or origin, can be stored for an unlimited time without imposing a strict structure on the incoming flows. All of an enterprise’s raw data and all of its processed data can therefore coexist within the same Data Lake. The benefits include greater agility and data that is easier to combine, process and analyse, because nothing has to be reshaped before it is stored. This is why more and more enterprises are opting for a Data Lake, particularly to store their energy data (electricity consumption, power, equipment status, etc.).

 

2. Benefits

1. A Data Lake makes it possible to collect and store all of the enterprise’s raw data in the same place and in real time. This flexibility is its main advantage.

2. In addition, because no structure is imposed on the data, the full potential of the source information remains intact. Users can extract the native data, cross-reference it and analyse it, either now or at any time in the future.

3. In industry, the Data Lake is a major step forward because it allows the data from all of a factory’s sensors to be made available, in real time, in a single repository. This in turn means that line-of-business applications can interact rapidly with the Data Lake.

4. The Data Lake’s capacity to collect massive amounts of data, combined with the computational power now available, makes it possible to feed data flows into line-of-business applications and help optimise industrial processes.

5. Data Lakes can also be linked to machine-learning approaches designed to use all of the enterprise data to build predictive models.
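The idea behind points 1 and 2, storing data as it arrives and imposing a structure only when it is read, can be illustrated with a minimal Python sketch. The records, field names and the `read_with_schema` helper are hypothetical and chosen purely for illustration; a real Data Lake would sit on dedicated storage rather than an in-memory list.

```python
import json

# Hypothetical raw records, kept exactly as they arrived from different sources.
RAW_LAKE = [
    '{"sensor": "meter_A", "ts": "2024-01-01T00:00:00", "kwh": 12.5}',
    '{"sensor": "meter_B", "ts": "2024-01-01T00:00:00", "power_kw": 48.0, "status": "ON"}',
    '{"machine": "press_1", "ts": "2024-01-01T00:05:00", "status": "OFF"}',
]


def read_with_schema(raw_lines, fields):
    """Apply a structure at read time: keep records that carry every requested field."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in fields):
            rows.append({field: record[field] for field in fields})
    return rows


# Two different analyses read the same untouched raw data.
energy_view = read_with_schema(RAW_LAKE, ["sensor", "kwh"])
status_view = read_with_schema(RAW_LAKE, ["ts", "status"])
```

Neither view alters the stored records, so a third analysis defined later can still draw on every original field.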

 

3. Limitations and risks

Without organisation and structure, a Data Lake can degenerate into a disorganised Data Swamp. Preventing this requires specialised tools and skills to define the requirements clearly and to keep tight control over the data to be used.

This is where a data strategy becomes essential: rather than collecting large volumes of data indiscriminately, the enterprise should sort its data and focus on what is useful and valuable.
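One common way to keep that control, offered here as an assumption rather than as a method described in this article, is to require a minimum of descriptive metadata for every dataset entering the lake. The Python sketch below flags entries that lack it; the catalogue contents and the list of required keys are hypothetical.

```python
# Hypothetical minimum metadata every dataset in the lake should carry.
REQUIRED_METADATA = ("source", "owner", "unit", "description")


def missing_metadata(entry):
    """Return the required metadata keys that are absent or empty for one entry."""
    return [key for key in REQUIRED_METADATA if not entry.get(key)]


# Hypothetical catalogue: one documented dataset and one undocumented dump.
catalogue = {
    "meter_A_raw": {
        "source": "electricity meter",
        "owner": "energy team",
        "unit": "kWh",
        "description": "Hourly consumption, line A",
    },
    "dump_2019": {"source": "unknown"},
}

# Datasets that risk turning the lake into a swamp, with what they lack.
undocumented = {
    name: missing_metadata(meta)
    for name, meta in catalogue.items()
    if missing_metadata(meta)
}
```

A check of this kind does not decide which data is valuable, but it makes undocumented data visible before it accumulates.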

 

4. Application to energy performance

The Data Lake is particularly well suited to the needs of decision-makers involved in industrial energy performance. The data needed to build dashboards comes from devices that are both highly varied (sensors, automatic control systems, machines, manual readings, etc.) and disparate (different units of measurement, different or even variable time increments, etc.). Additionally, because energy is a subject that cuts across the entire enterprise, the information needed to create ratios and KPIs comes from line-of-business tools (production, maintenance, energy, quality) and systems (MES, ERP, etc.) whose prescribed formats must be respected. Lastly, the drive for continuous improvement inherent in energy performance (the Plan-Do-Check-Act cycle, or PDCA) means that the information needed in the future will stem from the observations made today, which is why it is essential to retain the raw data whenever possible. This is what Blu.e has set out to do in its tools.
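To make the harmonisation problem concrete, the following Python sketch converts readings in mixed units and at irregular intervals to kWh per hour, then derives a simple ratio. All values, names and the choice of KPI (kWh per unit produced) are hypothetical, and the sketch assumes each reading is the energy consumed since the previous one; it does not describe how Blu.e's tools work.

```python
from collections import defaultdict
from datetime import datetime

# Conversion factors to kWh for the (hypothetical) source units.
TO_KWH = {"kWh": 1.0, "Wh": 0.001, "MWh": 1000.0}

# Hypothetical raw readings: (ISO timestamp, value, unit), irregular intervals.
readings = [
    ("2024-01-01T08:00:00", 1500.0, "Wh"),
    ("2024-01-01T08:20:00", 2.5, "kWh"),
    ("2024-01-01T08:50:00", 0.001, "MWh"),
    ("2024-01-01T09:10:00", 4.0, "kWh"),
]

# Hypothetical production counts per hour, e.g. exported from an MES.
units_produced = {"2024-01-01T08:00": 100, "2024-01-01T09:00": 80}


def hourly_kwh(rows):
    """Convert every reading to kWh and add it to its hourly bucket."""
    totals = defaultdict(float)
    for ts, value, unit in rows:
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        totals[hour.strftime("%Y-%m-%dT%H:%M")] += value * TO_KWH[unit]
    return dict(totals)


energy = hourly_kwh(readings)

# KPI: specific energy consumption, in kWh per unit produced, for each hour.
kpi = {
    hour: energy[hour] / units_produced[hour]
    for hour in energy
    if hour in units_produced
}
```

Because the raw readings are left untouched, the same data could later be re-aggregated per shift or per day if a different indicator proves more useful.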

 

(1) A Data Mart is a subset of a Data Warehouse, usually dedicated to a single business area.