Glossary

This glossary provides information around the services, technologies and product of the Semantic Web Company.

Data Lake

Synonyms: Smart Data Lake

A data lake is a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms, usually object blobs or files. The idea of data lake is to have a single store of all data in the enterprise ranging from raw data (which implies exact copy of source system data) to transformed data which is used for various tasks including reporting, visualization, analytics and machine learning. The data lake includes structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and even binary data (images, audio, video) thus creating a centralized data store accommodating all forms of data. One example of a data lake is the distributed file system used in Apache Hadoop.(Source: Wikipedia.org)