Introducing a Unique Technology to Scan and Interrogate Petabyte-Scale Unstructured Data Lakes

According to IDC, there will be about 175 zettabytes (ZB) of data worldwide by 2025, compared to 64.2 ZB in 2020. Not surprisingly, 95% of businesses cite the need to manage unstructured data as a problem.

Previously, both of Datadobi’s products, DobiMigrate and DobiProtect, were designed to scan large file systems containing billions of files to help organizations harness the power of unstructured data. Each of these scans produces huge lists of file paths and their metadata in a proprietary format to allow performant and storage-efficient handling, analysis, and comparison of the files to enhance unstructured data management.

Until recently, these scan files were only used for doing data migration or protection. That is changing. 

Over the last several months, as the COVID-19 pandemic drove digital transformation and increased the amount of unstructured data within networks, enterprises began asking for access to the scans in order to analyze and reorganize unstructured data lakes.

For a user to analyze the composition of their unstructured data, serious data reduction and aggregation across that set of billions of files is required. This created the need for a tool to query, aggregate, and reduce the information about the data lake so that it is consumable by the IT administrator.
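To illustrate the kind of reduction involved, here is a minimal Python sketch that collapses per-file records into per-directory totals. The record layout (`path`, `size`) and the function name `summarize_by_dir` are assumptions made for illustration only; they do not reflect Datadobi's proprietary scan format or DQL itself.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_by_dir(records, depth=1):
    """Collapse per-file records into per-directory file counts and byte totals.

    `records` is an iterable of dicts with hypothetical keys "path" and "size".
    `depth` is how many leading directory levels to keep in the summary key.
    """
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for rec in records:
        # Directory components of the file's parent, without the root marker.
        dirs = [p for p in PurePosixPath(rec["path"]).parent.parts if p != "/"]
        key = "/" + "/".join(dirs[:depth])
        summary[key]["files"] += 1
        summary[key]["bytes"] += rec["size"]
    return dict(summary)
```

Billions of file records reduce to one row per directory at the chosen depth, which is a size an administrator can actually read.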

To fill this market need, Datadobi has developed Datadobi Query Language (DQL) to enhance the file system assessment service and to help optimize and organize data lakes internally. Within the file system assessment service, DQL offers complete flexibility in how the software interrogates the customer data set and enables tremendous data reduction, making a multi-petabyte data lake manageable for the customer.

DQL is a query framework that supports many kinds of data lake analysis, such as:

  • Identifying cold data sets: data that is infrequently accessed
  • Identifying old data sets: data that was created or modified some time ago
  • Identifying data sets owned by a specific user or group, e.g. by users who no longer work at the company
  • Identifying shares, exports, or directory trees that are homogeneous (cold, old, owner, file types) and can be handled as one data set, e.g. to take specific lifecycle actions on them
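The first three criteria above can be sketched as simple predicates over file metadata. This is a hedged illustration in Python, not DQL syntax: the `FileRecord` fields, the function names, and the default thresholds (one year idle for "cold", three years unmodified for "old") are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    # Hypothetical metadata record; real scan formats will differ.
    path: str
    owner: str
    atime: datetime  # last access time
    mtime: datetime  # last modification time


def is_cold(rec, now, max_idle_days=365):
    """Cold: not accessed within the last `max_idle_days` (assumed threshold)."""
    return now - rec.atime > timedelta(days=max_idle_days)


def is_old(rec, now, max_age_days=3 * 365):
    """Old: not modified within the last `max_age_days` (assumed threshold)."""
    return now - rec.mtime > timedelta(days=max_age_days)


def owned_by(rec, owners):
    """True if the file belongs to any user in `owners`, e.g. departed staff."""
    return rec.owner in owners


def select(records, predicate):
    """Return the paths of all records matching `predicate`, in input order."""
    return [r.path for r in records if predicate(r)]
```

For example, `select(records, lambda r: is_cold(r, now) and owned_by(r, {"bob"}))` would list cold files belonging to a single user. Testing whether a whole directory tree is homogeneous would additionally require grouping the results by directory, which this sketch leaves out.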

Datadobi created the file system assessment offering last year to help users before they plan a data migration or reorganization. DQL is now an essential part of the file system assessment service because it makes assessments customizable. Using the pre-migration service enhanced with DQL, customers can understand what is on their storage system and, based on how the system is partitioned into data sets, plan what to migrate where.

On a similar note, DQL is an essential part of Datadobi’s vendor-neutral data mobility engine. DQL sits within the engine technology to scan file systems, move data, analyze the file metadata of large data lakes, and simplify how IT administrators can look at their data and identify logical subsets of data. 

The volume of data is only expected to grow over the next few years. IT administrators need a data management solution that can transform data into digestible information so that they can make informed decisions about storage options for migration and protection.