The Buzz Around Data Lakes: Better mining and output without limitations

By Karthik Palanisamy, Vice President, Analytics and SAP Services


Karthik's specialties include ERP strategy, business intelligence, SAP HANA, and leveraging technology to solve business problems.

Learn about key attributes and considerations before jumping in!

Data Lakes are the latest BI buzz, and many CIOs are asking about them. The term describes a new way to collect, combine, and summarize data from many sources. The key benefit: better data mining for more useful output. Data Lakes let businesses harness big data without the structural limitations imposed by traditional data warehouses.

Key Attributes of Data Lakes

  • Data collection. With Data Lakes, all types of data (raw and processed) related to the market, or to the universe as your organization defines it, are collected at the most granular level. This provides enriched data for exploration. Benefit: improved decision-making ability.
  • Expanded data mining capability. This data can be sliced, diced, or combined at the most granular level for enhanced insights (see the sketch after this list). Benefit: gain more insights by seeing, for example, the cause-and-effect relationships between different data sets.
  • Flexible access. Data Lakes can live on shared infrastructure and be accessed across the organization. This gives functions such as marketing, sales, and finance visibility into the same data set.
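To make the "slice and dice" attribute concrete, here is a minimal sketch using Apache Spark, one common engine for querying a lake. The paths and column names (sales, campaigns, customer_id, order_amount, region, campaign_id) are hypothetical examples; the point is that granular raw files can be joined and aggregated directly, with no warehouse schema defined up front:

```python
# Minimal sketch: combining and slicing granular data straight from a lake.
# All paths and column names are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lake-exploration").getOrCreate()

# Schema-on-read: pull raw, transaction-level files directly from lake storage.
sales = spark.read.parquet("/lake/raw/sales/")          # one row per transaction
campaigns = spark.read.parquet("/lake/raw/campaigns/")  # one row per campaign touch

# Combine the two data sets at the most granular level, then slice by
# region and campaign to look for cause-and-effect relationships.
insight = (
    sales.join(campaigns, on="customer_id", how="left")
         .groupBy("region", "campaign_id")
         .agg(
             F.sum("order_amount").alias("revenue"),
             F.countDistinct("customer_id").alias("buyers"),
         )
         .orderBy(F.desc("revenue"))
)

insight.show(20)
```

Because the underlying data stays granular, the same query can be re-cut at any other grain (by day, by product, by channel) without reloading or remodeling anything.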

Data Lakes Considerations

Before jumping into Data Lakes, here are some considerations, whether you choose SAP HANA, Hadoop, or a combination of the two:

  • Business driver: Does the availability and mining of enriched data actually provide insights that drive business value and growth?
  • Overall cost: Account for infrastructure, storage, implementation, and support. Remember: open-source software may be free, but ongoing support still carries a monthly cost.
  • Support coverage: Ensure every component is covered. Example: certain capabilities may require separate support contracts beyond the standard support available for Hadoop.
  • Sizing: Sizing is critical. A clearly defined business case for new or enhanced data will drive the granularity and volume of data. Example: 16 Hadoop nodes is usually a good starting point, but deep analytics (data science and statistical modeling) and the number of iterations will ultimately dictate the compute power needed. A back-of-envelope sizing sketch follows this list.
  • Value: Consider data volume and the type of analytics and data mining you need. A large volume of data, or the ability to mine and combine it, does not by itself guarantee ROI. Ask what insights and business benefits it provides.
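Here is the promised back-of-envelope sizing sketch. Every input is an illustrative assumption (raw volume, roughly 30% working-space headroom, per-node disk); only the HDFS replication factor of 3 is a Hadoop default. It is a starting point, not a sizing recommendation:

```python
import math

# Back-of-envelope Hadoop storage sizing -- inputs are illustrative assumptions.
raw_data_tb = 100             # assumed raw data volume landing in the lake (TB)
replication_factor = 3        # HDFS default: each block is stored three times
headroom = 1.3                # assumed ~30% extra for temp/intermediate data
usable_disk_per_node_tb = 24  # assumed usable disk on each data node (TB)

total_storage_tb = raw_data_tb * replication_factor * headroom
nodes_for_storage = math.ceil(total_storage_tb / usable_disk_per_node_tb)

print(f"Total storage needed: {total_storage_tb:.0f} TB")    # 390 TB
print(f"Data nodes for storage alone: {nodes_for_storage}")  # 17 nodes
# Storage is only half the story: heavy data-science workloads with many
# iterations may need more nodes for compute than the storage math suggests.
```

Note how close the storage math lands to the 16-node starting point above; the iteration-heavy analytics described earlier are what push the count higher.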

Want to get started? Let’s talk! Send me an email and we’ll discuss the best approach for your business.