- Hands-On Data Science with SQL Server 2017
- Marek Chmel, Vladimír Mužný
- 2021-06-10 19:13:55
SQL Server and big data
Let's face reality: SQL Server is not a big-data system. However, SQL Server includes a feature that lets us interact with other big-data systems deployed in the enterprise. This is huge!
This allows us to combine traditional relational data in SQL Server with results from big-data systems directly, or even to run queries against those big-data systems from SQL Server. The technology that makes this possible is called PolyBase.
PolyBase is a bridge between SQL Server and big-data systems such as Hadoop, which can run in numerous different configurations. You can run your own Hadoop deployment, or use Azure services such as HDInsight or Azure Data Lake, which build on Hadoop and on the HDFS filesystem from the Hadoop framework. We'll look deeper into PolyBase in Chapter 4, Data Sources for Analytics. If you would like to test-drive Hadoop with SQL Server, several appliances are ready for testing and evaluation, such as Hortonworks Data Platform or Cloudera.
For Cloudera Quickstart VMs, you can check out https://www.cloudera.com/downloads/quickstart_vms/5-13.html
Hadoop itself is external to SQL Server and is described as a collection of software tools for distributed storage and the processing of big data. The base Apache Hadoop framework is composed of the following modules:
- Hadoop Common: Contains libraries and utilities needed by other Hadoop modules
- Hadoop Distributed File System (HDFS): A distributed filesystem that stores data on commodity machines, providing very high aggregate bandwidth across the cluster
- Hadoop YARN: Introduced in 2012 as a platform responsible for managing computing resources in clusters and using them for scheduling users' applications
- Hadoop MapReduce: An implementation of the MapReduce programming model for large-scale data processing
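To make the MapReduce programming model from the last item more concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop at all: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration. In a real cluster, the framework would distribute the map and reduce steps across machines and perform the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key.

    In Hadoop the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence over a list of text records."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only; not taken from the book.
documents = ["big data on Hadoop", "SQL Server and big data"]
counts = word_count(documents)
print(counts)
```

The point of splitting the work this way is that each map call depends on a single record and each reduce call on a single key, which is what lets a framework such as Hadoop run them in parallel on different nodes.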