- Hands-On Data Science with SQL Server 2017
- Marek Chmel, Vladimír Mužný
- 2021-06-10 19:14:02
External data with PolyBase
During data acquisition, we frequently find that the data we need is not stored in SQL Server, so for our analysis we usually import or query data from other database platforms or systems. SQL Server 2016 introduced a feature called PolyBase, which lets us access such external data directly from SQL Server. PolyBase can query data stored in Hadoop-compatible file systems and can push parts of the computation down to the Hadoop cluster, so that SQL Server is not overloaded when it accesses large amounts of data.
The great benefit of PolyBase is that it unifies two very different worlds: structured data and unstructured data. Hadoop is a collection of open source utilities that includes a distributed file system called HDFS (Hadoop Distributed File System). Distribution is a challenge for data analysis: because the data is spread across heterogeneous systems, it is difficult to access and process from SQL Server. PolyBase lets you work with structured data, usually tables in SQL Server, together with unstructured or semi-structured data stored in distributed file systems, using the same T-SQL queries. PolyBase is not completely new; it was previously available as a component of Parallel Data Warehouse (PDW) in the Analytics Platform System (APS) appliance, and with SQL Server 2016 it became part of SQL Server itself.
PolyBase is a feature that can be used to do the following:
- Query data stored in Hadoop
- Import data from Hadoop
- Query data stored in Azure Blob storage
- Export data to Hadoop or Azure Blob storage
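All of the capabilities above rest on the same three database objects: an external data source, an external file format, and an external table. The following Python sketch (an illustration, not the book's own code) only assembles the T-SQL text for those objects plus an import and an export statement; it does not connect to a server. It assumes an unsecured Hadoop cluster at the hypothetical address `hdfs://namenode:8020`, comma-delimited text files, PolyBase already installed with Hadoop connectivity configured, and, for exports, the `allow polybase export` server option enabled. Every object name is a placeholder.

```python
from typing import Dict, List


def polybase_ddl(source: str, hdfs_uri: str, file_format: str,
                 table: str, columns: Dict[str, str], folder: str) -> List[str]:
    """Build the three T-SQL statements PolyBase needs before an HDFS
    folder can be queried like a table. Names are interpolated without
    escaping, so they are assumed to be trusted, hard-coded identifiers."""
    # 1. Where the data lives: a Hadoop cluster addressed by its HDFS URI.
    data_source = (
        f"CREATE EXTERNAL DATA SOURCE {source} "
        f"WITH (TYPE = HADOOP, LOCATION = '{hdfs_uri}');"
    )
    # 2. How the files are laid out: comma-delimited text in this sketch.
    fmt = (
        f"CREATE EXTERNAL FILE FORMAT {file_format} "
        "WITH (FORMAT_TYPE = DELIMITEDTEXT, "
        "FORMAT_OPTIONS (FIELD_TERMINATOR = ','));"
    )
    # 3. The relational shape SQL Server imposes on the files in the folder.
    col_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    table_stmt = (
        f"CREATE EXTERNAL TABLE {table} ({col_list}) "
        f"WITH (LOCATION = '{folder}', DATA_SOURCE = {source}, "
        f"FILE_FORMAT = {file_format});"
    )
    return [data_source, fmt, table_stmt]


def import_stmt(external_table: str, local_table: str) -> str:
    """Import: copy external rows into a new, ordinary SQL Server table."""
    return f"SELECT * INTO {local_table} FROM {external_table};"


def export_stmt(local_table: str, external_table: str) -> str:
    """Export: write local rows out to an existing external table.
    Assumes the 'allow polybase export' option has been enabled."""
    return f"INSERT INTO {external_table} SELECT * FROM {local_table};"


if __name__ == "__main__":
    # Hypothetical cluster address, folder, and table names.
    statements = polybase_ddl(
        source="HadoopCluster",
        hdfs_uri="hdfs://namenode:8020",
        file_format="CsvFormat",
        table="dbo.SensorReadingsExt",
        columns={"SensorId": "INT", "Location": "NVARCHAR(50)", "Reading": "FLOAT"},
        folder="/data/sensors/",
    )
    statements.append(import_stmt("dbo.SensorReadingsExt", "dbo.SensorReadings"))
    for stmt in statements:
        print(stmt)
```

Run in order against SQL Server, the generated statements would make `dbo.SensorReadingsExt` queryable like a local table, and the import would copy its rows into `dbo.SensorReadings`. Keeping each object in its own statement mirrors how PolyBase separates location, file layout, and schema, so one data source and file format can be reused by many external tables.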
