- Scala for Data Science
- Pascal Bugnion
- 2021-07-23 14:33:09
Chapter 5. Scala and SQL through JDBC
One of data science's raisons d'être is the difficulty of manipulating large datasets. Much of the data of interest to a company or research group cannot fit conveniently in a single computer's RAM. Storing the data in a way that is easy to query is therefore a complex problem.
Relational databases have been successful at solving the data storage problem. The relational model was originally proposed in 1970 (http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf), and the overwhelming majority of databases in active use today are still relational. In that time, the price of RAM per megabyte has decreased by a factor of a hundred million. Similarly, hard drive capacity has increased from tens or hundreds of megabytes to terabytes. It is remarkable that, despite this exponential growth in data storage capacity, the relational model has remained dominant.
Virtually all relational databases are described and queried with variants of SQL (Structured Query Language). With the advent of distributed computing, the position of SQL databases as the de facto data storage standard is being challenged by other types of databases, commonly grouped under the umbrella term NoSQL. Many NoSQL databases are more partition-tolerant than SQL databases: they can be split into several parts residing on different computers. While this author expects that NoSQL databases will become increasingly popular, SQL databases are likely to remain prevalent as a data persistence mechanism; hence, a significant portion of this book is devoted to interacting with SQL from Scala.
While SQL is standardized, most implementations do not follow the full standard. Additionally, most implementations provide extensions to the standard. This means that, while many of the concepts in this book will apply to all SQL backends, the exact syntax will need to be adjusted. We will consider only the MySQL implementation here.
In this chapter, you will learn how to interact with SQL databases from Scala using JDBC, a bare-bones Java API. In the next chapter, we will consider Slick, an Object-Relational Mapper (ORM) that gives a more Scala-esque feel to interacting with SQL.
This chapter is roughly composed of two sections: we will first discuss the basic functionality for connecting and interacting with SQL databases, and then discuss useful functional patterns that can be used to create an elegant, loosely coupled, and coherent data access layer.
This chapter assumes that you have a basic working knowledge of SQL. If you do not, you would be better off first reading one of the reference books mentioned at the end of the chapter.