- Scala for Data Science
- Pascal Bugnion
- 2021-07-23 14:33:09
Chapter 5. Scala and SQL through JDBC
One of data science's raisons d'être is the difficulty of manipulating large datasets. Much of the data of interest to a company or research group cannot fit conveniently in a single computer's RAM. Storing the data in a way that is easy to query is therefore a complex problem.
Relational databases have been successful at solving the data storage problem. The relational model was originally proposed in 1970 (http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf), and the overwhelming majority of databases in active use today are still relational. In that time, the price of RAM per megabyte has decreased by a factor of a hundred million. Similarly, hard drive capacity has increased from tens or hundreds of megabytes to terabytes. It is remarkable that, despite this exponential growth in data storage capacity, the relational model has remained dominant.
Virtually all relational databases are described and queried with variants of SQL (Structured Query Language). With the advent of distributed computing, the position of SQL databases as the de facto data storage standard is being challenged by other types of databases, commonly grouped under the umbrella term NoSQL. Many NoSQL databases are more partition-tolerant than SQL databases: they can be split into several parts residing on different computers. While this author expects that NoSQL databases will become increasingly popular, SQL databases are likely to remain prevalent as a data persistence mechanism; hence, a significant portion of this book is devoted to interacting with SQL from Scala.
While SQL is standardized, most implementations do not follow the full standard. Additionally, most implementations provide extensions to the standard. This means that, while many of the concepts in this book will apply to all SQL backends, the exact syntax will need to be adjusted. We will consider only the MySQL implementation here.
In this chapter, you will learn how to interact with SQL databases from Scala using JDBC, a bare-bones Java API. In the next chapter, we will consider Slick, an Object-Relational Mapper (ORM) that gives a more Scala-esque feel to interacting with SQL.
This chapter is roughly composed of two sections: we will first discuss the basic functionality for connecting and interacting with SQL databases, and then discuss useful functional patterns that can be used to create an elegant, loosely coupled, and coherent data access layer.
This chapter assumes that you have a basic working knowledge of SQL. If you do not, you would be better off first reading one of the reference books mentioned at the end of the chapter.