  • Scala for Data Science
  • Pascal Bugnion
  • 419 words
  • 2021-07-23 14:33:09

Chapter 5. Scala and SQL through JDBC

One of the raisons d'être of data science is the difficulty of manipulating large datasets. Much of the data of interest to a company or research group cannot fit conveniently in a single computer's RAM. Storing the data in a way that is easy to query is therefore a complex problem.

Relational databases have been successful at solving the data storage problem. The relational model was originally proposed in 1970 (http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf), and the overwhelming majority of databases in active use today are still relational. Since then, the price of RAM per megabyte has decreased by a factor of a hundred million, and hard drive capacity has grown from tens or hundreds of megabytes to terabytes. It is remarkable that, despite this exponential growth in data storage capacity, the relational model has remained dominant.

Virtually all relational databases are described and queried with variants of SQL (Structured Query Language). With the advent of distributed computing, the position of SQL databases as the de facto data storage standard is being challenged by other types of databases, commonly grouped under the umbrella term NoSQL. Many NoSQL databases are more partition-tolerant than SQL databases: they can be split into several parts residing on different computers. While this author expects that NoSQL databases will become increasingly popular, SQL databases are likely to remain prevalent as a data persistence mechanism; hence, a significant portion of this book is devoted to interacting with SQL from Scala.

While SQL is standardized, most implementations do not follow the full standard. Additionally, most implementations provide extensions to the standard. This means that, while many of the concepts in this book will apply to all SQL backends, the exact syntax will need to be adjusted. We will consider only the MySQL implementation here.

In this chapter, you will learn how to interact with SQL databases from Scala using JDBC, a bare-bones Java API. In the next chapter, we will consider Slick, an Object Relational Mapper (ORM) that gives a more Scala-esque feel to interacting with SQL.

This chapter is divided roughly into two sections: we will first discuss the basic functionality for connecting to and interacting with SQL databases, and then discuss useful functional patterns that can be used to create an elegant, loosely coupled, and coherent data access layer.

This chapter assumes that you have a basic working knowledge of SQL. If you do not, you would be better off first reading one of the reference books mentioned at the end of the chapter.
