官术网_书友最值得收藏!

Chapter 5. Scala and SQL through JDBC

One of data science's raison d'être is the difficulty of manipulating large datasets. Much of the data of interest to a company or research group cannot fit conveniently in a single computer's RAM. Storing the data in a way that is easy to query is therefore a complex problem.

Relational databases have been successful at solving the data storage problem. Originally proposed in 1970 (http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf), the overwhelming majority of databases in active use today are still relational. In that time, the price of RAM per megabyte has decreased by a factor of a hundred million. Similarly, hard drive capacity has increased from tens or hundreds of megabytes to terabytes. It is remarkable that, despite this exponential growth in data storage capacity, the relational model has remained dominant.

Virtually all relational databases are described and queried with variants of SQL (Structured Query Language). With the advent of distributed computing, the position of SQL databases as the de facto data storage standard is being challenged by other types of databases, commonly grouped under the umbrella term NoSQL. Many NoSQL databases are more partition-tolerant than SQL databases: they can be split into several parts residing on different computers. While this author expects that NoSQL databases will become increasingly popular, SQL databases are likely to remain prevalent as a data persistence mechanism; hence, a significant portion of this book is devoted to interacting with SQL from Scala.

While SQL is standardized, most implementations do not follow the full standard. Additionally, most implementations provide extensions to the standard. This means that, while many of the concepts in this book will apply to all SQL backends, the exact syntax will need to be adjusted. We will consider only the MySQL implementation here.

In this chapter, you will learn how to interact with SQL databases from Scala using JDBC, a bare bones Java API. In the next chapter, we will consider Slick, an Object Relational Mapper (ORM) that gives a more Scala-esque feel to interacting with SQL.

This chapter is roughly composed of two sections: we will first discuss the basic functionality for connecting and interacting with SQL databases, and then discuss useful functional patterns that can be used to create an elegant, loosely coupled, and coherent data access layer.

This chapter assumes that you have a basic working knowledge of SQL. If you do not, you would be better off first reading one of the reference books mentioned at the end of the chapter.

主站蜘蛛池模板: 纳雍县| 浠水县| 定襄县| 南川市| 孟津县| 三都| 南雄市| 容城县| 凤庆县| 诸暨市| 汶川县| 宜州市| 阜南县| 嘉义县| 桃园县| 寻甸| 浦江县| 芦溪县| 铜川市| 临夏市| 海盐县| 苗栗县| 同德县| 建平县| 富锦市| 余干县| 大田县| 阿拉善盟| 贺州市| 宁强县| 桂东县| 光泽县| 利川市| 琼海市| 资阳市| 堆龙德庆县| 屯昌县| 大同县| 紫金县| 乌审旗| 玉溪市|