Interacting with data in binary format

We can read and write binary serializations of Python objects with the pickle module from the standard library. Object serialization is useful when you work with objects that take a long time to create, such as some machine learning models. Pickling such an object once makes subsequent access faster, because it can be loaded from disk instead of being rebuilt. It also gives you a standard way to distribute Python objects.
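As a minimal sketch of the idea, the following round trip uses only the standard library. The `model` dictionary and the file name `model.pkl` are hypothetical stand-ins for an object that is expensive to build; they are not part of the book's example data.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for an object that takes a long time to create.
model = {"weights": [0.25, 0.5, 0.25], "labels": ("spam", "ham")}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "model.pkl"

    # Serialize the object to a binary file.
    with path.open("wb") as f:
        pickle.dump(model, f)

    # Load it back from disk instead of rebuilding it.
    with path.open("rb") as f:
        restored = pickle.load(f)

print(restored == model)   # True: same contents
print(restored is model)   # False: a new, independent object
```

Note that unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.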

Pandas supports pickling out of the box. The to_pickle() method of a Pandas object writes it to a file, and the pd.read_pickle() function reads it back. The data is written to disk in the pickle format, which is convenient for short-term storage:

>>> df_ex3.to_pickle('example_data/ex_06-03.out')
>>> pd.read_pickle('example_data/ex_06-03.out')
        1  2       3    4
0
Nam     7  1    male  hcm
Mai    11  1  female  hcm
Lan    25  3  female   hn
Hung   42  3    male   tn
Nghia  26  3    male   dn
Vinh   39  3    male   vl
Hong   28  4  female   dn

HDF5

HDF5 is not a database but a data model and file format. It is suited to write-once, read-many datasets. An HDF5 file contains two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. Several Python interfaces to the HDF5 format exist, such as h5py, which uses familiar Python and NumPy constructs such as dictionaries and NumPy array syntax. h5py provides a high-level interface to the HDF5 API that makes it easy to get started. In this book, however, we will introduce another library for this format, PyTables, which works well with Pandas objects:

>>> store = pd.HDFStore('hdf5_store.h5')
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: hdf5_store.h5
Empty

We created an empty HDF5 file named hdf5_store.h5. Now we can write data to the file just as we would add key-value pairs to a dict:

>>> store['ex3'] = df_ex3
>>> store['name'] = df_ex2[0]
>>> store['hometown'] = df_ex3[4]
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: hdf5_store.h5
/ex3 frame (shape->[7,4])
/hometown series (shape->[1])
/name series (shape->[1])

Objects stored in the HDF5 file can be retrieved by specifying the object keys:

>>> store['name']
0      Nam
1      Mai
2      Lan
3     Hung
4    Nghia
5     Vinh
6     Hong
Name: 0, dtype: object

Once we have finished interacting with the HDF5 file, we close it to release the file handle:

>>> store.close()
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: hdf5_store.h5
File is CLOSED
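The open, write, read, and close steps above can also be written with a with statement, which closes the store even if an error occurs. The sketch below assumes that the optional PyTables dependency (the tables package) is installed; the frame `df`, the keys `people` and `age`, and the file name are hypothetical and chosen only for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical numeric data; not the book's df_ex3.
df = pd.DataFrame({"age": [7, 11, 25], "grade": [1, 1, 3]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = str(Path(tmp_dir) / "store.h5")

    # mode="w" creates a new file; the store is closed when the block exits.
    with pd.HDFStore(path, mode="w") as store:
        store["people"] = df        # store a DataFrame under a key
        store["age"] = df["age"]    # store a Series under another key
        keys = sorted(store.keys())

    is_open_after = store.is_open   # False once the with block has exited

    # Reopen read-only and retrieve an object by its key.
    with pd.HDFStore(path, mode="r") as store:
        restored = store["people"]

print(keys)
print(is_open_after)
print(restored.equals(df))
```

Keys are reported with a leading slash because each one is a path inside the HDF5 group hierarchy.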

There are other supported functions that are useful for working with the HDF5 format. If you need to work with huge quantities of data, you should explore the two libraries, PyTables and h5py, in more detail.
