
Pachyderm jargon

Versioning data in Pachyderm is a lot like versioning code in Git. The primitives are similar:

  • Repositories: These are versioned collections of data, similar to having versioned collections of code in Git repositories
  • Commits: Data is versioned in Pachyderm by making commits of that data into data repositories
  • Branches: These are lightweight pointers to certain commits or sets of commits (for example, master points to the latest HEAD commit)
  • Files: Data is versioned at the file level in Pachyderm, and Pachyderm automatically employs strategies, such as de-duplication, to keep your versioned data space-efficient

Even though versioning data with Pachyderm feels similar to versioning code with Git, there are some major differences. For example, merging data doesn't really make sense: if there were merge conflicts on petabytes of data, no human could resolve them. Furthermore, the Git protocol would generally not be space-efficient for large sets of data. Pachyderm uses its own internal logic to version data and to work with versioned data, and that logic is efficient both in storage space and, through caching, in processing.
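
To make the primitives above concrete, here is a minimal Go sketch of a toy in-memory data repository. It is not Pachyderm's implementation or client API. The Repo and Commit types and the PutCommit, GetFile, and StoredObjects methods are hypothetical names invented for this illustration, and storing file contents under a SHA-256 hash is just one simple way to de-duplicate, assumed here for clarity.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Commit is an immutable snapshot that maps file paths to content hashes.
type Commit struct {
	ID     int
	Parent int               // -1 when the commit has no parent
	Files  map[string]string // path -> content hash
}

// Repo is a toy versioned data repository (hypothetical, for illustration only).
type Repo struct {
	Name     string
	Commits  []Commit
	Branches map[string]int    // branch name -> ID of the commit it points to
	objects  map[string][]byte // content hash -> bytes, each stored only once
}

// NewRepo creates an empty repository.
func NewRepo(name string) *Repo {
	return &Repo{Name: name, Branches: map[string]int{}, objects: map[string][]byte{}}
}

// PutCommit creates a commit on branch holding the parent's files plus the
// given changes, then moves the branch pointer to the new commit.
func (r *Repo) PutCommit(branch string, files map[string][]byte) int {
	parent, ok := r.Branches[branch]
	if !ok {
		parent = -1
	}
	snapshot := map[string]string{}
	if parent >= 0 {
		for path, hash := range r.Commits[parent].Files {
			snapshot[path] = hash
		}
	}
	for path, data := range files {
		sum := sha256.Sum256(data)
		hash := hex.EncodeToString(sum[:])
		// De-duplication: identical content is stored only once.
		if _, seen := r.objects[hash]; !seen {
			r.objects[hash] = append([]byte(nil), data...)
		}
		snapshot[path] = hash
	}
	id := len(r.Commits)
	r.Commits = append(r.Commits, Commit{ID: id, Parent: parent, Files: snapshot})
	r.Branches[branch] = id
	return id
}

// GetFile returns the contents of path as of the given commit.
func (r *Repo) GetFile(commit int, path string) ([]byte, bool) {
	if commit < 0 || commit >= len(r.Commits) {
		return nil, false
	}
	hash, ok := r.Commits[commit].Files[path]
	if !ok {
		return nil, false
	}
	return r.objects[hash], true
}

// StoredObjects reports how many distinct file contents are stored.
func (r *Repo) StoredObjects() int { return len(r.objects) }

func main() {
	repo := NewRepo("training")
	c0 := repo.PutCommit("master", map[string][]byte{
		"a.csv": []byte("1,2,3\n"),
		"b.csv": []byte("4,5,6\n"),
	})
	c1 := repo.PutCommit("master", map[string][]byte{
		"b.csv":    []byte("4,5,7\n"), // changed file
		"copy.csv": []byte("1,2,3\n"), // same content as a.csv
	})
	before, _ := repo.GetFile(c0, "b.csv")
	after, _ := repo.GetFile(c1, "b.csv")
	fmt.Printf("b.csv at commit %d: %q, at commit %d: %q\n", c0, before, c1, after)
	fmt.Printf("master -> commit %d, distinct objects stored: %d\n",
		repo.Branches["master"], repo.StoredObjects())
}
```

In this sketch the two commits reference five file paths in total but only three distinct contents are stored, and the older version of b.csv stays readable through its commit. There is deliberately no merge operation, which mirrors the point made above about merging data.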