Pachyderm jargon

Versioning data in Pachyderm works much like versioning code in Git. The primitives are similar:

  • Repositories: These are versioned collections of data, much as Git repositories are versioned collections of code
  • Commits: Data is versioned in Pachyderm by making commits of that data into data repositories
  • Branches: These are lightweight pointers to certain commits or sets of commits (for example, master points to the latest HEAD commit)
  • Files: Data is versioned at the file level in Pachyderm, and Pachyderm automatically employs strategies such as de-duplication to keep your versioned data space-efficient
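To make the analogy concrete, here is a toy in-memory model of these four primitives. It is a hypothetical sketch, not Pachyderm's implementation or client API: the names `Repo`, `Commit`, `commit()`, `read()`, and `stored_bytes()` are illustrative, and de-duplicating file contents by SHA-256 hash is an assumption chosen for simplicity.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Commit:
    """A snapshot of a repo: maps file paths to content hashes (hypothetical model)."""
    id: str
    parent: Optional[str]
    files: Dict[str, str] = field(default_factory=dict)  # path -> sha256 hex digest


class Repo:
    """Toy versioned repository; illustrates the concepts, not Pachyderm's API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[str, bytes] = {}   # content hash -> bytes, each stored once
        self.commits: Dict[str, Commit] = {}  # commit id -> Commit
        self.branches: Dict[str, Optional[str]] = {"master": None}  # branch -> head commit id

    def commit(self, branch: str, files: Dict[str, bytes]) -> str:
        """Add or overwrite `files` in a new commit on `branch`, then move the branch."""
        parent_id = self.branches.get(branch)
        # Carry over the parent's file listing so unchanged files stay visible.
        listing = dict(self.commits[parent_id].files) if parent_id is not None else {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.objects.setdefault(digest, data)  # identical content is stored only once
            listing[path] = digest
        commit_id = uuid.uuid4().hex
        self.commits[commit_id] = Commit(commit_id, parent_id, listing)
        self.branches[branch] = commit_id  # a branch is just a movable pointer
        return commit_id

    def read(self, ref: str, path: str) -> bytes:
        """Read a file at a branch name or a commit id."""
        commit_id = self.branches[ref] if ref in self.branches else ref
        return self.objects[self.commits[commit_id].files[path]]

    def stored_bytes(self) -> int:
        """Total bytes actually stored after de-duplication."""
        return sum(len(data) for data in self.objects.values())


if __name__ == "__main__":
    repo = Repo("csv-data")
    c1 = repo.commit("master", {"/a.csv": b"1,2,3\n"})
    c2 = repo.commit("master", {"/b.csv": b"1,2,3\n", "/c.csv": b"4,5,6\n"})
    print(repo.read(c1, "/a.csv"))        # older commits stay readable
    print(repo.branches["master"] == c2)  # master points at the latest commit
    print(repo.stored_bytes())            # /a.csv and /b.csv share one stored object
```

Because each commit in this model records only path-to-hash references, adding a file whose bytes already exist costs no extra storage, and earlier commits remain readable after the branch pointer moves on.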
Even though versioning data with Pachyderm feels similar to versioning code with Git, there are some major differences. For example, merging data doesn't really make sense: if there were merge conflicts across petabytes of data, no human could resolve them. Furthermore, the Git protocol would generally not be space-efficient for large datasets. Pachyderm instead uses its own internal logic to version and work with data, and that logic is both space-efficient and processing-efficient in terms of caching.