
  • Machine Learning With Go
  • Daniel Whitenack
  • 181 words
  • 2021-07-08 10:37:29

Pachyderm jargon

Think about versioning data in Pachyderm kind of like versioning code in Git. The primitives are similar:

  • Repositories: These are versioned collections of data, similar to having versioned collections of code in Git repositories
  • Commits: Data is versioned in Pachyderm by making commits of that data into data repositories
  • Branches: These are lightweight pointers to certain commits or sets of commits (for example, master points to the latest HEAD commit)
  • Files: Data is versioned at the file level in Pachyderm, and Pachyderm automatically employs strategies, such as de-duplication, to keep your versioned data space efficient

Even though versioning data with Pachyderm feels similar to versioning code with Git, there are some major differences. For example, merging data doesn't exactly make sense. If there are merge conflicts on petabytes of data, no human could resolve these. Furthermore, the Git protocol would not be space efficient in general for large sets of data. Pachyderm uses its own internal logic to perform the versioning and work with versioned data, and the logic is both space efficient and processing efficient in terms of caching.
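
To make these primitives a little more concrete, here is a minimal sketch in Go of how repositories, commits, branches, and file-level de-duplication might fit together. This is a toy, in-memory model written purely for illustration; it is not Pachyderm's implementation or API. The names Repo, NewRepo, Commit, Read, and BlobCount are hypothetical, and hashing file contents with SHA-256 is an assumed strategy chosen to show the idea of storing identical data only once.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Repo is a toy stand-in for a versioned data repository.
// Hypothetical type for illustration; not part of Pachyderm.
type Repo struct {
	blobs    map[string][]byte   // content hash -> file contents, stored once
	commits  []map[string]string // commit ID -> (file path -> content hash)
	branches map[string]int      // branch name -> ID of its latest commit
}

// NewRepo returns an empty repository.
func NewRepo() *Repo {
	return &Repo{blobs: map[string][]byte{}, branches: map[string]int{}}
}

// Commit records a new snapshot on the named branch and returns its ID.
// files holds only the paths that changed; every other path is carried
// over from the branch's previous commit.
func (r *Repo) Commit(branch string, files map[string][]byte) int {
	snapshot := map[string]string{}
	if head, ok := r.branches[branch]; ok {
		for path, hash := range r.commits[head] {
			snapshot[path] = hash
		}
	}
	for path, data := range files {
		sum := sha256.Sum256(data)
		hash := hex.EncodeToString(sum[:])
		// De-duplication: identical contents share a single stored blob.
		if _, seen := r.blobs[hash]; !seen {
			r.blobs[hash] = append([]byte(nil), data...)
		}
		snapshot[path] = hash
	}
	r.commits = append(r.commits, snapshot)
	id := len(r.commits) - 1
	r.branches[branch] = id // the branch is just a pointer to the new commit
	return id
}

// Read returns the contents of path as of the given commit.
func (r *Repo) Read(commit int, path string) ([]byte, bool) {
	if commit < 0 || commit >= len(r.commits) {
		return nil, false
	}
	hash, ok := r.commits[commit][path]
	if !ok {
		return nil, false
	}
	return r.blobs[hash], true
}

// BlobCount reports how many distinct file contents are stored.
func (r *Repo) BlobCount() int { return len(r.blobs) }

func main() {
	repo := NewRepo()
	first := repo.Commit("master", map[string][]byte{
		"a.csv": []byte("1,2,3"),
		"b.csv": []byte("4,5,6"),
	})
	second := repo.Commit("master", map[string][]byte{
		"b.csv": []byte("4,5,7"), // changed file
		"c.csv": []byte("1,2,3"), // same contents as a.csv
	})

	before, _ := repo.Read(first, "b.csv")
	after, _ := repo.Read(second, "b.csv")
	fmt.Printf("b.csv at commit %d: %s\n", first, before)
	fmt.Printf("b.csv at commit %d: %s\n", second, after)
	fmt.Printf("stored blobs: %d\n", repo.BlobCount())
}
```

In this sketch, four file writes across two commits are backed by only three stored blobs, because c.csv has the same contents as a.csv. The older version of b.csv stays readable through the first commit, and the master branch simply moves to point at the newest commit. There is deliberately no merge operation, in keeping with the point above that merging data does not really make sense.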