- Machine Learning With Go
- Daniel Whitenack
- 2021-07-08 10:37:29
Pachyderm jargon
Think about versioning data in Pachyderm kind of like versioning code in Git. The primitives are similar:
- Repositories: These are versioned collections of data, similar to having versioned collections of code in Git repositories
- Commits: Data is versioned in Pachyderm by making commits of that data into data repositories
- Branches: These are lightweight pointers to certain commits or sets of commits (for example, master points to the latest HEAD commit)
- Files: Data is versioned at the file level in Pachyderm, and Pachyderm automatically employs strategies, such as de-duplication, to keep your versioned data space efficient
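To make these four primitives concrete, here is a minimal in-memory sketch in Go. It is a toy model, not Pachyderm's API or its actual storage logic: the `Repo` type and the `NewRepo`, `Commit`, and `Get` functions are hypothetical names invented for illustration, and hashing file contents with SHA-256 is just one assumed way to get file-level de-duplication.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Repo is a toy stand-in for a versioned data repository.
// Every name here is hypothetical; this is not Pachyderm's API.
type Repo struct {
	objects  map[string][]byte   // content hash -> file bytes, stored once
	commits  []map[string]string // commit ID (slice index) -> path -> content hash
	branches map[string]int      // branch name -> ID of its latest commit
}

// NewRepo returns an empty repository.
func NewRepo() *Repo {
	return &Repo{objects: map[string][]byte{}, branches: map[string]int{}}
}

// Commit snapshots the branch's current files plus the given changes,
// moves the branch pointer to the new commit, and returns the commit ID.
func (r *Repo) Commit(branch string, files map[string][]byte) int {
	snapshot := map[string]string{}
	if parent, ok := r.branches[branch]; ok {
		// Unchanged files are carried over by hash, not copied.
		for path, hash := range r.commits[parent] {
			snapshot[path] = hash
		}
	}
	for path, data := range files {
		sum := sha256.Sum256(data)
		hash := hex.EncodeToString(sum[:])
		if _, seen := r.objects[hash]; !seen {
			r.objects[hash] = data // new content is stored exactly once
		}
		snapshot[path] = hash
	}
	r.commits = append(r.commits, snapshot)
	id := len(r.commits) - 1
	r.branches[branch] = id
	return id
}

// Get returns the contents of path as of the given commit.
func (r *Repo) Get(commit int, path string) ([]byte, bool) {
	if commit < 0 || commit >= len(r.commits) {
		return nil, false
	}
	hash, ok := r.commits[commit][path]
	if !ok {
		return nil, false
	}
	return r.objects[hash], true
}

func main() {
	repo := NewRepo()
	first := repo.Commit("master", map[string][]byte{
		"a.csv": []byte("1,2,3\n"),
		"b.csv": []byte("4,5,6\n"),
	})
	second := repo.Commit("master", map[string][]byte{
		"b.csv": []byte("4,5,7\n"), // only b.csv changes
	})

	before, _ := repo.Get(first, "b.csv")
	after, _ := repo.Get(second, "b.csv")
	fmt.Printf("commits=%d objects=%d head=%d\n",
		len(repo.commits), len(repo.objects), repo.branches["master"])
	fmt.Printf("before=%q after=%q\n", before, after)
}
```

In this sketch, both commits include a.csv, but its bytes are stored only once, so two commits of two files each produce three stored objects rather than four. The master branch is nothing more than a name mapped to the latest commit ID, and the earlier version of b.csv stays readable through the first commit.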
Even though versioning data with Pachyderm feels similar to versioning code with Git, there are some major differences. For example, merging data doesn't exactly make sense: if there were merge conflicts across petabytes of data, no human could resolve them. Furthermore, the Git protocol would not, in general, be space efficient for large datasets. Pachyderm uses its own internal logic to version data and to work with versioned data, and this logic is efficient both in storage space and, through caching, in processing.