Pentaho Data Integration Beginner's Guide (Second Edition)
This book focuses on teaching you by example. The book walks you through every aspect of Pentaho Data Integration, giving systematic instructions in a friendly style, allowing you to learn in front of your computer, playing with the tool. The extensive use of drawings and screenshots makes the process of learning Pentaho Data Integration easy. Throughout the book, numerous tips and helpful hints are provided that you will not find anywhere else. This book is a must-have for software developers, database administrators, IT students, and everyone involved or interested in developing ETL solutions, or, more generally, doing any kind of data manipulation. Those who have never used Pentaho Data Integration will benefit most from the book, but those who have will also find it useful. This book is also a good starting point for database administrators, data warehouse designers, architects, or anyone who is responsible for data warehouse projects and needs to load data into them.
Table of Contents (198 chapters)
- Cover
- Copyright Information
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Preface
- Chapter 1. Getting Started with Pentaho Data Integration
- Pentaho Data Integration and Pentaho BI Suite
- Exploring the Pentaho Demo
- Installing PDI
- Time for action – installing PDI
- Launching the PDI graphical designer – Spoon
- Time for action – starting and customizing Spoon
- Time for action – creating a hello world transformation
- Installing MySQL
- Time for action – installing MySQL on Windows
- Time for action – installing MySQL on Ubuntu
- Summary
- Chapter 2. Getting Started with Transformations
- Designing and previewing transformations
- Time for action – creating a simple transformation and getting familiar with the design process
- Running transformations in an interactive fashion
- Time for action – generating a range of dates and inspecting the data as it is being created
- Handling errors
- Time for action – avoiding errors while converting the estimated time from string to integer
- Time for action – configuring the error handling to see the description of the errors
- Summary
- Chapter 3. Manipulating Real-world Data
- Reading data from files
- Time for action – reading results of football matches from files
- Time for action – reading all your files at a time using a single text file input step
- Time for action – reading all your files at a time using a single text file input step and regular expressions
- Sending data to files
- Time for action – sending the results of matches to a plain file
- Getting system information
- Time for action – reading and writing matches files with flexibility
- Time for action – running the matches transformation from a terminal window
- XML files
- Time for action – getting data from an XML file with information about countries
- Summary
- Chapter 4. Filtering, Searching, and Performing Other Useful Operations with Data
- Sorting data
- Time for action – sorting information about matches with the Sort rows step
- Calculations on groups of rows
- Time for action – calculating football match statistics by grouping data
- Filtering
- Time for action – counting frequent words by filtering
- Time for action – refining the counting task by filtering even more
- Looking up data
- Time for action – finding out which language people speak
- Chapter 5. Controlling the Flow of Data
- Splitting streams
- Time for action – browsing new features of PDI by copying a dataset
- Time for action – assigning tasks by distributing
- Splitting the stream based on conditions
- Time for action – assigning tasks by filtering priorities with the Filter rows step
- Time for action – assigning tasks by filtering priorities with the Switch/Case step
- Merging streams
- Time for action – gathering progress and merging it all together
- Time for action – giving priority to Bouchard by using the Append Stream
- Treating invalid data by splitting and merging streams
- Time for action – treating errors in the estimated time to avoid discarding rows
- Summary
- Chapter 6. Transforming Your Data by Coding
- Doing simple tasks with the JavaScript step
- Time for action – counting frequent words by coding in JavaScript
- Reading and parsing unstructured files with JavaScript
- Time for action – changing a list of house descriptions with JavaScript
- Doing simple tasks with the Java Class step
- Time for action – counting frequent words by coding in Java
- Transforming the dataset with Java
- Time for action – splitting the field to rows using Java
- Avoiding coding by using purpose-built steps
- Summary
- Chapter 7. Transforming the Rowset
- Converting rows to columns
- Time for action – enhancing the films file by converting rows to columns
- Aggregating data with a Row Denormaliser step
- Time for action – aggregating football matches data with the Row Denormaliser step
- Normalizing data
- Time for action – enhancing the matches file by normalizing the dataset
- Generating a custom time dimension dataset by using Kettle variables
- Time for action – creating the time dimension dataset
- Time for action – parameterizing the start and end date of the time dimension dataset
- Summary
- Chapter 8. Working with Databases
- Introducing the Steel Wheels sample database
- Time for action – creating a connection to the Steel Wheels database
- Time for action – exploring the sample database
- Querying a database
- Time for action – getting data about shipped orders
- Time for action – getting orders in a range of dates using parameters
- Time for action – getting orders in a range of dates by using Kettle variables
- Sending data to a database
- Time for action – loading a table with a list of manufacturers
- Time for action – inserting new products or updating existing ones
- Time for action – testing the update of existing products
- Eliminating data from a database
- Time for action – deleting data about discontinued items
- Summary
- Chapter 9. Performing Advanced Operations with Databases
- Preparing the environment
- Time for action – populating the Jigsaw database
- Looking up data in a database
- Time for action – using a Database lookup step to create a list of products to buy
- Time for action – using a Database join step to create a list of suggested products to buy
- Introducing dimensional modeling
- Loading dimensions with data
- Time for action – loading a region dimension with a Combination lookup/update step
- Time for action – testing the transformation that loads the region dimension
- Time for action – keeping a history of changes in products by using the Dimension lookup/update step
- Time for action – testing the transformation that keeps history of product changes
- Summary
- Chapter 10. Creating Basic Task Flows
- Introducing PDI jobs
- Time for action – creating a folder with a Kettle job
- Designing and running jobs
- Time for action – creating a simple job and getting familiar with the design process
- Running transformations from jobs
- Time for action – generating a range of dates and inspecting how things are running
- Receiving arguments and parameters in a job
- Time for action – generating a hello world file by using arguments and parameters
- Running jobs from a terminal window
- Time for action – executing the hello world job from a terminal window
- Using named parameters and command-line arguments in transformations
- Time for action – calling the hello world transformation with fixed arguments and parameters
- Deciding between the use of a command-line argument and a named parameter
- Summary
- Chapter 11. Creating Advanced Transformations and Jobs
- Re-using part of your transformations
- Time for action – calculating statistics with the use of a subtransformation
- Time for action – generating top average scores by copying and getting rows
- Iterating jobs and transformations
- Time for action – generating custom files by executing a transformation for every input row
- Enhancing your processes with the use of variables
- Time for action – generating custom messages by setting a variable with the name of the examination file
- Summary
- Chapter 12. Developing and Implementing a Simple Datamart
- Exploring the sales datamart
- Loading the dimensions
- Time for action – loading the dimensions for the sales datamart
- Extending the sales datamart model
- Loading a fact table with aggregated data
- Time for action – loading the sales fact table by looking up dimensions
- Getting facts and dimensions together
- Time for action – loading the fact table using a range of dates obtained from the command line
- Time for action – loading the SALES star
- Automating the administrative tasks
- Time for action – automating the loading of the sales datamart
- Summary
- Appendix A. Working with Repositories
- Creating a database repository
- Time for action – creating a PDI repository
- Working with the repository storage system
- Time for action – logging into a database repository
- Examining and modifying the contents of a repository with the Repository Explorer
- Migrating from file-based system to repository-based system and vice versa
- Summary
- Appendix B. Pan and Kitchen – Launching Transformations and Jobs from the Command Line
- Running transformations and jobs stored in files
- Running transformations and jobs from a repository
- Kettle variables and the Kettle home directory
- Checking the exit code
- Providing options when running Pan and Kitchen
- Summary
- Appendix C. Quick Reference – Steps and Job Entries
- Transformation steps
- Job entries
- Summary
- Appendix D. Spoon Shortcuts
- General shortcuts
- Designing transformations and jobs
- Grids
- Repositories
- Database wizards
- Summary
- Appendix E. Introducing PDI 5 Features
- Welcome page
- Usability
- Solutions to commonly occurring situations
- Backend
- Summary
- Appendix F. Best Practices
- Summary
- Appendix G. Pop Quiz Answers
- Chapter 1 Getting Started with Pentaho Data Integration
- Chapter 2 Getting Started with Transformations
- Chapter 3 Manipulating Real-world Data
- Chapter 4 Filtering, Searching, and Performing Other Useful Operations with Data
- Chapter 5 Controlling the Flow of Data
- Chapter 6 Transforming Your Data by Coding
- Chapter 8 Working with Databases
- Chapter 9 Performing Advanced Operations with Databases
- Chapter 10 Creating Basic Task Flows
- Chapter 11 Creating Advanced Transformations and Jobs
- Chapter 12 Developing and Implementing a Simple Datamart
- Index (updated: 2021-07-23 15:47:39)