Pentaho Data Integration Beginner's Guide (Second Edition)
This book focuses on teaching you by example. It walks you through every aspect of Pentaho Data Integration, giving systematic instructions in a friendly style and allowing you to learn in front of your computer, playing with the tool. The extensive use of drawings and screenshots makes the process of learning Pentaho Data Integration easy. Throughout the book, numerous tips and helpful hints are provided that you will not find anywhere else. This book is a must-have for software developers, database administrators, IT students, and everyone involved or interested in developing ETL solutions or, more generally, doing any kind of data manipulation. Those who have never used Pentaho Data Integration will benefit most from the book, but those who have will also find it useful. This book is also a good starting point for database administrators, data warehouse designers, architects, or anyone who is responsible for data warehouse projects and needs to load data into them.
Table of Contents (198 chapters)
- Cover
- Copyright Information
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Preface
- Chapter 1. Getting Started with Pentaho Data Integration
- Pentaho Data Integration and Pentaho BI Suite
- Exploring the Pentaho Demo
- Installing PDI
- Time for action – installing PDI
- Launching the PDI graphical designer – Spoon
- Time for action – starting and customizing Spoon
- Time for action – creating a hello world transformation
- Installing MySQL
- Time for action – installing MySQL on Windows
- Time for action – installing MySQL on Ubuntu
- Summary
- Chapter 2. Getting Started with Transformations
- Designing and previewing transformations
- Time for action – creating a simple transformation and getting familiar with the design process
- Running transformations in an interactive fashion
- Time for action – generating a range of dates and inspecting the data as it is being created
- Handling errors
- Time for action – avoiding errors while converting the estimated time from string to integer
- Time for action – configuring the error handling to see the description of the errors
- Summary
- Chapter 3. Manipulating Real-world Data
- Reading data from files
- Time for action – reading results of football matches from files
- Time for action – reading all your files at a time using a single text file input step
- Time for action – reading all your files at a time using a single text file input step and regular expressions
- Sending data to files
- Time for action – sending the results of matches to a plain file
- Getting system information
- Time for action – reading and writing matches files with flexibility
- Time for action – running the matches transformation from a terminal window
- XML files
- Time for action – getting data from an XML file with information about countries
- Summary
- Chapter 4. Filtering, Searching, and Performing Other Useful Operations with Data
- Sorting data
- Time for action – sorting information about matches with the Sort rows step
- Calculations on groups of rows
- Time for action – calculating football match statistics by grouping data
- Filtering
- Time for action – counting frequent words by filtering
- Time for action – refining the counting task by filtering even more
- Looking up data
- Time for action – finding out which language people speak
- Chapter 5. Controlling the Flow of Data
- Splitting streams
- Time for action – browsing new features of PDI by copying a dataset
- Time for action – assigning tasks by distributing
- Splitting the stream based on conditions
- Time for action – assigning tasks by filtering priorities with the Filter rows step
- Time for action – assigning tasks by filtering priorities with the Switch/Case step
- Merging streams
- Time for action – gathering progress and merging it all together
- Time for action – giving priority to Bouchard by using the Append Stream
- Treating invalid data by splitting and merging streams
- Time for action – treating errors in the estimated time to avoid discarding rows
- Summary
- Chapter 6. Transforming Your Data by Coding
- Doing simple tasks with the JavaScript step
- Time for action – counting frequent words by coding in JavaScript
- Reading and parsing unstructured files with JavaScript
- Time for action – changing a list of house descriptions with JavaScript
- Doing simple tasks with the Java Class step
- Time for action – counting frequent words by coding in Java
- Transforming the dataset with Java
- Time for action – splitting the field to rows using Java
- Avoiding coding by using purpose-built steps
- Summary
- Chapter 7. Transforming the Rowset
- Converting rows to columns
- Time for action – enhancing the films file by converting rows to columns
- Aggregating data with a Row Denormaliser step
- Time for action – aggregating football matches data with the Row Denormaliser step
- Normalizing data
- Time for action – enhancing the matches file by normalizing the dataset
- Generating a custom time dimension dataset by using Kettle variables
- Time for action – creating the time dimension dataset
- Time for action – parameterizing the start and end date of the time dimension dataset
- Summary
- Chapter 8. Working with Databases
- Introducing the Steel Wheels sample database
- Time for action – creating a connection to the Steel Wheels database
- Time for action – exploring the sample database
- Querying a database
- Time for action – getting data about shipped orders
- Time for action – getting orders in a range of dates using parameters
- Time for action – getting orders in a range of dates by using Kettle variables
- Sending data to a database
- Time for action – loading a table with a list of manufacturers
- Time for action – inserting new products or updating existing ones
- Time for action – testing the update of existing products
- Eliminating data from a database
- Time for action – deleting data about discontinued items
- Summary
- Chapter 9. Performing Advanced Operations with Databases
- Preparing the environment
- Time for action – populating the Jigsaw database
- Looking up data in a database
- Time for action – using a Database lookup step to create a list of products to buy
- Time for action – using a Database join step to create a list of suggested products to buy
- Introducing dimensional modeling
- Loading dimensions with data
- Time for action – loading a region dimension with a Combination lookup/update step
- Time for action – testing the transformation that loads the region dimension
- Time for action – keeping a history of changes in products by using the Dimension lookup/update step
- Time for action – testing the transformation that keeps history of product changes
- Summary
- Chapter 10. Creating Basic Task Flows
- Introducing PDI jobs
- Time for action – creating a folder with a Kettle job
- Designing and running jobs
- Time for action – creating a simple job and getting familiar with the design process
- Running transformations from jobs
- Time for action – generating a range of dates and inspecting how things are running
- Receiving arguments and parameters in a job
- Time for action – generating a hello world file by using arguments and parameters
- Running jobs from a terminal window
- Time for action – executing the hello world job from a terminal window
- Using named parameters and command-line arguments in transformations
- Time for action – calling the hello world transformation with fixed arguments and parameters
- Deciding between the use of a command-line argument and a named parameter
- Summary
- Chapter 11. Creating Advanced Transformations and Jobs
- Re-using part of your transformations
- Time for action – calculating statistics with the use of a subtransformation
- Time for action – generating top average scores by copying and getting rows
- Iterating jobs and transformations
- Time for action – generating custom files by executing a transformation for every input row
- Enhancing your processes with the use of variables
- Time for action – generating custom messages by setting a variable with the name of the examination file
- Summary
- Chapter 12. Developing and Implementing a Simple Datamart
- Exploring the sales datamart
- Loading the dimensions
- Time for action – loading the dimensions for the sales datamart
- Extending the sales datamart model
- Loading a fact table with aggregated data
- Time for action – loading the sales fact table by looking up dimensions
- Getting facts and dimensions together
- Time for action – loading the fact table using a range of dates obtained from the command line
- Time for action – loading the SALES star
- Automating the administrative tasks
- Time for action – automating the loading of the sales datamart
- Summary
- Appendix A. Working with Repositories
- Creating a database repository
- Time for action – creating a PDI repository
- Working with the repository storage system
- Time for action – logging into a database repository
- Examining and modifying the contents of a repository with the Repository Explorer
- Migrating from file-based system to repository-based system and vice versa
- Summary
- Appendix B. Pan and Kitchen – Launching Transformations and Jobs from the Command Line
- Running transformations and jobs stored in files
- Running transformations and jobs from a repository
- Kettle variables and the Kettle home directory
- Checking the exit code
- Providing options when running Pan and Kitchen
- Summary
- Appendix C. Quick Reference – Steps and Job Entries
- Transformation steps
- Job entries
- Summary
- Appendix D. Spoon Shortcuts
- General shortcuts
- Designing transformations and jobs
- Grids
- Repositories
- Database wizards
- Summary
- Appendix E. Introducing PDI 5 Features
- Welcome page
- Usability
- Solutions to commonly occurring situations
- Backend
- Summary
- Appendix F. Best Practices
- Summary
- Appendix G. Pop Quiz Answers
- Chapter 1 Getting Started with Pentaho Data Integration
- Chapter 2 Getting Started with Transformations
- Chapter 3 Manipulating Real-world Data
- Chapter 4 Filtering, Searching, and Performing Other Useful Operations with Data
- Chapter 5 Controlling the Flow of Data
- Chapter 6 Transforming Your Data by Coding
- Chapter 8 Working with Databases
- Chapter 9 Performing Advanced Operations with Databases
- Chapter 10 Creating Basic Task Flows
- Chapter 11 Creating Advanced Transformations and Jobs
- Chapter 12 Developing and Implementing a Simple Datamart
- Index
Updated: 2021-07-23 15:47:39