Pentaho Data Integration Beginner's Guide (Second Edition)
This book focuses on teaching you by example. The book walks you through every aspect of Pentaho Data Integration, giving systematic instructions in a friendly style, allowing you to learn in front of your computer, playing with the tool. The extensive use of drawings and screenshots makes the process of learning Pentaho Data Integration easy. Throughout the book, numerous tips and helpful hints are provided that you will not find anywhere else. This book is a must-have for software developers, database administrators, IT students, and everyone involved or interested in developing ETL solutions, or, more generally, doing any kind of data manipulation. Those who have never used Pentaho Data Integration will benefit most from the book, but those who have will also find it useful. This book is also a good starting point for database administrators, data warehouse designers, architects, or anyone who is responsible for data warehouse projects and needs to load data into them.
Brand: 中圖公司
Listed: 2021-07-23 15:01:44
Publisher: Packt Publishing
The digital rights to this book are provided by 中圖公司, which has authorized 上海閱文信息技術有限公司 to produce and distribute it.
- Cover
- Copyright information
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Preface
- Chapter 1. Getting Started with Pentaho Data Integration
- Pentaho Data Integration and Pentaho BI Suite
- Exploring the Pentaho Demo
- Installing PDI
- Time for action – installing PDI
- Launching the PDI graphical designer – Spoon
- Time for action – starting and customizing Spoon
- Time for action – creating a hello world transformation
- Installing MySQL
- Time for action – installing MySQL on Windows
- Time for action – installing MySQL on Ubuntu
- Summary
- Chapter 2. Getting Started with Transformations
- Designing and previewing transformations
- Time for action – creating a simple transformation and getting familiar with the design process
- Running transformations in an interactive fashion
- Time for action – generating a range of dates and inspecting the data as it is being created
- Handling errors
- Time for action – avoiding errors while converting the estimated time from string to integer
- Time for action – configuring the error handling to see the description of the errors
- Summary
- Chapter 3. Manipulating Real-world Data
- Reading data from files
- Time for action – reading results of football matches from files
- Time for action – reading all your files at a time using a single text file input step
- Time for action – reading all your files at a time using a single text file input step and regular expressions
- Sending data to files
- Time for action – sending the results of matches to a plain file
- Getting system information
- Time for action – reading and writing matches files with flexibility
- Time for action – running the matches transformation from a terminal window
- XML files
- Time for action – getting data from an XML file with information about countries
- Summary
- Chapter 4. Filtering, Searching, and Performing Other Useful Operations with Data
- Sorting data
- Time for action – sorting information about matches with the Sort rows step
- Calculations on groups of rows
- Time for action – calculating football match statistics by grouping data
- Filtering
- Time for action – counting frequent words by filtering
- Time for action – refining the counting task by filtering even more
- Looking up data
- Time for action – finding out which language people speak
- Chapter 5. Controlling the Flow of Data
- Splitting streams
- Time for action – browsing new features of PDI by copying a dataset
- Time for action – assigning tasks by distributing
- Splitting the stream based on conditions
- Time for action – assigning tasks by filtering priorities with the Filter rows step
- Time for action – assigning tasks by filtering priorities with the Switch/Case step
- Merging streams
- Time for action – gathering progress and merging it all together
- Time for action – giving priority to Bouchard by using the Append Stream
- Treating invalid data by splitting and merging streams
- Time for action – treating errors in the estimated time to avoid discarding rows
- Summary
- Chapter 6. Transforming Your Data by Coding
- Doing simple tasks with the JavaScript step
- Time for action – counting frequent words by coding in JavaScript
- Reading and parsing unstructured files with JavaScript
- Time for action – changing a list of house descriptions with JavaScript
- Doing simple tasks with the Java Class step
- Time for action – counting frequent words by coding in Java
- Transforming the dataset with Java
- Time for action – splitting the field to rows using Java
- Avoiding coding by using purpose-built steps
- Summary
- Chapter 7. Transforming the Rowset
- Converting rows to columns
- Time for action – enhancing the films file by converting rows to columns
- Aggregating data with a Row Denormaliser step
- Time for action – aggregating football matches data with the Row Denormaliser step
- Normalizing data
- Time for action – enhancing the matches file by normalizing the dataset
- Generating a custom time dimension dataset by using Kettle variables
- Time for action – creating the time dimension dataset
- Time for action – parameterizing the start and end date of the time dimension dataset
- Summary
- Chapter 8. Working with Databases
- Introducing the Steel Wheels sample database
- Time for action – creating a connection to the Steel Wheels database
- Time for action – exploring the sample database
- Querying a database
- Time for action – getting data about shipped orders
- Time for action – getting orders in a range of dates using parameters
- Time for action – getting orders in a range of dates by using Kettle variables
- Sending data to a database
- Time for action – loading a table with a list of manufacturers
- Time for action – inserting new products or updating existing ones
- Time for action – testing the update of existing products
- Eliminating data from a database
- Time for action – deleting data about discontinued items
- Summary
- Chapter 9. Performing Advanced Operations with Databases
- Preparing the environment
- Time for action – populating the Jigsaw database
- Looking up data in a database
- Time for action – using a Database lookup step to create a list of products to buy
- Time for action – using a Database join step to create a list of suggested products to buy
- Introducing dimensional modeling
- Loading dimensions with data
- Time for action – loading a region dimension with a Combination lookup/update step
- Time for action – testing the transformation that loads the region dimension
- Time for action – keeping a history of changes in products by using the Dimension lookup/update step
- Time for action – testing the transformation that keeps history of product changes
- Summary
- Chapter 10. Creating Basic Task Flows
- Introducing PDI jobs
- Time for action – creating a folder with a Kettle job
- Designing and running jobs
- Time for action – creating a simple job and getting familiar with the design process
- Running transformations from jobs
- Time for action – generating a range of dates and inspecting how things are running
- Receiving arguments and parameters in a job
- Time for action – generating a hello world file by using arguments and parameters
- Running jobs from a terminal window
- Time for action – executing the hello world job from a terminal window
- Using named parameters and command-line arguments in transformations
- Time for action – calling the hello world transformation with fixed arguments and parameters
- Deciding between the use of a command-line argument and a named parameter
- Summary
- Chapter 11. Creating Advanced Transformations and Jobs
- Re-using part of your transformations
- Time for action – calculating statistics with the use of a subtransformation
- Time for action – generating top average scores by copying and getting rows
- Iterating jobs and transformations
- Time for action – generating custom files by executing a transformation for every input row
- Enhancing your processes with the use of variables
- Time for action – generating custom messages by setting a variable with the name of the examination file
- Summary
- Chapter 12. Developing and Implementing a Simple Datamart
- Exploring the sales datamart
- Loading the dimensions
- Time for action – loading the dimensions for the sales datamart
- Extending the sales datamart model
- Loading a fact table with aggregated data
- Time for action – loading the sales fact table by looking up dimensions
- Getting facts and dimensions together
- Time for action – loading the fact table using a range of dates obtained from the command line
- Time for action – loading the SALES star
- Automating the administrative tasks
- Time for action – automating the loading of the sales datamart
- Summary
- Appendix A. Working with Repositories
- Creating a database repository
- Time for action – creating a PDI repository
- Working with the repository storage system
- Time for action – logging into a database repository
- Examining and modifying the contents of a repository with the Repository Explorer
- Migrating from file-based system to repository-based system and vice versa
- Summary
- Appendix B. Pan and Kitchen – Launching Transformations and Jobs from the Command Line
- Running transformations and jobs stored in files
- Running transformations and jobs from a repository
- Kettle variables and the Kettle home directory
- Checking the exit code
- Providing options when running Pan and Kitchen
- Summary
- Appendix C. Quick Reference – Steps and Job Entries
- Transformation steps
- Job entries
- Summary
- Appendix D. Spoon Shortcuts
- General shortcuts
- Designing transformations and jobs
- Grids
- Repositories
- Database wizards
- Summary
- Appendix E. Introducing PDI 5 Features
- Welcome page
- Usability
- Solutions to commonly occurring situations
- Backend
- Summary
- Appendix F. Best Practices
- Summary
- Appendix G. Pop Quiz Answers
- Chapter 1 Getting Started with Pentaho Data Integration
- Chapter 2 Getting Started with Transformations
- Chapter 3 Manipulating Real-world Data
- Chapter 4 Filtering, Searching, and Performing Other Useful Operations with Data
- Chapter 5 Controlling the Flow of Data
- Chapter 6 Transforming Your Data by Coding
- Chapter 8 Working with Databases
- Chapter 9 Performing Advanced Operations with Databases
- Chapter 10 Creating Basic Task Flows
- Chapter 11 Creating Advanced Transformations and Jobs
- Chapter 12 Developing and Implementing a Simple Datamart
- Index (updated: 2021-07-23 15:47:39)