Batch Applications for Java Platform 1.0

The Batch Applications API for the Java Platform 1.0 was developed under JSR 352. This section gives only an overview of the API. The complete specification can be downloaded from http://jcp.org/aboutJava/communityprocess/final/jsr352/index.html.

What is batch processing?

According to the Cambridge Advanced Learner's Dictionary, a batch is a group of things or people dealt with at the same time or considered similar in type, and a process is a series of actions that you take in order to achieve a result. Based on these two definitions, we can say that batch processing is a series of repetitive actions on a large amount of data in order to achieve a result. Because of the large amounts of data involved, batch processing is often used for end-of-day, end-of-month, end-of-period, and end-of-year processing.

The following is a short list of domains where you can use batch processing:

  • Data import/export from/to XML or CSV files
  • Accounting processing such as consolidations
  • ETL (extract-transform-load) in a data warehouse
  • Digital file processing (downloading, processing, or saving)
  • Notification of a service's subscribers (for example, the members of a forum or a group)

Why a dedicated API for batch processing?

Now that you have an idea of what batch processing is, you might ask yourself: why not just write a foreach loop that launches many threads? First of all, batch processing is not only about execution speed. Processing large amounts of data often raises many exceptions, and these raise a number of questions: What action should be taken when an exception occurs? Should the whole process be canceled for any exception? If not, which actions should be canceled, and for which types of exception? If only some transactions need to be canceled, how do you identify them? At the end of a batch run, it is also important to know how many operations were canceled, how many completed successfully, and how many were ignored.

This list of questions is far from complete, yet it already represents a great deal of work. Trying to build such a tool on your own may not only complicate your application but also introduce new bugs.

Understanding the Batch API

The Batch Applications API for the Java Platform 1.0 was developed to address the needs described in the preceding sections. It targets both Java SE and Java EE applications and requires Java 6 or later.

The features offered by this API can be summarized as follows:

  • It offers the Reader-Processor-Writer pattern natively and lets you implement your own batch pattern, so you can choose the best pattern for each case.
  • It lets you define the behavior of the batch processing (skip, retry, rollback, and so on) for each type of error.
  • It supports many step-level metrics for monitoring, such as rollbackCount, readSkipCount, writeSkipCount, and so on.
  • It can be configured to run some processes in parallel and offers the possibility of using the JTA or RESOURCE_LOCAL transaction mode.
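
The idea of choosing a behavior per type of error can be pictured with a small plain-Java sketch. It does not use the Batch API: in a real job, this kind of policy is declared in the Job XML rather than coded by hand. The ErrorPolicySketch class, its Action enum, and the rule that an unmatched exception fails the step are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this class is not part of the Batch API.
class ErrorPolicySketch {
  enum Action { SKIP, RETRY, FAIL }

  // Rules are kept in insertion order so that the first match wins.
  private final Map<Class<? extends Exception>, Action> rules = new LinkedHashMap<>();

  ErrorPolicySketch on(Class<? extends Exception> type, Action action) {
    rules.put(type, action);
    return this;
  }

  // A rule registered for a superclass also matches its subclasses.
  // Exceptions that match no rule are assumed to fail the step.
  Action decide(Exception e) {
    for (Map.Entry<Class<? extends Exception>, Action> rule : rules.entrySet()) {
      if (rule.getKey().isInstance(e)) {
        return rule.getValue();
      }
    }
    return Action.FAIL;
  }
}
```

For example, a policy built with on(NumberFormatException.class, Action.SKIP) and on(java.io.IOException.class, Action.RETRY) would skip a badly formatted record, retry after an I/O problem, and fail on anything else.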

To do this, the Batch Applications API for the Java Platform 1.0 relies on a solid architecture. A Job is managed by a JobOperator and has one or more steps, each of which is either a chunk or a batchlet. During its lifecycle, information (metadata) about a job is stored in the JobRepository, as shown in the following diagram:

JobRepository

As we said earlier, the JobRepository stores metadata about currently running and past jobs. It can be accessed through the JobOperator.

Job

A Job can be seen as an entity that encapsulates a unit of batch processing. It is made up of one or more steps, which must be configured in an XML file called a Job configuration file or Job XML. This file contains the job identification information and the steps that compose the job. The following code shows the skeleton of a Job XML file:

<job id="inscription-validator-Job" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee">

  <step id="step1">
    ...
  </step>
  <step id="step2">
    ...
  </step>
</job>

The Job XML file is named with the convention <name>.xml (for example, inscriptionJob.xml) and should be stored under the META-INF/batch-jobs directory for a portable application.

Step

A Step is an autonomous phase of a batch. It contains all the information necessary to define and control a piece of batch processing. A batch step is either a chunk or a batchlet (the two are mutually exclusive). The step in the following code is a chunk-type step:

<job id="inscription-validator-Job" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee">
  <step id="validate-notify">
    <chunk>
      <reader ref="InscriptionReader" />
      <processor ref="InscriptionProcessor" />
      <writer ref="StudentNotifier" />
    </chunk>
  </step>
</job>

Chunk

A chunk is a type of step that implements the Reader-Processor-Writer pattern. It runs in the scope of a configurable transaction and accepts many configuration values. The following code is an enhanced version of the inscription-validator-Job shown in the preceding code. In this listing, we have added a setting that defines the unit used to manage the commit behavior of the chunk (checkpoint-policy="item") and a setting that defines the number of items to process before each commit (item-count="15"). We have also specified the maximum number of exceptions the step will skip when the chunk throws exceptions that are configured as skippable (skip-limit="30").

The following code is an example of a chunk type step with some configuration:

<job id="inscription-validator-Job" version="1.0" 
  xmlns="http://xmlns.jcp.org/xml/ns/javaee">   
  <step id="validate-notify" >        
    <chunk item-count="15" checkpoint-policy="item" 
      skip-limit="30">
      <reader ref="InscriptionReader" />
      <processor ref="InscriptionProcessor" />
      <writer ref="StudentNotifier" />
    </chunk>     
  </step>    
</job>
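
To make the effect of item-count and skip-limit more concrete, the following plain-Java sketch simulates the loop that a chunk step runs. It is a simplification written without the Batch API: the ChunkLoopSketch class, its validation rule, and the choice of IllegalArgumentException as the skippable exception are assumptions made for illustration, and a real runtime may count items and manage transactions differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java simulation of a chunk step; it does NOT use the javax.batch API.
class ChunkLoopSketch {
  int commitCount;                                 // number of simulated commits
  int skipCount;                                   // number of items skipped so far
  final List<String> written = new ArrayList<>();

  // itemCount plays the role of item-count, skipLimit the role of skip-limit.
  void run(List<String> input, int itemCount, int skipLimit) {
    Iterator<String> reader = input.iterator();
    List<String> buffer = new ArrayList<>();
    while (reader.hasNext()) {
      String item = reader.next();                 // reader phase
      try {
        buffer.add(process(item));                 // processor phase
      } catch (IllegalArgumentException e) {       // assumed to be skippable
        if (++skipCount > skipLimit) {
          throw new IllegalStateException("skip limit exceeded", e);
        }
        continue;
      }
      if (buffer.size() == itemCount) {
        commit(buffer);
      }
    }
    if (!buffer.isEmpty()) {
      commit(buffer);                              // last, partially filled chunk
    }
  }

  // Hypothetical validation rule: blank items are rejected.
  private String process(String item) {
    if (item.trim().isEmpty()) {
      throw new IllegalArgumentException("blank item");
    }
    return item.toUpperCase();
  }

  private void commit(List<String> buffer) {
    written.addAll(buffer);                        // writer phase
    buffer.clear();
    commitCount++;
  }
}
```

Running the sketch on the items "a", "", "b", "c", "d" with an item count of 2 and a skip limit of 1 writes four items in two commits and records one skip. With a skip limit of 0, the blank item stops the run instead.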

The source code for this section is a validation program that sends a message to the candidates to let them know whether they have been accepted. At the end, it displays monitoring information on a web page. The processing is launched by the ChunkStepBatchProcessing.java Servlet. In the artifact implementations, the InscriptionCheckpoint class keeps track of the line that is being processed.

The following code is a skeleton of chunk batch artifact implementations:

import java.io.Serializable;
import java.util.List;
import javax.batch.api.chunk.AbstractItemReader;
import javax.batch.api.chunk.AbstractItemWriter;
import javax.batch.api.chunk.ItemProcessor;

// Each class below lives in its own source file.
public class InscriptionReader extends AbstractItemReader {
  @Override
  public Object readItem() throws Exception {
    // Read data and return the item; returning null signals the end of the input
    return null;
  }
}

public class InscriptionProcessor implements ItemProcessor {
  @Override
  public Object processItem(Object o) throws Exception {
    // Receive an item from the reader, process it, and return the result
    return o;
  }
}

public class StudentNotifier extends AbstractItemWriter {
  @Override
  public void writeItems(List<Object> items) throws Exception {
    // Receive the items from the processor, then write them out
  }
}

public class InscriptionCheckpoint implements Serializable {
  private int lineNumber;

  public void incrementLineNumber() {
    lineNumber++;
  }

  public int getLineNumber() {
    return lineNumber;
  }
}
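
How a checkpoint such as InscriptionCheckpoint lets a reader resume where it stopped can be sketched in plain Java. The method names open, readItem, and checkpointInfo mirror those of the API's ItemReader interface, but the classes below are hypothetical stand-ins that do not implement it, and the in-memory list of lines is an assumption made for illustration.

```java
import java.io.Serializable;
import java.util.List;

// Same shape as the InscriptionCheckpoint class of the skeleton.
class LineCheckpoint implements Serializable {
  private int lineNumber;
  void incrementLineNumber() { lineNumber++; }
  int getLineNumber() { return lineNumber; }
}

// Plain-Java stand-in for an item reader; it does NOT implement javax.batch interfaces.
class ResumableLineReader {
  private final List<String> lines;
  private LineCheckpoint checkpoint;

  ResumableLineReader(List<String> lines) {
    this.lines = lines;
  }

  // A null argument means a fresh start; otherwise reading resumes after the saved line.
  void open(Serializable previous) {
    checkpoint = (previous == null) ? new LineCheckpoint() : (LineCheckpoint) previous;
  }

  // Returns the next line, or null when the input is exhausted.
  String readItem() {
    if (checkpoint.getLineNumber() >= lines.size()) {
      return null;
    }
    String line = lines.get(checkpoint.getLineNumber());
    checkpoint.incrementLineNumber();
    return line;
  }

  // The position a runtime would persist at each commit. A real runtime stores a
  // serialized copy; this sketch simply hands back the live object.
  Serializable checkpointInfo() {
    return checkpoint;
  }
}
```

If a first reader consumes two lines and its checkpoint is passed to the open method of a second reader over the same input, the second reader continues with the third line instead of starting over.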

Batchlet

A batchlet is a type of step that lets you implement your own batch pattern. Unlike a chunk, which performs its task in three phases (reading, processing, and writing), a batchlet step is invoked once and returns an exit status at the end of processing. The source code for this section sends an information message to all students and displays some important information about the batch. The processing is launched by the BatchletStepBatchProcessing.java Servlet.

The following code is a skeleton of batchlet batch artifact implementation:

import javax.batch.api.AbstractBatchlet;

public class StudentInformation extends AbstractBatchlet {

  @Override
  public String process() throws Exception {
    // Do the work of the step, then return its exit status
    return "COMPLETED";
  }
}

The batch.xml configuration file

The batch.xml file is an XML file that declares the batch artifacts of the batch application. It maps each batch artifact implementation to the reference name that is used in the Job XML file. The batch.xml file must be stored in the META-INF directory for a portable application. The following code gives the contents of the batch.xml file for the inscription-validator-Job Job shown in the preceding code.

The following code is an example of batch.xml:

<batch-artifacts xmlns="http://xmlns.jcp.org/xml/ns/javaee"> 
  <ref id="InscriptionReader" 
  class="com.packt.ch02.batchprocessing.chunk.InscriptionReader" /> 
  <ref id="StudentNotifier" 
  class="com.packt.ch02.batchprocessing.chunk.StudentNotifier" /> 
  <ref id="InscriptionProcessor" 
  class="com.packt.ch02.batchprocessing.chunk.InscriptionProcessor" /> 
</batch-artifacts>
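
The mapping that batch.xml expresses, from a reference name to an implementation class, can be pictured as a lookup table followed by a reflective instantiation. The sketch below is only an illustration of that idea: ArtifactRegistrySketch is a hypothetical class, and how an actual batch runtime loads and instantiates artifacts is implementation-specific.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a ref-to-class mapping; not how any given runtime works.
class ArtifactRegistrySketch {
  private final Map<String, String> refToClassName = new HashMap<>();

  // Corresponds to one <ref id="..." class="..." /> entry.
  void register(String ref, String className) {
    refToClassName.put(ref, className);
  }

  // Resolves a reference name used in the Job XML to a new artifact instance.
  Object create(String ref) {
    String className = refToClassName.get(ref);
    if (className == null) {
      throw new IllegalArgumentException("Unknown artifact ref: " + ref);
    }
    try {
      // The class needs an accessible no-argument constructor.
      return Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + className, e);
    }
  }
}
```

Seen this way, a reference name in the Job XML that has no matching entry cannot be resolved through the table, which is why the names in the two files must agree.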

JobOperator

The JobOperator instance is accessible through the getJobOperator() method of the BatchRuntime class. It provides a set of operations to manage a job (start, stop, restart, and so on) and to access the JobRepository (getJobNames, getJobInstances, getStepExecutions, and so on). The following code shows how to start the inscription-validator-Job Job shown earlier without any specific property. Note that the inscriptionJob value passed to JobOperator.start is the name of the Job XML file, not the ID of the job. In the ChunkStepBatchProcessing Servlet, you will see how to retrieve the status of a job and monitoring information about batch processing from the JobOperator instance.

The following code shows how to start a Job:

JobOperator jobOperator = BatchRuntime.getJobOperator();
if (jobOperator != null) {
  jobOperator.start("inscriptionJob", null);
}