
How to serialize a LingPipe object – classifier example

In a deployment situation, trained classifiers and other Java objects that require complex configuration or training are best accessed by deserializing them from disk. The first recipe did exactly this by reading in an LMClassifier from disk with AbstractExternalizable. This recipe shows how to write the language ID classifier out to disk for later use.

Serializing DynamicLMClassifier and reading it back in produces an object of a different class: an instance of LMClassifier that performs the same as the one just trained, except that it can no longer accept training instances, because the counts have been converted to log probabilities and the backoff smoothing arcs are stored in suffix trees. The resulting classifier is much faster.

In general, most LingPipe classifiers, language models, and hidden Markov models (HMMs) implement both the Serializable and Compilable interfaces.
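The difference between the two interfaces can be illustrated without LingPipe at all. The following is a minimal plain-Java sketch, not LingPipe code: the `Compilable` interface, `CountingModel`, and `CompiledModel` are hypothetical stand-ins. Ordinary serialization round-trips the trainable object as the same class, while compiling writes a different, read-only class that stores log probabilities instead of counts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class CompileSketch {

    // Hypothetical stand-in for a Compilable-style interface:
    // the object writes a compiled form of itself to the stream.
    interface Compilable {
        void compileTo(ObjectOutput out) throws IOException;
    }

    // Hypothetical trainable model: stores raw counts, so it can keep training.
    static class CountingModel implements Serializable, Compilable {
        private static final long serialVersionUID = 1L;
        final Map<String, Integer> counts = new HashMap<>();
        int total = 0;

        void train(String category) {
            counts.merge(category, 1, Integer::sum);
            total++;
        }

        double log2Prob(String category) {
            return Math.log(counts.getOrDefault(category, 0) / (double) total) / Math.log(2);
        }

        // Compiling converts counts to log probabilities and writes a
        // different class that has no train() method.
        @Override
        public void compileTo(ObjectOutput out) throws IOException {
            Map<String, Double> logProbs = new HashMap<>();
            for (String category : counts.keySet()) {
                logProbs.put(category, log2Prob(category));
            }
            out.writeObject(new CompiledModel(logProbs));
        }
    }

    // Hypothetical compiled form: read-only log probabilities.
    static class CompiledModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, Double> logProbs;

        CompiledModel(Map<String, Double> logProbs) {
            this.logProbs = logProbs;
        }

        double log2Prob(String category) {
            return logProbs.getOrDefault(category, Double.NEGATIVE_INFINITY);
        }
    }

    // Writes either the object itself (plain serialization) or its compiled
    // form, then reads the bytes back in.
    static Object roundTrip(CountingModel model, boolean compile)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            if (compile) {
                model.compileTo(oos);
            } else {
                oos.writeObject(model);
            }
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CountingModel model = new CountingModel();
        for (String c : new String[] {"e", "e", "n", "e"}) {
            model.train(c);
        }
        // Prints "CountingModel": still trainable after deserialization.
        System.out.println(roundTrip(model, false).getClass().getSimpleName());
        // Prints "CompiledModel": a different, read-only class.
        System.out.println(roundTrip(model, true).getClass().getSimpleName());
    }
}
```

The design choice mirrors the trade-off described above: the serialized copy keeps everything needed to continue training, while the compiled copy keeps only what is needed to classify.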

Getting ready

We will work with the same data as we did in the Viewing error categories – false positives recipe.

How to do it...

Perform the following steps to serialize a LingPipe object:

  1. Go to the command prompt and type:
    java -cp lingpipe-cookbook.1.0.jar:lib/opencsv-2.4.jar:lib/lingpipe-4.1.0.jar com.lingpipe.cookbook.chapter1.TrainAndWriteClassifierToDisk
    
  2. The program will respond with the default file values for input/output:
    Training on data/disney_e_n.csv
    Wrote model to models/my_disney_e_n.LMClassifier
    
  3. Test that the model works by running the program from the Deserializing and running a classifier recipe, specifying the classifier file to be read in:
    java -cp lingpipe-cookbook.1.0.jar:lib/lingpipe-4.1.0.jar com.lingpipe.cookbook.chapter1.LoadClassifierRunOnCommandLine models/my_disney_e_n.LMClassifier
    
  4. The usual interaction follows:
    Type a string to be classified. Empty string to quit.
    The rain in Spain
    Best Category: e 
    

How it works…

The contents of main() in src/com/lingpipe/cookbook/chapter1/TrainAndWriteClassifierToDisk.java start with material covered in the previous recipes of this chapter: reading the .csv files, setting up a classifier, and training it. Please refer back to those recipes if any of the code is unclear.

The new part of this recipe is the call to the AbstractExternalizable.compileTo() method on DynamicLMClassifier, which compiles the model and writes it to a file. This method is used much like the writeExternal() method from Java's Externalizable interface:

AbstractExternalizable.compileTo(classifier,outFile);

That is all you need to know to write a classifier to disk.
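To make the file-based step concrete, here is a plain-Java sketch with no LingPipe dependency. The `compileTo(Compilable, File)` and `readObject(File)` helpers, the `Compilable` interface, and the `Counts` and `Frozen` classes are all hypothetical; they only mimic the shape of the call above and are not LingPipe's implementation.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class CompileToFileSketch {

    // Hypothetical Compilable-style interface.
    interface Compilable {
        void compileTo(ObjectOutput out) throws IOException;
    }

    // Hypothetical helper shaped like compileTo(classifier, outFile):
    // open the file, let the object write its compiled form, close the stream.
    static void compileTo(Compilable compilable, File outFile) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(outFile))) {
            compilable.compileTo(oos);
        }
    }

    // Hypothetical reader: deserializes whatever object the file holds.
    static Object readObject(File inFile) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(inFile))) {
            return ois.readObject();
        }
    }

    // Toy trainable model; note that it is not itself Serializable.
    static class Counts implements Compilable {
        private final Map<String, Integer> counts = new HashMap<>();

        void train(String category) {
            counts.merge(category, 1, Integer::sum);
        }

        @Override
        public void compileTo(ObjectOutput out) throws IOException {
            // Write an independent, read-only snapshot of a different class.
            out.writeObject(new Frozen(new HashMap<>(counts)));
        }
    }

    static class Frozen implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, Integer> counts;

        Frozen(Map<String, Integer> counts) {
            this.counts = counts;
        }

        // Returns the category with the highest count.
        String bestCategory() {
            String best = null;
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                if (best == null || entry.getValue() > counts.get(best)) {
                    best = entry.getKey();
                }
            }
            return best;
        }
    }

    public static void main(String[] args) throws Exception {
        Counts model = new Counts();
        model.train("e");
        model.train("n");
        model.train("e");

        File outFile = File.createTempFile("my_model", ".compiled");
        outFile.deleteOnExit();
        compileTo(model, outFile);

        Frozen loaded = (Frozen) readObject(outFile);
        System.out.println("Best Category: " + loaded.bestCategory()); // Best Category: e
    }
}
```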

There's more…

There is an alternative way to serialize that works with output destinations not based on the File class, because the classifier's own compileTo() method accepts any ObjectOutputStream. Written this way, the code is:

FileOutputStream fos = new FileOutputStream(outFile);
ObjectOutputStream oos = new ObjectOutputStream(fos);
classifier.compileTo(oos);
oos.close();
fos.close();
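Because `compileTo()` only needs an `ObjectOutput`, the destination can be any stream. The sketch below is again plain Java, with hypothetical `Compilable`, `Counts`, and `Frozen` classes rather than LingPipe types. It writes the compiled form into GZIP-compressed bytes held in memory (for example, to store in a database column) and reads it back. It uses try-with-resources; closing an `ObjectOutputStream` also closes the stream it wraps, so no separate close call on the inner stream is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompileToStreamSketch {

    // Hypothetical Compilable-style interface.
    interface Compilable {
        void compileTo(ObjectOutput out) throws IOException;
    }

    // Toy trainable model; compiles to an immutable snapshot.
    static class Counts implements Compilable {
        private final Map<String, Integer> counts = new HashMap<>();

        void train(String category) {
            counts.merge(category, 1, Integer::sum);
        }

        @Override
        public void compileTo(ObjectOutput out) throws IOException {
            out.writeObject(new Frozen(new HashMap<>(counts)));
        }
    }

    static class Frozen implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, Integer> counts;

        Frozen(Map<String, Integer> counts) {
            this.counts = counts;
        }
    }

    // Writes the compiled form to any OutputStream, compressing it on the way.
    // Closing the ObjectOutputStream also finishes and closes the GZIP stream.
    static void compileTo(Compilable compilable, OutputStream sink) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(sink))) {
            compilable.compileTo(oos);
        }
    }

    // Reads a compiled object back from any InputStream holding GZIP data.
    static Object readFrom(InputStream source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new GZIPInputStream(source))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Counts model = new Counts();
        model.train("e");
        model.train("e");
        model.train("n");

        // The sink could be a socket or a database BLOB stream; here it is memory.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        compileTo(model, bytes);

        Frozen loaded = (Frozen) readFrom(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println("e count: " + loaded.counts.get("e")); // e count: 2
    }
}
```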

Additionally, DynamicLMClassifier can be compiled without involving the disk by using the static AbstractExternalizable.compile() method, as follows:

@SuppressWarnings("unchecked")
LMClassifier<LanguageModel, MultivariateDistribution> compiledLM = (LMClassifier<LanguageModel, MultivariateDistribution>) AbstractExternalizable.compile(classifier);

The compiled version is a lot faster but does not allow further training instances.
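One way to compile without touching the disk is to write the compiled form to a byte array and read it straight back in. The plain-Java sketch below does exactly that; `compile()`, `Compilable`, `Counts`, and `Frozen` are hypothetical, and LingPipe's own `AbstractExternalizable.compile()` may differ in detail. Because the result is rebuilt from bytes, it is an independent snapshot: training the dynamic model afterwards does not change it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class InMemoryCompileSketch {

    // Hypothetical Compilable-style interface.
    interface Compilable {
        void compileTo(ObjectOutput out) throws IOException;
    }

    // Toy trainable model.
    static class Counts implements Compilable {
        private final Map<String, Integer> counts = new HashMap<>();

        void train(String category) {
            counts.merge(category, 1, Integer::sum);
        }

        @Override
        public void compileTo(ObjectOutput out) throws IOException {
            out.writeObject(new Frozen(new HashMap<>(counts)));
        }
    }

    // Read-only compiled form.
    static class Frozen implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, Integer> counts;

        Frozen(Map<String, Integer> counts) {
            this.counts = counts;
        }

        int count(String category) {
            return counts.getOrDefault(category, 0);
        }
    }

    // Hypothetical in-memory compile: write the compiled form to a byte
    // array and immediately read it back, never touching the disk.
    static Object compile(Compilable compilable) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            compilable.compileTo(oos);
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Counts dynamic = new Counts();
        dynamic.train("e");

        Frozen compiled = (Frozen) compile(dynamic);

        // Further training affects only the dynamic model, not the snapshot.
        dynamic.train("e");
        dynamic.train("e");
        System.out.println("compiled count for e: " + compiled.count("e")); // compiled count for e: 1
    }
}
```

In the sketch the cast to `Frozen` is an ordinary checked cast. Casting `Object` to a parameterized type such as `LMClassifier<LanguageModel, MultivariateDistribution>` is an unchecked cast, which is why the snippet above carries `@SuppressWarnings("unchecked")`.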
