- Natural Language Processing with Java and LingPipe Cookbook
- Breck Baldwin Krishna Dayanidhi
- 2021-08-05 17:12:50
How to serialize a LingPipe object – classifier example
In a deployment situation, trained classifiers and other Java objects with complex configuration or training are best accessed by deserializing them from disk. The first recipe did exactly this by reading in LMClassifier from the disk with AbstractExternalizable. This recipe shows how to write the language ID classifier out to disk for later use.
Serializing DynamicLMClassifier and reading it back in results in a different class, an instance of LMClassifier. It performs the same as the one just trained, except that it can no longer accept training instances, because counts have been converted to log probabilities and the backoff smoothing arcs are stored in suffix trees. The resulting classifier is much faster.
In general, most of the LingPipe classifiers, language models, and hidden Markov models (HMMs) implement both the Serializable and Compilable interfaces.
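The idea that writing one class and reading back another is possible can be sketched with plain Java serialization. The example below does not use LingPipe, and LingPipe's internals may differ: TrainableModel and CompiledModel are hypothetical stand-ins, and the "model" is just a table of category counts. It relies on the standard writeReplace() hook, which lets an object substitute a different object when it is serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not LingPipe classes.
final class CompileOnSerializeSketch {

    /** Trainable form: keeps raw counts so it can accept more training data. */
    static final class TrainableModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, Integer> counts = new HashMap<>();
        private int total = 0;

        void train(String category) {
            counts.merge(category, 1, Integer::sum);
            total++;
        }

        // Java serialization calls writeReplace() and writes the returned
        // object instead of this one, so the stream holds a CompiledModel.
        private Object writeReplace() {
            Map<String, Double> logProbs = new HashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                logProbs.put(e.getKey(), Math.log((double) e.getValue() / total));
            }
            return new CompiledModel(logProbs);
        }
    }

    /** Compiled form: log probabilities only, with no way to train further. */
    static final class CompiledModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, Double> logProbs;

        CompiledModel(Map<String, Double> logProbs) {
            this.logProbs = logProbs;
        }

        double logProb(String category) {
            return logProbs.getOrDefault(category, Double.NEGATIVE_INFINITY);
        }
    }

    /** Serializes an object to memory and reads it back. */
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        TrainableModel model = new TrainableModel();
        model.train("e");
        model.train("e");
        model.train("e");
        model.train("n");

        Object restored = roundTrip(model);
        // The object that comes back is the compiled class, not the trainable one.
        System.out.println(restored.getClass().getSimpleName()); // prints CompiledModel
        System.out.println(((CompiledModel) restored).logProb("e"));
    }
}
```

As with the compiled LMClassifier, the restored object answers the same queries as the trained one but exposes no training method.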
Getting ready
We will work with the same data as we did in the Viewing error categories – false positives recipe.
How to do it...
Perform the following steps to serialize a LingPipe object:
- Go to the command prompt and type:
java -cp lingpipe-cookbook.1.0.jar:lib/opencsv-2.4.jar:lib/lingpipe-4.1.0.jar com.lingpipe.cookbook.chapter1.TrainAndWriteClassifierToDisk
- The program will respond with the default file values for input/output:
Training on data/disney_e_n.csv
Wrote model to models/my_disney_e_n.LMClassifier
- Test if the model works by invoking the Deserializing and running a classifier recipe while specifying the classifier file to be read in:
java -cp lingpipe-cookbook.1.0.jar:lib/lingpipe-4.1.0.jar com.lingpipe.cookbook.chapter1.LoadClassifierRunOnCommandLine models/my_disney_e_n.LMClassifier
- The usual interaction follows:
Type a string to be classified. Empty string to quit.
The rain in Spain
Best Category: e
How it works…
The contents of main() from src/com/lingpipe/cookbook/chapter1/TrainAndWriteClassifierToDisk.java start with the materials covered in the previous recipes of the chapter to read the .csv files, set up a classifier, and train it. Please refer back to those recipes if any code is unclear.
The new bit for this recipe happens when we invoke the AbstractExternalizable.compileTo() method on DynamicLMClassifier, which compiles the model and writes it to a file. This method is used like the writeExternal method from Java's Externalizable interface:
AbstractExternalizable.compileTo(classifier, outFile);
That is all you need to know to write a classifier to disk.
There's more…
There is an alternate way to serialize that works with destinations that are not based on the File class, since it only needs an ObjectOutputStream. An alternate way to write a classifier is:
FileOutputStream fos = new FileOutputStream(outFile);
ObjectOutputStream oos = new ObjectOutputStream(fos);
classifier.compileTo(oos);
oos.close();
fos.close();
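To illustrate why the stream-based form is more flexible, here is a minimal sketch that writes to a destination that is not a file: a gzip-compressed buffer in memory. It uses only the Java standard library; the class and method names are hypothetical, and a plain Serializable map stands in for the classifier. With LingPipe, the writeObject() call would presumably be replaced by classifier.compileTo(oos) as in the snippet above. The try-with-resources blocks close the streams even if writing fails.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of LingPipe.
final class StreamSerializationSketch {

    /** Writes a model to any OutputStream: a file, a socket, a buffer, and so on. */
    static void writeModel(Serializable model, OutputStream destination) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(destination)) {
            // Assumption: with LingPipe this would be classifier.compileTo(oos).
            oos.writeObject(model);
        }
    }

    /** Reads a model back from any InputStream. */
    static Object readModel(InputStream source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(source)) {
            return ois.readObject();
        }
    }

    /** Round trip through a gzip-compressed in-memory buffer; no File involved. */
    static Object gzipRoundTrip(Serializable model) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            // Closing the ObjectOutputStream also finishes and closes the gzip stream.
            writeModel(model, new GZIPOutputStream(buffer));
            return readModel(new GZIPInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a trained model.
        HashMap<String, Double> standIn = new HashMap<>();
        standIn.put("e", -0.29);
        standIn.put("n", -1.39);

        Object restored = gzipRoundTrip(standIn);
        System.out.println(standIn.equals(restored)); // prints true
    }
}
```

The same writeModel() method works unchanged with a FileOutputStream, which is the case the recipe covers.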
Additionally, DynamicLMClassifier can be compiled without involving the disk by using the static AbstractExternalizable.compile() method. It is used in the following fashion:
@SuppressWarnings("unchecked")
LMClassifier<LanguageModel, MultivariateDistribution> compiledLM
    = (LMClassifier<LanguageModel, MultivariateDistribution>) AbstractExternalizable.compile(classifier);
The compiled version is a lot faster but does not allow further training instances.