- Hands-On Machine Learning with ML.NET
- Jarred Capellman
- 2021-06-24 16:43:34
The BaseML class
For the BaseML class, we have made several enhancements, starting with the constructor. In the constructor, we initialize the _stringRex variable to the regular expression we will use to extract strings. The call to Encoding.RegisterProvider is required to use the Windows-1252 encoding, because code page encodings are not registered by default on .NET Core. Windows-1252 is the encoding that Windows executables utilize:
private static Regex _stringRex;

protected BaseML()
{
    MlContext = new MLContext(2020);

    Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    _stringRex = new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);
}
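To make the pattern concrete, here is a minimal sketch of what the regular expression matches: runs of eight or more printable ASCII characters (space through ~) or tabs. The StringRegexSketch class and its ExtractRuns and GetWindows1252 methods are hypothetical names used only for illustration; they are not part of the book's BaseML class.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of BaseML.
public static class StringRegexSketch
{
    // Same pattern as in the constructor: printable ASCII or tab, at least 8 characters long.
    private static readonly Regex StringRex = new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);

    // Returns every run of text in the input that satisfies the pattern.
    public static string[] ExtractRuns(string text) =>
        StringRex.Matches(text).Select(m => m.Value).ToArray();

    // Registers the code page provider, then resolves Windows-1252.
    // On .NET Core, GetEncoding(1252) is expected to fail without the registration.
    public static Encoding GetWindows1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }
}
```

For example, StringRegexSketch.ExtractRuns("MZ\u0001\u0002kernel32.dll\u0000abc") returns a single match, kernel32.dll. The MZ and abc fragments are shorter than eight characters, and the control characters around them end each run.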
The next major addition is the GetStrings method. This method takes the bytes, runs the previously created compiled regular expression, and extracts the string matches:
- To begin, we declare the method and initialize the stringLines variable to hold the strings:
protected string GetStrings(byte[] data)
{
    var stringLines = new StringBuilder();
- Next, we check that the input data is not null or empty:
    if (data == null || data.Length == 0)
    {
        return stringLines.ToString();
    }
- In the next block of code, we open a MemoryStream object over the data and then a StreamReader object that decodes it as Windows-1252:
    using (var ms = new MemoryStream(data, false))
    {
        using (var streamReader = new StreamReader(ms, Encoding.GetEncoding(1252), false, 2048, false))
        {
- We will then loop through the streamReader object until an EndOfStream condition is reached, reading line by line:
            while (!streamReader.EndOfStream)
            {
                var line = streamReader.ReadLine();
- We then skip empty lines gracefully and apply some string cleanup to the data, removing the ^, ), and - characters:
                if (string.IsNullOrEmpty(line))
                {
                    continue;
                }

                line = line.Replace("^", "").Replace(")", "").Replace("-", "");
- Then, we will run the regular expression against the line and append the non-empty matches to the previously defined stringLines variable:
                stringLines.Append(string.Join(string.Empty,
                    _stringRex.Matches(line).Where(a => !string.IsNullOrEmpty(a.Value) && !string.IsNullOrWhiteSpace(a.Value)).ToList()));
- Lastly, we close the open blocks and return the contents of the stringLines variable as a single string using the string.Join method:
            }
        }
    }

    return string.Join(string.Empty, stringLines);
}
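Assembled into one piece, the steps above can be sketched as a standalone helper. This is a simplified sketch rather than the book's exact listing: the GetStringsSketch class name is hypothetical, the method is made public and static so it can be called without the rest of BaseML, the two using statements are stacked, and the match filtering is written as a plain loop.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical standalone version of GetStrings, for illustration only.
public static class GetStringsSketch
{
    // Printable ASCII or tab, at least 8 characters long.
    private static readonly Regex StringRex = new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);

    static GetStringsSketch()
    {
        // Makes Encoding.GetEncoding(1252) available on .NET Core.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string GetStrings(byte[] data)
    {
        var stringLines = new StringBuilder();

        // Null or empty input yields an empty string.
        if (data == null || data.Length == 0)
        {
            return stringLines.ToString();
        }

        using (var ms = new MemoryStream(data, false))
        using (var streamReader = new StreamReader(ms, Encoding.GetEncoding(1252), false, 2048, false))
        {
            while (!streamReader.EndOfStream)
            {
                var line = streamReader.ReadLine();

                if (string.IsNullOrEmpty(line))
                {
                    continue;
                }

                // Same cleanup as in the steps above.
                line = line.Replace("^", "").Replace(")", "").Replace("-", "");

                // Append each non-blank match; no separator is inserted between matches.
                foreach (Match match in StringRex.Matches(line))
                {
                    if (!string.IsNullOrWhiteSpace(match.Value))
                    {
                        stringLines.Append(match.Value);
                    }
                }
            }
        }

        return stringLines.ToString();
    }
}
```

Calling GetStringsSketch.GetStrings(Encoding.ASCII.GetBytes("MZ\0kernel32.dll\0Get-ProcAddress\0")) returns kernel32.dllGetProcAddress. The hyphen is stripped by the cleanup step, MZ is too short to match, and the two matches are concatenated without a delimiter because the method joins them with string.Empty.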