The BaseML class

For the BaseML class, we have made several enhancements, starting with the constructor. In the constructor, we initialize the _stringRex variable to the regular expression we will use to extract strings. The call to Encoding.RegisterProvider is required before the Windows-1252 encoding can be used, because .NET Core does not make code page encodings available by default. Windows-1252 is the encoding that Windows executables utilize:

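private static Regex _stringRex;

protected BaseML()
{
    // Fixed seed so that runs are repeatable
    MlContext = new MLContext(2020);

    // Makes code page encodings such as Windows-1252 available
    Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Matches runs of 8 or more printable ASCII characters (space through tilde) or tabs
    _stringRex = new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);
}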
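
To make the effect of the regular expression more concrete, the following is a minimal sketch that applies the same pattern to a plain string. The StringRegexDemo class and its ExtractRuns helper are hypothetical names introduced only for this illustration; they are not part of the BaseML class:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringRegexDemo
{
    // Same pattern as in the constructor: printable ASCII (space through tilde)
    // or tab, repeated at least 8 times.
    private static readonly Regex StringRex =
        new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);

    // Hypothetical helper: returns every run in the input that the pattern matches.
    public static string[] ExtractRuns(string input)
    {
        return StringRex.Matches(input)
            .Cast<Match>()
            .Select(match => match.Value)
            .ToArray();
    }
}
```

Calling StringRegexDemo.ExtractRuns("ab\u0001kernel32.dll\u0002short\u0003GetProcAddress") should return kernel32.dll and GetProcAddress. The runs ab and short are dropped because they are shorter than 8 characters, and the control characters act as separators because they fall outside the character class.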

The next major addition is the GetStrings method. This method takes the bytes, runs the previously created compiled regular expression, and extracts the string matches:

  1. To begin, we define the method signature and initialize the stringLines variable to hold the strings:
protected string GetStrings(byte[] data)
{
    var stringLines = new StringBuilder();

  2. Next, we sanity check that the input data is not null or empty:
    if (data == null || data.Length == 0)
    {
        return stringLines.ToString();
    }

  3. In the next block of code, we open a MemoryStream object and then a StreamReader object:
    using (var ms = new MemoryStream(data, false))
    {
        using (var streamReader = new StreamReader(ms, Encoding.GetEncoding(1252), false, 2048, false))
        {
  4. We then loop through the streamReader object until an EndOfStream condition is reached, reading line by line:
            while (!streamReader.EndOfStream)
            {
                var line = streamReader.ReadLine();

  5. We then skip the line gracefully if it is empty, and otherwise apply some string cleanup to the data:
                if (string.IsNullOrEmpty(line))
                {
                    continue;
                }

                line = line.Replace("^", "").Replace(")", "").Replace("-", "");

  6. Then, we append the regular expression matches to the previously defined stringLines variable:
                stringLines.Append(string.Join(string.Empty,
                    _stringRex.Matches(line).Where(a => !string.IsNullOrEmpty(a.Value) && !string.IsNullOrWhiteSpace(a.Value)).ToList()));

  7. Lastly, we close the loop and the two using blocks, and return the stringLines variable converted into a single string using the string.Join method:
            }
        }
    }

    return string.Join(string.Empty, stringLines);
}
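
The steps above can also be tried outside of the BaseML class. The following is a minimal, self-contained sketch of the same approach, assuming a runtime where CodePagesEncodingProvider is available (on older .NET Core versions this may require the System.Text.Encoding.CodePages package). The StringExtractor class name is hypothetical, and the loop is written with ReadLine returning null rather than EndOfStream, which is a stylistic substitution and not the code used in this chapter:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical stand-alone class mirroring the GetStrings steps.
public static class StringExtractor
{
    private static readonly Regex StringRex =
        new Regex(@"[ -~\t]{8,}", RegexOptions.Compiled);

    static StringExtractor()
    {
        // Required before Encoding.GetEncoding(1252) can succeed on .NET Core.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string GetStrings(byte[] data)
    {
        if (data == null || data.Length == 0)
        {
            return string.Empty;
        }

        var stringLines = new StringBuilder();

        using (var ms = new MemoryStream(data, false))
        using (var reader = new StreamReader(ms, Encoding.GetEncoding(1252), false, 2048, false))
        {
            string line;

            // ReadLine returns null once the end of the stream is reached.
            while ((line = reader.ReadLine()) != null)
            {
                if (string.IsNullOrEmpty(line))
                {
                    continue;
                }

                // Same cleanup as in the chapter's method.
                line = line.Replace("^", "").Replace(")", "").Replace("-", "");

                // Append each printable run of 8 or more characters, with no separator.
                foreach (Match match in StringRex.Matches(line))
                {
                    stringLines.Append(match.Value);
                }
            }
        }

        return stringLines.ToString();
    }
}
```

As a usage example, passing the bytes 0x4D, 0x5A, 0x90, 0x00 followed by the ASCII bytes of "This program cannot be run in DOS mode.\r\ntiny\0kernel32.dll" should yield This program cannot be run in DOS mode.kernel32.dll. The short runs MZ and tiny are discarded, and the surviving matches are concatenated without a delimiter, just as in the method above.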