Chapter 2. Integrity and Inspection

This chapter will cover the following recipes:

  • Trimming excess whitespace
  • Ignoring punctuation and specific characters
  • Coping with unexpected or missing input
  • Validating records by matching regular expressions
  • Lexing and parsing an e-mail address
  • Deduplication of nonconflicting data items
  • Deduplication of conflicting data items
  • Implementing a frequency table using Data.List
  • Implementing a frequency table using Data.MultiSet
  • Computing the Manhattan distance
  • Computing the Euclidean distance
  • Comparing scaled data using the Pearson correlation coefficient
  • Comparing sparse data using cosine similarity