
How it works...

We started by loading the #Anonops text dataset (step 1). The Anonops IRC channel has been affiliated with the Anonymous hacktivist group. In particular, chat participants have in the past planned and announced their future targets on Anonops. Consequently, a well-engineered ML system trained on such data may be able to predict cyber attacks. In step 2, we instantiated a hashing vectorizer. The hashing vectorizer gave us counts of the 1- and 2-grams in the text, in other words, single words (tokens) and pairs of consecutive words in the chat messages. We then applied a tf-idf transformer to weight those counts, giving less weight to n-grams that appear in many messages and more weight to those that are distinctive. Our final result is a large, sparse matrix representing the occurrences of 1- and 2-grams in the texts, weighted by importance. Finally, we examined the first entries of the SciPy sparse matrix that holds our featurized data.
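The pipeline described above can be sketched with scikit-learn's `HashingVectorizer` and `TfidfTransformer`. This is a minimal illustration, not the recipe's exact code: `chat_lines` is a hypothetical three-message stand-in for the #Anonops logs, and `n_features=2**18`, `norm=None` and `alternate_sign=False` are choices made here so that the vectorizer emits plain hashed counts (the library defaults are `norm='l2'` and `alternate_sign=True`, which normalize and sign-flip the values).

```python
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer

# Hypothetical stand-in for the #Anonops chat messages loaded in step 1.
chat_lines = [
    "we should target the bank website tomorrow",
    "target announced for tomorrow",
    "the website is down",
]

# Step 2: hash every 1-gram and 2-gram into one of 2**18 columns.
# norm=None and alternate_sign=False (assumed here, not library defaults)
# keep the entries as raw, non-negative hashed counts.
hashing_vectorizer = HashingVectorizer(
    ngram_range=(1, 2), n_features=2**18, norm=None, alternate_sign=False
)
counts = hashing_vectorizer.fit_transform(chat_lines)

# Step 3: reweight the counts by inverse document frequency; n-grams that
# occur in many messages (e.g. "target", "tomorrow") are weighted down.
# TfidfTransformer L2-normalizes each row by default.
tfidf_transformer = TfidfTransformer(use_idf=True)
tfidf = tfidf_transformer.fit_transform(counts)

# Step 4: inspect the sparse result: one row per message, 2**18 columns,
# with only the hashed n-grams that actually occur stored explicitly.
print(sparse.issparse(tfidf), tfidf.shape, tfidf.nnz)
print(tfidf[:1])  # (row, column) -> tf-idf weight for the first message
```

Because hashing needs no fitted vocabulary, the same vectorizer can featurize new chat messages without being refit, at the cost of occasional collisions between n-grams that hash to the same column.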
