How it works...

We started by loading the #Anonops text dataset (step 1). The Anonops IRC channel has been affiliated with the Anonymous hacktivist group. In particular, chat participants have in the past planned and announced their future targets on Anonops. Consequently, a well-engineered ML system trained on such data could, in principle, help predict cyber attacks. In step 2, we instantiated a hashing vectorizer. The hashing vectorizer gave us counts of the 1- and 2-grams in the text, that is, single words (tokens) and consecutive pairs of words in the chat messages. We then applied a tf-idf transformer to give appropriate weights to the counts that the hashing vectorizer gave us. The final result is a large, sparse matrix representing the occurrences of 1- and 2-grams in the texts, weighted by importance. Finally, we examined the first few entries of the sparse matrix representation of our featurized data in SciPy.
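To make the mechanics concrete, here is a minimal, standard-library-only sketch of what the two stages do conceptually. It is not scikit-learn's implementation: the bucket count `N_BUCKETS`, the CRC32 hash, the whitespace tokenizer, the helper names, and the three sample messages are all assumptions chosen for illustration. The idf formula used is the smoothed form, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row.

```python
import math
import zlib
from collections import Counter

N_BUCKETS = 2 ** 10  # hypothetical feature-space size, kept small for illustration


def ngrams(text, n_min=1, n_max=2):
    """Return the word 1-grams and 2-grams of a message, in that order."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def hash_counts(text):
    """Hash each n-gram to a bucket index and count occurrences per bucket."""
    return Counter(
        zlib.crc32(gram.encode("utf-8")) % N_BUCKETS for gram in ngrams(text)
    )


def tfidf(rows):
    """Weight bucket counts by smoothed idf, then L2-normalize each row."""
    n_docs = len(rows)
    # Document frequency: in how many rows does each bucket appear?
    df = Counter(bucket for row in rows for bucket in row)
    weighted = []
    for row in rows:
        w = {
            bucket: count * (math.log((1 + n_docs) / (1 + df[bucket])) + 1)
            for bucket, count in row.items()
        }
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        weighted.append({bucket: v / norm for bucket, v in w.items()})
    return weighted


# Made-up sample messages, not taken from the #Anonops dataset.
messages = [
    "target is down",
    "target is up again",
    "hello everyone",
]
counts = [hash_counts(m) for m in messages]
matrix = tfidf(counts)  # one sparse row (bucket -> weight) per message
```

Each row stores only its non-zero buckets, which is the same idea behind the sparse matrix in the recipe. Because the n-grams are hashed rather than stored in a vocabulary, two different n-grams can occasionally share a bucket; a larger feature space makes such collisions rarer.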
