
Tweaking the parameters

So what about all the other parameters? Can we tweak them all to get better results?

Sure. We could, of course, tweak the number of clusters or play with the vectorizer's max_features parameter (you should try that!). We could also try different cluster center initializations. Beyond that, there are more exciting alternatives to KMeans itself: for example, clustering approaches that let you use different similarity measures such as cosine similarity, Pearson correlation, or Jaccard similarity. It is an exciting field to play in.
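As a minimal sketch of what such tweaking could look like, the snippet below varies the number of clusters, max_features, and the center initialization. The toy corpus, the helper name cluster_posts, and the chosen parameter values are assumptions made for illustration only; they are not the data or code used elsewhere in this chapter.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the real posts.
posts = [
    "disk drive failure on my hard disk",
    "hard disk format problems",
    "formatting a hard drive",
    "imaging databases store images",
    "most imaging databases save images permanently",
    "image compression for databases",
]


def cluster_posts(texts, num_clusters=2, max_features=None, init="k-means++"):
    """Vectorize texts and cluster them with KMeans (illustrative helper)."""
    # max_features keeps only the most frequent terms of the vocabulary.
    vectorizer = TfidfVectorizer(stop_words="english", max_features=max_features)
    X = vectorizer.fit_transform(texts)

    # init selects how the initial cluster centers are chosen;
    # n_init reruns KMeans with different seeds and keeps the best run.
    km = KMeans(n_clusters=num_clusters, init=init, n_init=10, random_state=3)
    labels = km.fit_predict(X)

    # inertia_ is the within-cluster sum of squared distances.
    return labels, km.inertia_, X.shape[1]


if __name__ == "__main__":
    for k in (2, 3):
        for init in ("k-means++", "random"):
            labels, inertia, n_features = cluster_posts(
                posts, num_clusters=k, max_features=10, init=init
            )
            print(f"k={k} init={init} features={n_features} "
                  f"inertia={inertia:.3f} labels={list(labels)}")
```

Note that inertia always shrinks as the number of clusters grows, so on its own it cannot tell you which setting is "better".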

But before you go there, you will have to define what you actually mean by "better". scikit-learn has a complete package dedicated to exactly this question. It is called sklearn.metrics, and it contains a full range of metrics for measuring clustering quality. Maybe that should be the first place to go now: right into the sources of the metrics package.
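To give a feel for what sklearn.metrics offers, here is a hedged sketch that scores KMeans results for several cluster counts. It assumes synthetic data from make_blobs as a stand-in for vectorized posts, and the helper name evaluate_clustering is hypothetical. The metrics shown are only a small selection: some need ground-truth labels (homogeneity, completeness, V-measure, adjusted Rand index), while the silhouette coefficient does not.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with known group membership (an assumption for illustration).
X, true_labels = make_blobs(
    n_samples=150, centers=3, cluster_std=1.0, random_state=3
)


def evaluate_clustering(X, true_labels, num_clusters):
    """Cluster X with KMeans and return a dict of quality scores."""
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=3)
    pred = km.fit_predict(X)
    return {
        # These four compare the clustering against known labels.
        "homogeneity": metrics.homogeneity_score(true_labels, pred),
        "completeness": metrics.completeness_score(true_labels, pred),
        "v_measure": metrics.v_measure_score(true_labels, pred),
        "adjusted_rand": metrics.adjusted_rand_score(true_labels, pred),
        # The silhouette coefficient needs only the data and the clustering.
        "silhouette": metrics.silhouette_score(X, pred),
    }


if __name__ == "__main__":
    for k in (2, 3, 4):
        scores = evaluate_clustering(X, true_labels, k)
        summary = ", ".join(f"{name}={value:.3f}" for name, value in scores.items())
        print(f"k={k}: {summary}")
```

Which of these scores matches your notion of "better" depends on whether you have labeled data and on what you want the clusters to be used for.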
