Deep diving into a concrete example

Early on, we wanted to build a data pipeline that extracted insights from Twitter by running sentiment analysis on tweets containing specific hashtags, and that deployed the results to a real-time dashboard. This application was a perfect starting point for us because the data science analytics were not too complex, and the application covered many aspects of a real-life scenario:

  • High volume, high throughput streaming data
  • Data enrichment with sentiment analysis NLP
  • Basic data aggregation
  • Data visualization
  • Deployment into a real-time dashboard

To try things out, the first implementation was a simple Python application that used the tweepy library (a widely used, community-maintained Twitter library for Python: https://pypi.python.org/pypi/tweepy) to connect to Twitter and get a stream of tweets, and textblob (a simple Python library for basic NLP: https://pypi.python.org/pypi/textblob) for sentiment analysis enrichment.

The results were then saved into a JSON file for analysis. This prototype was a great way to get things started and experiment quickly, but after a few iterations we realized that we needed to get serious and build an architecture that satisfied our enterprise requirements.
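
The enrich-and-save step of such a prototype might look roughly like the following. This is only a minimal sketch under stated assumptions, not the original code: the tweets are hard-coded samples rather than a live tweepy stream, the scorer is a toy word-list function standing in for textblob, and the names `toy_polarity`, `enrich`, `save_json`, and the output file `tweets.json` are all hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, List

# Toy word lists: an assumed stand-in for a real NLP library such as textblob.
POSITIVE = {"love", "great", "good", "awesome"}
NEGATIVE = {"hate", "bad", "terrible", "awful"}


def toy_polarity(text: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def enrich(tweets: Iterable[Dict],
           scorer: Callable[[str], float] = toy_polarity) -> List[Dict]:
    """Copy each tweet and attach a 'polarity' field computed by the scorer."""
    enriched = []
    for tweet in tweets:
        record = dict(tweet)  # leave the input record untouched
        record["polarity"] = scorer(tweet["text"])
        enriched.append(record)
    return enriched


def save_json(records: List[Dict], path: str) -> None:
    """Write the enriched records to a JSON file for later analysis."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)


if __name__ == "__main__":
    # Hard-coded sample data; a real prototype would read from a tweet stream.
    sample = [
        {"text": "I love #spark, it is great!"},
        {"text": "I hate waiting for slow jobs"},
    ]
    save_json(enrich(sample), "tweets.json")
```

Because the scorer is passed in as a parameter, a real sentiment function can replace the toy one without touching the rest of the code. With textblob installed, for example, `lambda text: TextBlob(text).sentiment.polarity` could serve as the scorer.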