Deep diving into a concrete example

Early on, we wanted to build a data pipeline that extracted insights from Twitter by doing sentiment analysis of tweets containing specific hashtags and to deploy the results to a real-time dashboard. This application was a perfect starting point for us, because the data science analytics were not too complex, and the application covered many aspects of a real-life scenario:

  • High volume, high throughput streaming data
  • Data enrichment with sentiment analysis NLP
  • Basic data aggregation
  • Data visualization
  • Deployment into a real-time dashboard

To try things out, the first implementation was a simple Python application built on two libraries: tweepy (a widely used, community-maintained Twitter library for Python: https://pypi.python.org/pypi/tweepy) to connect to Twitter and receive a stream of tweets, and textblob (a simple Python library for basic NLP: https://pypi.python.org/pypi/textblob) to enrich each tweet with a sentiment score.

The results were then saved into a JSON file for analysis. This prototype was a great way to get started and experiment quickly, but after a few iterations we realized that we needed to get serious and build an architecture that satisfied our enterprise requirements.
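
The enrichment and save steps of such a prototype can be sketched as follows. This is a minimal illustration, not the original application: the function names (textblob_polarity, enrich, save_json), the record fields, and the output file name are all assumptions, and the tweepy streaming connection is left out because it needs Twitter credentials. The sketch takes any iterable of tweet texts instead.

```python
import json
from typing import Callable, Dict, Iterable, List


def textblob_polarity(text: str) -> float:
    """Score text with textblob; polarity ranges from -1.0 to 1.0."""
    # Imported lazily so the rest of the sketch runs without textblob installed.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def enrich(texts: Iterable[str], scorer: Callable[[str], float]) -> List[Dict]:
    """Attach a polarity score and a coarse label to each tweet text.

    The record layout is hypothetical, chosen only for illustration.
    """
    records = []
    for text in texts:
        polarity = scorer(text)
        if polarity > 0:
            label = "positive"
        elif polarity < 0:
            label = "negative"
        else:
            label = "neutral"
        records.append({"text": text, "polarity": polarity, "sentiment": label})
    return records


def save_json(records: List[Dict], path: str) -> None:
    """Write the enriched records to a JSON file for later analysis."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


# Example usage (assumes textblob is installed; the file name is arbitrary):
# records = enrich(["I love this!", "This is terrible."], textblob_polarity)
# save_json(records, "tweets.json")
```

Passing the scorer as a parameter keeps the enrichment logic independent of any one NLP library, so the same function can be exercised with a stub before textblob is wired in.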