
Preface

The internet contains a wealth of data. This data is provided both through structured APIs and through content delivered directly on websites. While the data in APIs is highly structured, information found in web pages is often unstructured and requires collection, extraction, and processing to be of value. Collecting data is just the start of the journey, as that data must also be stored, mined, and then exposed to others in a value-added form.

With this book, you will learn many of the core tasks needed in collecting various forms of information from websites. We will cover how to collect it, how to perform several common data operations (including storage in local and remote databases), how to perform common media-based tasks such as converting images and videos to thumbnails, and how to clean unstructured data with NLTK. We will then examine several data mining and visualization tools, and finally cover core skills in building a microservices-based scraper and API that can, and will, be run on the cloud.

Through a recipe-based approach, we will learn independent techniques to solve specific tasks involved in not only scraping but also data manipulation and management, data mining, visualization, microservices, containers, and cloud operations. These recipes will build skills in a progressive and holistic manner, not only teaching how to perform the fundamentals of scraping but also taking you from the results of scraping to a service offered to others through the cloud. We will be building an actual web-scraper-as-a-service using common tools in the Python, container, and cloud ecosystems.
