
Introduction

A common practice in scraping is downloading, storing, and further processing media content (files that are neither web pages nor data files). This media can include images, audio, and video. To store the content correctly, whether locally or in a service like S3, we need to know its media type, and the file extension in the URL cannot be trusted for that. We will learn how to download media and determine its type from information provided by the web server.
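As a rough preview of the idea, the sketch below reads the media type from the `Content-Type` response header instead of the URL and maps it to a file extension with the standard library's `mimetypes` module. The function names (`media_type_from_header`, `extension_for`, `fetch_media_type`) and the `.bin` fallback are assumptions made for illustration, not the approach the later recipes necessarily take.

```python
import mimetypes
import urllib.request


def media_type_from_header(content_type):
    """Return the bare media type, e.g. 'image/png', from a Content-Type value.

    Parameters such as '; charset=utf-8' are dropped and the result is
    lowercased. An empty or missing header yields an empty string.
    """
    if not content_type:
        return ""
    return content_type.split(";", 1)[0].strip().lower()


def extension_for(content_type, default=".bin"):
    """Map a Content-Type header value to a file extension.

    Falls back to `default` (a hypothetical choice) when the type is unknown.
    """
    media_type = media_type_from_header(content_type)
    return mimetypes.guess_extension(media_type) or default


def fetch_media_type(url):
    """Ask the server for the media type of `url` (performs a network request)."""
    with urllib.request.urlopen(url) as response:
        return media_type_from_header(response.headers.get("Content-Type"))
```

With this, a response whose header is `image/png; charset=binary` would be saved with a `.png` extension regardless of what the URL path ends in.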

Another common task is generating thumbnails of images, videos, or even a page of a website. We will examine several techniques for generating thumbnails and taking screenshots of web pages. These are often used on a new website as thumbnail links to the scraped media that is now stored locally.

Finally, there is often a need to transcode media, such as converting non-MP4 videos to MP4, or changing the bit rate or resolution of a video. Another scenario is extracting only the audio from a video file. We won't look at video transcoding, but we will rip MP3 audio out of an MP4 file using ffmpeg. It's a simple step from there to also transcode video with ffmpeg.
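To give a sense of what the audio extraction looks like, here is a minimal sketch that builds an ffmpeg command line and runs it with `subprocess`. It assumes that an `ffmpeg` executable with MP3 encoding support is on the `PATH`. The function names, the file names, and the `192k` bit rate are hypothetical choices for illustration.

```python
import subprocess


def build_audio_extract_command(video_path, audio_path, bitrate="192k"):
    """Build an ffmpeg argument list that writes only the audio stream.

    -y    overwrite the output file if it already exists
    -i    input file
    -vn   drop the video stream
    -b:a  audio bit rate (the default here is an assumed value)
    ffmpeg picks the output format from the extension of `audio_path`.
    """
    return [
        "ffmpeg",
        "-y",
        "-i", video_path,
        "-vn",
        "-b:a", bitrate,
        audio_path,
    ]


def extract_audio(video_path, audio_path, bitrate="192k"):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    command = build_audio_extract_command(video_path, audio_path, bitrate)
    subprocess.run(command, check=True)
```

Calling `extract_audio("video.mp4", "audio.mp3")` would then produce an MP3 file next to the video. Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.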
