
Understanding how to perform HTTP GET requests

Some of the best sources of data are found online. A GET request is the most common way to communicate with an HTTP web server. In this recipe, we will grab all the links from a Wikipedia article and print them to the terminal. To grab the links easily, we will use a helpful library called HandsomeSoup, which lets us manipulate and traverse a web page through CSS selectors.

Getting ready

We will be collecting all links from a Wikipedia web page. Make sure to have an Internet connection before running this recipe.

Install the HandsomeSoup CSS selector package, as well as the HXT library if it is not already installed, using the following commands:

$ cabal install HandsomeSoup
$ cabal install hxt

How to do it...

  1. This recipe requires hxt for parsing HTML and HandsomeSoup for its easy-to-use CSS selectors. Import both as shown in the following code snippet:
    import Text.XML.HXT.Core
    import Text.HandsomeSoup
  2. Define and implement main as follows:
    main :: IO ()
    main = do
  3. Pass in the URL as a string to HandsomeSoup's fromUrl function:
        let doc = fromUrl "http://en.wikipedia.org/wiki/Narwhal"
  4. Select all links within the bodyContent element of the Wikipedia page and print their href attributes as follows (the complete program is assembled after these steps):
        links <- runX $ doc >>> css "#bodyContent a" ! "href"
        print links
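
Putting these four steps together, the complete program looks like the following (the comments are added here only to annotate each part):

    import Text.XML.HXT.Core
    import Text.HandsomeSoup

    main :: IO ()
    main = do
        -- fromUrl builds a document arrow that fetches the page over
        -- HTTP and parses the returned HTML once the arrow is run
        let doc = fromUrl "http://en.wikipedia.org/wiki/Narwhal"
        -- css applies the CSS selector and (!) extracts the href
        -- attribute; runX runs the arrow and collects a list of results
        links <- runX $ doc >>> css "#bodyContent a" ! "href"
        print links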

How it works…

The HandsomeSoup package makes it easy to apply CSS selectors. In this recipe, we run the #bodyContent a selector on a Wikipedia article web page. This finds all anchor (link) tags that are descendants of the element with the bodyContent ID.
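
The same pattern works with any selector and attribute. The following is a minimal sketch, not part of the recipe, that collects the src attribute of every image inside the same bodyContent element instead of link targets:

    import Text.XML.HXT.Core
    import Text.HandsomeSoup

    -- A sketch assuming the same article page: change the selector to
    -- match <img> tags and extract their src attributes instead.
    main :: IO ()
    main = do
        let doc = fromUrl "http://en.wikipedia.org/wiki/Narwhal"
        srcs <- runX $ doc >>> css "#bodyContent img" ! "src"
        mapM_ putStrLn srcs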

See also…

Another common way to obtain data online is through POST requests. To find out more, refer to the Learning how to perform HTTP POST requests recipe.
