Consuming Location Data Like a Data Scientist

Location data comes in many forms, but what if it has been sitting in a simple structured format all along and we have overlooked it? Many machine learning algorithms, such as random forests, are designed to extract insights from structured data in tabular form. In this chapter, we will discuss how to leverage spatial data that is masquerading as tabular data and apply machine learning techniques to it as any data scientist would. We will use New York taxi trip data to predict the duration of any given New York taxi trip. We chose this dataset for the following reasons:

  • Predicting trip duration has the right mix of geospatial analytics and machine learning
  • Finding the time it takes to travel from point A to point B is a routing problem, which we will deal with in Chapter 6, Let's Build a Routing Engine, so this chapter serves as a good introduction to it
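
To make the idea of spatial data hiding in a table more concrete, here is a minimal sketch of how pickup and drop-off coordinates stored as ordinary numeric columns can be turned into a distance feature. The column names (pickup_latitude, pickup_longitude, dropoff_latitude, dropoff_longitude) and the sample row are assumptions made for illustration only; they are not taken from the dataset used later in the chapter. The haversine formula shown here is one common way to estimate straight-line distance on a sphere, not necessarily the approach this chapter follows.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical trip record: the column names and values are made up for illustration.
trips = [
    {
        "pickup_latitude": 40.7580,
        "pickup_longitude": -73.9855,
        "dropoff_latitude": 40.7484,
        "dropoff_longitude": -73.9857,
    },
]

# Derive a tabular feature from the spatial columns.
for trip in trips:
    trip["distance_km"] = haversine_km(
        trip["pickup_latitude"], trip["pickup_longitude"],
        trip["dropoff_latitude"], trip["dropoff_longitude"],
    )
```

A derived column such as distance_km is plain tabular data, so a tree-based model can use it like any other numeric feature. Straight-line distance ignores the street network, which is part of why the routing view of the problem matters.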

We will be using fastai, a Python library built around popular machine learning libraries such as scikit-learn and PyTorch. In this chapter, we will cover the following topics:

  • Exploratory data analysis
  • Processing spatial data
  • Understanding and interpreting the error metric
  • Building a random forest model and running inference with it