Consuming Location Data Like a Data Scientist

Location data comes in many forms, and some of it hides in plain sight as simple structured data that is easy to overlook. Most machine learning algorithms, such as random forests, are geared toward extracting insights from structured data in tabular form. In this chapter, we will discuss how to leverage spatial data that is masquerading as tabular data and apply machine learning techniques to it as any data scientist would. We will use New York taxi trip data to predict the duration of any given New York taxi trip. We chose this dataset for the following reasons:

  • Predicting trip duration has the right mix of geospatial analytics and machine learning
  • Finding the time it takes to travel from point A to point B is a routing problem, which is covered in Chapter 6, Let's Build a Routing Engine, so this chapter serves as an introduction to it
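
To make the idea of spatial data masquerading as tabular data concrete, here is a minimal sketch that derives a distance feature from pickup and drop-off coordinates stored as plain numeric columns. The column names (`pickup_lat`, `pickup_lon`, `dropoff_lat`, `dropoff_lon`) and the sample rows are assumptions made for illustration and are not taken from the taxi dataset itself. The haversine formula is one common choice for straight-line distance on a sphere, not necessarily the approach used later in the chapter.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical rows: the column names and coordinates are illustrative only.
trips = [
    {"pickup_lat": 40.7580, "pickup_lon": -73.9855,
     "dropoff_lat": 40.7484, "dropoff_lon": -73.9857},
    {"pickup_lat": 40.6413, "pickup_lon": -73.7781,
     "dropoff_lat": 40.7580, "dropoff_lon": -73.9855},
]

# Turn four ordinary-looking numeric columns into one spatial feature.
for trip in trips:
    trip["distance_km"] = haversine_km(
        trip["pickup_lat"], trip["pickup_lon"],
        trip["dropoff_lat"], trip["dropoff_lon"],
    )
```

A feature like `distance_km` is an ordinary numeric column, so a tabular model such as a random forest can consume it directly alongside the raw coordinates.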

We will be using fastai, a Python library built around popular machine learning libraries such as scikit-learn and PyTorch. In this chapter, we will discuss the following topics:

  • Exploratory data analysis
  • Processing spatial data
  • Understanding and interpreting the error metric
  • Building a random forest model and running inference with it