Review questions and exercises

  1. What is the difference between open data and proprietary databases?
  2. Is it enough for learners in the area of data science to use open data?
  3. Where can we access open public data?
  4. From the UCI Data Depository, http://archive.ics.uci.edu/ml/index.php, download a dataset called Wine. Write a program in R to import it.
  5. From the UCI Data Depository, download a dataset called Forest Fire. Write a program in Python to import it.
  6. From the UCI Data Depository, download a dataset called Bank Marketing. Write a program in Octave to import it. Answer the following questions: 1) How many banks are there? and 2) What is the cost?
  7. How can we find all R functions whose names begin with read.? (Note that there is a dot after read.)
  8. How can we find more information on an R function called read.xls()?
  9. Explain the differences between two R functions: save() and saveRDS().
  10. Find more information about the read_clipboard() function included in the Python pandas package.
  11. What is the Quandl platform? What kinds of data can we download from Quandl?
  12. Write both R and Python programs to download GDP (Gross Domestic Product) data from the Quandl platform.
  13. When loading an R dataset, what is the difference between using the load() function and the readRDS() function?
  14. After importing the Python pandas package, explain why we have the following error message:
  15. First, download a ZIP file called bank-fall.zip at http://archive.ics.uci.edu/ml/datasets/Bank+Marketing. Unzip the file to get a CSV file; see the related code that follows:

Generate R datasets called bank.RData and bank.rds and answer the following questions: a) What is the average age? b) What percentage of people are married? c) Is the default probability of those who are married higher than that of those who are single?

  16. How do we merge two datasets in R?
  17. Write a Python program to download IBM's daily data from Quandl and merge it with the Fama-French three-factor data. To get a Fama-French daily factor time series, we could go to http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html or download a dataset at http://canisius.edu/~yany/python/data/ffDaily.pkl.
  18. Generate both R and Python datasets for the monthly Fama-French-Carhart four factors. Both time series can be downloaded from Professor French's data library.
  19. Write a Python program to merge FRED/GDP data with market index data.
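
As a starting hint for the merging exercises, here is a minimal sketch of joining two date-indexed tables with pandas. The two data frames, their column names (ret, mkt_rf, smb), and all numbers are made-up placeholders, not real stock or factor data; in an actual solution they would be replaced by the downloaded series.

```python
import pandas as pd

# Hypothetical stand-in for a downloaded daily stock return series
# (the values are invented for illustration only).
stock = pd.DataFrame(
    {"ret": [0.010, -0.005, 0.007]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# Hypothetical stand-in for a daily factor table (also invented values).
factors = pd.DataFrame(
    {"mkt_rf": [0.008, -0.002, 0.004], "smb": [0.001, 0.000, -0.001]},
    index=pd.to_datetime(["2020-01-03", "2020-01-06", "2020-01-07"]),
)

# An inner join on the date index keeps only dates present in both tables.
merged = pd.merge(stock, factors, left_index=True, right_index=True, how="inner")
print(merged)
```

Choosing how="inner" drops dates that appear in only one table; how="outer" would keep them and fill the missing columns with NaN, which may be preferable when the two sources have different frequencies.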