- R Deep Learning Cookbook
- Dr. PKS Prakash Achyutuni Sri Krishna Rao
- 2021-07-02 20:49:11
How to do it...
This section demonstrates the steps to build a GLM model using H2O.
- Now, load the occupancy train and test datasets in R:
# Load the occupancy data
occupancy_train <- read.csv("C:/occupation_detection/datatraining.txt", stringsAsFactors = TRUE)
occupancy_test <- read.csv("C:/occupation_detection/datatest.txt", stringsAsFactors = TRUE)
- The following independent (x) and dependent (y) variables will be used to fit the GLM:
# Define input (x) and output (y) variables
x = c("Temperature", "Humidity", "Light", "CO2", "HumidityRatio")
y = "Occupancy"
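Before modeling, it can help to confirm that every name in x and y actually exists in the data. The following is a minimal sketch in base R; the toy data frame and its values are hypothetical stand-ins for the occupancy data, and missing_cols is a name introduced here for illustration only.

```r
# Hypothetical toy data frame standing in for the occupancy data;
# the values are made up purely for illustration.
toy <- data.frame(
  Temperature   = c(23.1, 22.8),
  Humidity      = c(27.2, 27.0),
  Light         = c(426, 0),
  CO2           = c(721, 690),
  HumidityRatio = c(0.0048, 0.0047),
  Occupancy     = c(1L, 0L)
)

x <- c("Temperature", "Humidity", "Light", "CO2", "HumidityRatio")
y <- "Occupancy"

# Names requested for modeling that are absent from the data frame
missing_cols <- setdiff(c(x, y), names(toy))
if (length(missing_cols) > 0) {
  stop("Columns not found: ", paste(missing_cols, collapse = ", "))
}
```

With the real data, replace toy with occupancy_train (and repeat for occupancy_test) so that a misspelled column name fails early rather than inside the model call.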
- H2O treats a response as categorical only if it is a factor, so convert the dependent variable into a factor as follows:
# Convert the outcome variable into factor
occupancy_train$Occupancy <- as.factor(occupancy_train$Occupancy)
occupancy_test$Occupancy <- as.factor(occupancy_test$Occupancy)
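The effect of the conversion can be seen on a small vector. This is a sketch in base R with a hypothetical 0/1 outcome (the vector occupancy is made up for illustration); it shows that as.factor turns the integer codes into the two class levels that a binomial model expects.

```r
# Hypothetical 0/1 outcome vector, as read.csv would return it (integer)
occupancy <- c(0L, 1L, 1L, 0L, 1L)

occupancy_factor <- as.factor(occupancy)

is.factor(occupancy_factor)   # TRUE
levels(occupancy_factor)      # "0" "1"
table(occupancy_factor)       # class counts: two 0s and three 1s
```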
- Then, convert the datasets to H2OFrame objects (called H2OParsedData in older versions of H2O):
occupancy_train.hex <- as.h2o(x = occupancy_train, destination_frame = "occupancy_train.hex")
occupancy_test.hex <- as.h2o(x = occupancy_test, destination_frame = "occupancy_test.hex")
- Once the data is loaded and converted to H2O frames, fit a GLM using the h2o.glm function. The current setup uses five-fold cross-validation, elastic net regularization (α = 0.5), and a search for the optimal regularization strength (lambda_search = TRUE):
# Train the model
occupancy_train.glm <- h2o.glm(
  x = x,                                 # Vector of predictor variable names
  y = y,                                 # Name of response/dependent variable
  training_frame = occupancy_train.hex,  # Training data
  seed = 1234567,                        # Seed for random numbers
  family = "binomial",                   # Binary outcome (logistic regression)
  lambda_search = TRUE,                  # Search for the optimal regularization strength (lambda)
  alpha = 0.5,                           # Elastic net mixing: 0 = ridge, 1 = lasso
  nfolds = 5                             # Five-fold cross-validation
)
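To make the roles of alpha and lambda concrete, the following sketch computes the elastic net penalty in base R, assuming the usual form λ(α‖β‖₁ + (1 − α)/2 ‖β‖₂²) that the H2O GLM documentation describes. The function elastic_net_penalty and the coefficient vector beta are hypothetical names introduced for illustration; they are not part of the h2o package.

```r
# Elastic net penalty: lambda * (alpha * L1 + (1 - alpha) / 2 * squared L2).
# `beta` holds the coefficients excluding the intercept.
elastic_net_penalty <- function(beta, lambda, alpha) {
  l1 <- sum(abs(beta))   # L1 norm (lasso part)
  l2_sq <- sum(beta^2)   # squared L2 norm (ridge part)
  lambda * (alpha * l1 + (1 - alpha) / 2 * l2_sq)
}

beta <- c(1, -2, 0.5)  # hypothetical coefficients

pen_lasso <- elastic_net_penalty(beta, lambda = 0.1, alpha = 1)    # pure lasso
pen_ridge <- elastic_net_penalty(beta, lambda = 0.1, alpha = 0)    # pure ridge
pen_mix   <- elastic_net_penalty(beta, lambda = 0.1, alpha = 0.5)  # equal mix, as in the recipe
```

With alpha = 0.5 the penalty weighs the lasso and ridge terms equally, while lambda scales the whole penalty; lambda_search = TRUE asks H2O to try a sequence of lambda values rather than a single fixed one.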
- In addition to the parameters used in the preceding command, you can set other parameters to fine-tune model performance. The following list is not exhaustive; it covers some of the more important ones. The complete list of parameters can be found in the documentation of the h2o package.
- Specify the strategy for generating cross-validation folds (auto, random, modulo, or stratified) using fold_assignment. Alternatively, supply the name of a column that holds each observation's fold assignment using fold_column.
- Option to handle skewed outcomes (imbalanced data) by assigning a weight to each observation using weights_column or by over/under sampling using balance_classes.
- Option to handle missing values by mean imputation or by skipping observations using missing_values_handling.
- Option to restrict the coefficients to be non-negative using non_negative and constrain their values using beta_constraints.
- Option to provide the prior probability of y == 1 (logistic regression) using prior, for cases where the data has been sampled and the mean of the response does not reflect the true proportion.
- Specify the variables to be considered for interactions (interactions).
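The fold assignment strategies in the list above can be illustrated in base R. This is a simplified sketch of the ideas behind modulo, random, and stratified assignment, not H2O's internal implementation; the outcome vector and the fold_* names are hypothetical.

```r
set.seed(1234567)

# Hypothetical binary outcome with an 80/20 class split
outcome <- factor(rep(c(0, 1), times = c(80, 20)))
n <- length(outcome)
nfolds <- 5

# Modulo: row i goes to fold ((i - 1) mod nfolds) + 1
fold_modulo <- (seq_len(n) - 1) %% nfolds + 1

# Random: shuffle a balanced vector of fold ids across all rows
fold_random <- sample(rep_len(seq_len(nfolds), n))

# Stratified: shuffle fold ids within each class so that every fold
# keeps roughly the same class proportions as the full data
fold_stratified <- integer(n)
for (cls in levels(outcome)) {
  idx <- which(outcome == cls)
  fold_stratified[idx] <- sample(rep_len(seq_len(nfolds), length(idx)))
}

# Rows are folds, columns are classes
table(fold_stratified, outcome)
```

For an imbalanced outcome such as this one, stratified assignment guarantees that every fold contains some positive cases, which random assignment does not.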