Apache Spark - Evaluate my forecast using a PipelineModel
Part of my code is below. I would like to know how I can evaluate my forecast. Also, if I want to know the importance of each feature, is there a trick to use featureImportances of RandomForestRegressionModel? Should I switch directly to RandomForestRegressionModel and not use a PipelineModel?
I read that using a Pipeline gives better results, which is why I'm using it. I tried using RegressionEvaluator, but it is not what I want.
Or should I keep it simple: convert the DataFrame to an RDD and use RegressionMetrics to get the mean squared error?
To summarize, I need to know the best method to evaluate my forecast.
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.regression.RandomForestRegressor
    import org.apache.spark.sql.functions.lit

    // Combine the input columns into a single feature vector
    val assembler = new VectorAssembler()
      .setInputCols(Array("customers", "year", "month", "dayofmonth", "dayofweek", "weekofyear", "dayofyear"))
      .setOutputCol("features")

    // Train only on rows before the cutoff date
    val limitDate = "2017-04-01"
    val trainingData = df_2.filter(df_2("time").lt(lit(limitDate)))
    //trainingData.printSchema()

    val rf = new RandomForestRegressor()
      .setNumTrees(60)
      .setMaxDepth(25)
      .setMaxBins(100)
      .setLabelCol("amount")
      .setFeaturesCol("features")

    val pipeline = new Pipeline().setStages(Array(assembler, rf))

    // Train model
    val model = pipeline.fit(trainingData)

    // Make predictions
    val predictions = model.transform(df_2)
For anyone who needs an answer, here is how I dealt with the problem.
You can cast a stage of the fitted pipeline model to the type you need using asInstanceOf:
    val pipeline = new Pipeline().setStages(Array(assembler, rf))
    // numberStage and TheModelYouWant are placeholders
    val newModel = model.stages(numberStage).asInstanceOf[TheModelYouWant]
Replace numberStage with the index of the algorithm in the pipeline; in my pipeline it is 1 (for rf).
Replace TheModelYouWant with the type of model you need, in my case RandomForestRegressionModel.
Then you can create your own evaluator, or do whatever else you want with the model.
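As an illustration of the cast described above, here is a minimal sketch that reads featureImportances from the fitted forest and pairs each value with its input column. The toy DataFrame, the object name FeatureImportanceSketch and the helper rankImportances are assumptions made for this example and are not part of the original code; a local SparkSession is assumed as well.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.{RandomForestRegressionModel, RandomForestRegressor}
import org.apache.spark.sql.SparkSession

object FeatureImportanceSketch {
  // Hypothetical helper: pair each column name with its importance, highest first
  def rankImportances(names: Array[String], importances: Array[Double]): Array[(String, Double)] =
    names.zip(importances).sortBy { case (_, importance) => -importance }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("feature-importance-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for df_2 (assumption, not the original data)
    val df = Seq(
      (10, 1, 100), (20, 2, 210), (30, 3, 290),
      (40, 4, 405), (50, 5, 500), (60, 6, 610)
    ).toDF("customers", "month", "amount")

    val assembler = new VectorAssembler()
      .setInputCols(Array("customers", "month"))
      .setOutputCol("features")

    val rf = new RandomForestRegressor()
      .setNumTrees(10)
      .setLabelCol("amount")
      .setFeaturesCol("features")

    val model = new Pipeline().setStages(Array(assembler, rf)).fit(df)

    // Stage 0 is the assembler, stage 1 is the fitted random forest
    val rfModel = model.stages(1).asInstanceOf[RandomForestRegressionModel]

    // featureImportances follows the order of the assembler's input columns
    val ranked = rankImportances(assembler.getInputCols, rfModel.featureImportances.toArray)
    ranked.foreach { case (name, importance) => println(f"$name%-12s $importance%.4f") }

    spark.stop()
  }
}
```

Keeping the Pipeline and casting one stage means there is no need to drop the pipeline just to inspect the forest.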
If you want to transform a DataFrame into an RDD[(Double, Double)], you can use .rdd and .map:
    // RegressionMetrics expects (prediction, observation) pairs, so select the prediction first
    val predictionsAndLabels = df.select("prediction", "amount").rdd.map { row =>
      (row.getDouble(0), row.getInt(1).toDouble)
    }
Now I have an RDD[(Double, Double)] and can use RegressionMetrics. Hope this helps someone.
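To make the last step concrete, here is a minimal sketch of feeding such an RDD to RegressionMetrics from spark.mllib. The toy predictions DataFrame, the object name RegressionMetricsSketch and the helper toPair are assumptions for illustration. The sketch also assumes, as in the snippet above, that amount is an integer column and prediction is a double.

```scala
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.sql.{Row, SparkSession}

object RegressionMetricsSketch {
  // Hypothetical helper: RegressionMetrics expects (prediction, observation), in that order
  def toPair(row: Row): (Double, Double) =
    (row.getDouble(0), row.getInt(1).toDouble)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("regression-metrics-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-in for the output of model.transform(...) (assumption)
    val predictions = Seq((100, 110.0), (200, 190.0), (300, 310.0))
      .toDF("amount", "prediction")

    val predictionsAndLabels = predictions
      .select("prediction", "amount")
      .rdd
      .map(toPair)

    val metrics = new RegressionMetrics(predictionsAndLabels)
    println(s"MSE  = ${metrics.meanSquaredError}")
    println(s"RMSE = ${metrics.rootMeanSquaredError}")
    println(s"MAE  = ${metrics.meanAbsoluteError}")
    println(s"R2   = ${metrics.r2}")

    spark.stop()
  }
}
```

The order of the pair does not change the mean squared error, but it does change r2 and explainedVariance, so the prediction is placed first here.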