Python - ValueError: Unknown label type: 'unknown' in sklearn
I am getting the error "ValueError: Unknown label type: 'unknown'".
I have searched the net but am unable to get rid of the error. I am new to Python, by the way :)
My data has 5 rows and 22 columns; the last column is the label (True/False).
    dataset = pandas.read_csv(path)  # DataFrame created
The data looks like this:
    dataset.head()
       loc  v(g)  ev(g)  iv(g)     n      v    l    d      i       e  ...  locode  locomment  loblank  loccodeandcomment  uniq_op  uniq_opnd  total_op  total_opnd  branchcount  defects
    0  1.0   1.0    1.0    1.0   1.0   1.00  1.0  1.0   1.00    1.00  ...       1          1        1                  1      1.0        1.0       1.0         1.0          1.0     True
    1  1.1   1.4    1.4    1.4   1.3   1.30  1.3  1.3   1.30    1.30  ...       2          2        2                  1      1.2        1.2       1.2         1.2          1.4    False
    2  2.0   1.0    1.0    1.0   1.0   0.00  0.0  0.0   0.00    0.00  ...       0          0        1                  0      1.0        0.0       1.0         0.0          1.0    False
    3  2.0   1.0    1.0    1.0   1.0   0.00  0.0  0.0   0.00    0.00  ...       0          0        1                  0      1.0        0.0       1.0         0.0          1.0    False
    4  3.0   1.0    1.0    1.0  22.0  85.95  0.2  5.0  17.19  429.76  ...       1          0        3                  0     10.0        5.0      17.0         5.0          1.0    False
    5 rows × 22 columns
The rest of the code:
    array = dataset.values
    x = array[:, 0:21]  # all rows, columns 1 to 21 (index 0 to 20)
    y = array[:, 21]    # all rows, column 22 (index 21)
    x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size=0.20, random_state=0)  # 80% training data, 20% test data
    kfold = model_selection.KFold(n_splits=10, random_state=0)
    cv_results = []
I am getting the error on the following line:
    cv_results = model_selection.cross_val_score(SVC(), x_train, y_train, cv=kfold, scoring='accuracy')
Detailed error:
    ValueError                                Traceback (most recent call last)
    <ipython-input-31-e1234a2bbe9b> in <module>()
    ----> 1 cv_results = model_selection.cross_val_score(SVC(), x_train, y_train, cv=kfold, scoring='accuracy')

    C:\Program Files\Anaconda2\lib\site-packages\sklearn\model_selection\_validation.pyc in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
        138                                               train, test, verbose, None,
        139                                               fit_params)
    --> 140                       for train, test in cv_iter)
        141     return np.array(scores)[:, 0]
        142

    C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self, iterable)
        756             # was dispatched. In particular this covers the edge
        757             # case of Parallel used with an exhausted iterator.
    --> 758             while self.dispatch_one_batch(iterator):
        759                 self._iterating = True
        760             else:

    C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in dispatch_one_batch(self, iterator)
        606                 return False
        607             else:
    --> 608                 self._dispatch(tasks)
        609                 return True
        610

    C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in _dispatch(self, batch)
        569         dispatch_timestamp = time.time()
        570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
    --> 571         job = self._backend.apply_async(batch, callback=cb)
        572         self._jobs.append(job)
        573

    C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\_parallel_backends.pyc in apply_async(self, func, callback)
        107     def apply_async(self, func, callback=None):
        108         """Schedule a func to be run"""
    --> 109         result = ImmediateResult(func)
        110         if callback:
        111             callback(result)

    C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\_parallel_backends.pyc in __init__(self, batch)
        324         # Don't delay the application, to avoid keeping the input
        325         # arguments in memory
    --> 326         self.results = batch()
        327
        328     def get(self):

    C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self)
        129
        130     def __call__(self):
    --> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        132
        133     def __len__(self):

    C:\Program Files\Anaconda2\lib\site-packages\sklearn\model_selection\_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
        236             estimator.fit(X_train, **fit_params)
        237         else:
    --> 238             estimator.fit(X_train, y_train, **fit_params)
        239
        240     except Exception as e:

    C:\Program Files\Anaconda2\lib\site-packages\sklearn\svm\base.pyc in fit(self, X, y, sample_weight)
        150
        151         X, y = check_X_y(X, y, dtype=np.float64, order='C', accept_sparse='csr')
    --> 152         y = self._validate_targets(y)
        153
        154         sample_weight = np.asarray([]

    C:\Program Files\Anaconda2\lib\site-packages\sklearn\svm\base.pyc in _validate_targets(self, y)
        518     def _validate_targets(self, y):
        519         y_ = column_or_1d(y, warn=True)
    --> 520         check_classification_targets(y)
        521         cls, y = np.unique(y_, return_inverse=True)
        522         self.class_weight_ = compute_class_weight(self.class_weight, cls, y_)

    C:\Program Files\Anaconda2\lib\site-packages\sklearn\utils\multiclass.pyc in check_classification_targets(y)
        170     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
        171             'multilabel-indicator', 'multilabel-sequences']:
    --> 172         raise ValueError("Unknown label type: %r" % y_type)
        173
        174

    ValueError: Unknown label type: 'unknown'
Part 1
The error is related to the y variable that you use.
You need to transform True/False into 0/1 so that the y variable contains only 0s and 1s. That should fix the error.
From the documentation:
y : array-like, shape (n_samples,) Target values (class labels in classification, real numbers in regression)
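To make Part 1 concrete, here is a minimal sketch, assuming a tiny made-up DataFrame in place of the real CSV (the three columns and their values are illustrative only, not the actual dataset). When a DataFrame mixes float and boolean columns, dataset.values returns an array of dtype object, and scikit-learn's type_of_target helper reports such a label array as 'unknown'. Casting the labels to int turns them into a 'binary' target.

```python
import numpy as np
import pandas as pd
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for the CSV: two float features plus a boolean label.
dataset = pd.DataFrame({
    "loc": [1.0, 1.1, 2.0, 2.0, 3.0],
    "branchcount": [1.0, 1.4, 1.0, 1.0, 1.0],
    "defects": [True, False, False, False, False],
})

array = dataset.values      # mixed float/bool columns, so dtype is object
y_raw = array[:, -1]        # last column: the labels, still Python objects
print(y_raw.dtype, type_of_target(y_raw))

# Cast the labels to 0/1 integers and the features to floats.
y = y_raw.astype(int)
x = array[:, :-1].astype(float)
print(y.dtype, type_of_target(y))
```

On the real data, the same two casts would go right after the x and y slices, before train_test_split or cross_val_score is called.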
Part 2
Next, you should either use cross-validation, which automatically splits the data into x_train, x_test, y_train, y_test, or use the train_test_split function and fit the model manually, like this:
    from sklearn.svm import SVC
    clf = SVC()
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    ...
On the other hand, if you want to use cross-validation with KFold, use:
    # random_state only has an effect with shuffle=True; newer scikit-learn versions raise an error otherwise
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=0)
    cv_results = model_selection.cross_val_score(SVC(), x, y, cv=kfold, scoring='accuracy')
This automatically creates x_train, x_test, y_train, y_test for each fold and gives you cv_results.
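The two options from Part 2 can be sketched side by side. This is only an illustration, assuming synthetic random data (100 rows, 21 features, 0/1 labels) rather than the real dataset; shuffle=True is passed to KFold because recent scikit-learn versions reject random_state without it.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption): 100 samples, 21 features, integer labels.
rng = np.random.RandomState(0)
x = rng.rand(100, 21)
y = (x[:, 0] > 0.5).astype(int)

# Option 1: a single manual hold-out split (80% train, 20% test).
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.20, random_state=0
)
clf = SVC()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
holdout_accuracy = accuracy_score(y_test, y_pred)
print("hold-out accuracy:", holdout_accuracy)

# Option 2: 10-fold cross-validation on the full data; the splitting is done internally.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
cv_results = cross_val_score(SVC(), x, y, cv=kfold, scoring="accuracy")
print("mean CV accuracy:", cv_results.mean())
```

cv_results holds one accuracy value per fold, so its mean and standard deviation give a rough picture of how stable the model is across splits.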