python - ValueError: Unknown label type: 'unknown' in sklearn


I am getting the error "ValueError: Unknown label type: 'unknown'".

I have searched the net but am unable to get rid of this error. I am new to Python, by the way :)

My data has 5 rows and 22 columns; the last column is the label (True, False).

dataset = pandas.read_csv(path)  # DataFrame created

The data looks like this:

dataset.head()

   loc  v(g)  ev(g)  iv(g)     n      v    l    d      i       e  ...  locode  locomment  loblank  loccodeandcomment  uniq_op  uniq_opnd  total_op  total_opnd  branchcount  defects
0  1.0   1.0    1.0    1.0   1.0   1.00  1.0  1.0   1.00    1.00  ...       1          1        1                  1      1.0        1.0       1.0         1.0          1.0     True
1  1.1   1.4    1.4    1.4   1.3   1.30  1.3  1.3   1.30    1.30  ...       2          2        2                  1      1.2        1.2       1.2         1.2          1.4    False
2  2.0   1.0    1.0    1.0   1.0   0.00  0.0  0.0   0.00    0.00  ...       0          0        1                  0      1.0        0.0       1.0         0.0          1.0    False
3  2.0   1.0    1.0    1.0   1.0   0.00  0.0  0.0   0.00    0.00  ...       0          0        1                  0      1.0        0.0       1.0         0.0          1.0    False
4  3.0   1.0    1.0    1.0  22.0  85.95  0.2  5.0  17.19  429.76  ...       1          0        3                  0     10.0        5.0      17.0         5.0          1.0    False

5 rows × 22 columns

The rest of the code:

array = dataset.values
x = array[:, 0:21]  # all rows, columns 1 to 21 (index 0 to 20)
y = array[:, 21]    # all rows, column 22 (index 21)
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size=0.20, random_state=0)  # 80% training data, 20% test data

kfold = model_selection.KFold(n_splits=10, random_state=0)
cv_results = []

I am getting the error on the following line:

cv_results = model_selection.cross_val_score(SVC(), x_train, y_train, cv=kfold, scoring='accuracy')

Detailed error:

ValueError                                Traceback (most recent call last)
<ipython-input-31-e1234a2bbe9b> in <module>()
----> 1 cv_results = model_selection.cross_val_score(SVC(), x_train, y_train, cv=kfold, scoring='accuracy')

C:\Program Files\Anaconda2\lib\site-packages\sklearn\model_selection\_validation.pyc in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
    138                                               train, test, verbose, None,
    139                                               fit_params)
--> 140                       for train, test in cv_iter)
    141     return np.array(scores)[:, 0]
    142

C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self, iterable)
    756             # dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
    759                 self._iterating = True
    760             else:

C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in dispatch_one_batch(self, iterator)
    606                 return False
    607             else:
--> 608                 self._dispatch(tasks)
    609                 return True
    610

C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in _dispatch(self, batch)
    569         dispatch_timestamp = time.time()
    570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 571         job = self._backend.apply_async(batch, callback=cb)
    572         self._jobs.append(job)
    573

C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\_parallel_backends.pyc in apply_async(self, func, callback)
    107     def apply_async(self, func, callback=None):
    108         """Schedule a func to be run"""
--> 109         result = ImmediateResult(func)
    110         if callback:
    111             callback(result)

C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\_parallel_backends.pyc in __init__(self, batch)
    324         # Don't delay the application, to avoid keeping the input
    325         # arguments in memory
--> 326         self.results = batch()
    327
    328     def get(self):

C:\Program Files\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133     def __len__(self):

C:\Program Files\Anaconda2\lib\site-packages\sklearn\model_selection\_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
    236             estimator.fit(X_train, **fit_params)
    237         else:
--> 238             estimator.fit(X_train, y_train, **fit_params)
    239
    240     except Exception as e:

C:\Program Files\Anaconda2\lib\site-packages\sklearn\svm\base.pyc in fit(self, X, y, sample_weight)
    150
    151         X, y = check_X_y(X, y, dtype=np.float64, order='C', accept_sparse='csr')
--> 152         y = self._validate_targets(y)
    153
    154         sample_weight = np.asarray([]

C:\Program Files\Anaconda2\lib\site-packages\sklearn\svm\base.pyc in _validate_targets(self, y)
    518     def _validate_targets(self, y):
    519         y_ = column_or_1d(y, warn=True)
--> 520         check_classification_targets(y)
    521         cls, y = np.unique(y_, return_inverse=True)
    522         self.class_weight_ = compute_class_weight(self.class_weight, cls, y_)

C:\Program Files\Anaconda2\lib\site-packages\sklearn\utils\multiclass.pyc in check_classification_targets(y)
    170     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    171             'multilabel-indicator', 'multilabel-sequences']:
--> 172         raise ValueError("Unknown label type: %r" % y_type)
    173
    174

ValueError: Unknown label type: 'unknown'

Part 1

The error is related to the y variable that you use.

You need to transform True/False into 0/1 so that the y variable contains only 0s and 1s. This should fix the error.

From the documentation (see here):

y : array-like, shape (n_samples,)
    Target values (class labels in classification, real numbers in regression)
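A minimal sketch of the conversion described in Part 1. It uses a small made-up DataFrame in place of the real CSV: the column names `loc`, `n` and `defects` follow the question, but the values are illustrative only. It assumes the `defects` column holds booleans; in that case `dataset.values` mixes floats and booleans and comes back as an object-dtype array, which is the likely reason scikit-learn cannot infer the label type.

```python
import pandas as pd
from sklearn.utils.multiclass import type_of_target

# Illustrative stand-in for the CSV in the question (values are made up).
dataset = pd.DataFrame({
    "loc": [1.0, 1.1, 2.0, 2.0, 3.0],
    "n": [1.0, 1.3, 1.0, 1.0, 22.0],
    "defects": [True, False, False, False, False],
})

array = dataset.values             # float and bool columns together -> dtype=object
x = array[:, :-1].astype(float)    # feature columns converted to floats
y_raw = array[:, -1]               # label column, still an object array of booleans

# Ask scikit-learn how it classifies the labels before the conversion.
label_type_before = type_of_target(y_raw)

y = y_raw.astype(int)              # True/False -> 1/0
label_type_after = type_of_target(y)

print(array.dtype, label_type_before, label_type_after)
```

With the real data, the same idea would be `y = array[:, 21].astype(int)` before calling `train_test_split` or `cross_val_score`.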

Part 2

Next, you should either use cross-validation, which automatically splits the data into x_train, x_test, y_train and y_test, or use the train_test_split function and fit the model manually like this:

clf = SVC()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
...
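For completeness, here is a self-contained sketch of the manual route. The data comes from `make_classification`, which is only a synthetic stand-in for the real features and 0/1 labels, and scoring with `accuracy_score` is one possible way to fill in the elided steps, not necessarily what was intended.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 100 samples, 21 features, 0/1 labels.
x, y = make_classification(n_samples=100, n_features=21, random_state=0)

# 80% training data, 20% test data, as in the question.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.20, random_state=0
)

clf = SVC()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

# Fraction of test samples whose predicted label matches the true label.
holdout_accuracy = accuracy_score(y_test, y_pred)
print(holdout_accuracy)
```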

On the other hand, if you want to use cross-validation with KFold, use:

kfold = model_selection.KFold(n_splits=10, random_state=0)

cv_results = model_selection.cross_val_score(SVC(), x, y, cv=kfold, scoring='accuracy')

This is going to create x_train, x_test, y_train and y_test automatically for each fold and give you cv_results.
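A runnable sketch of the cross-validation route, again on synthetic data from `make_classification` rather than the real CSV. One deliberate difference from the snippet above: `shuffle=True` is added, because recent scikit-learn releases raise a ValueError when `random_state` is set on `KFold` while `shuffle` is left at its default of False. On the older release shown in the traceback this extra argument is not required.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 100 samples, 21 features, 0/1 labels.
x, y = make_classification(n_samples=100, n_features=21, random_state=0)

# shuffle=True is an addition here so that random_state is accepted
# by recent scikit-learn versions.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

# Pass the full x and y; each fold is split into train and test parts internally.
cv_results = cross_val_score(SVC(), x, y, cv=kfold, scoring="accuracy")

print(cv_results.mean(), cv_results.std())
```

`cv_results` holds one accuracy score per fold, so its mean and standard deviation give a quick summary of how the model performs across the splits.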

