HROCH
Symbolic regression and classification library
High-performance Python symbolic regression library based on parallel local search
- Zero hyperparameter tuning.
- Accurate results in seconds or minutes, in contrast to slow GP-based methods.
- Small model size.
- Support for regression, classification and fuzzy math.
- Support for 32-bit and 64-bit floating-point arithmetic.
- Works with unprotected versions of math operators (log, sqrt, division).
- Speeds up the search by using feature importances computed from a black-box model.
Supported instructions | |
---|---|
math | add, sub, mul, div, pdiv, inv, minv, sq2, pow, exp, log, sqrt, cbrt, aq |
goniometric | sin, cos, tan, asin, acos, atan, sinh, cosh, tanh |
other | nop, max, min, abs, floor, ceil, lt, gt, lte, gte |
fuzzy | f_and, f_or, f_xor, f_impl, f_not, f_nand, f_nor, f_nxor, f_nimpl |
Sources
C++20 source code available in separate repo sr_core
Dependencies
- AVX2 instruction set (all modern CPUs support this)
- numpy
- sklearn
Installation
pip install HROCH
Usage
Symbolic_Regression_Demo.ipynb
from HROCH import SymbolicRegressor
reg = SymbolicRegressor(num_threads=8, time_limit=60.0, problem='math', precision='f64')
reg.fit(X_train, y_train)
yp = reg.predict(X_test)
Changelog
v1.4
- Sklearn compatibility
- Classifiers:
  - NonlinearLogisticRegressor for binary classification
  - SymbolicClassifier for multiclass classification
  - FuzzyRegressor for a special binary classification
- Xi correlation used to filter unrelated features
Older versions
v1.3
- Public C++ sources
- Command-line interface changed to cpython
- Support for classification scores logloss and accuracy
- Support for final transformations:
  - ordinal regression
  - logistic function
  - clipping
- Access to equations from all parallel hillclimbers
- User defined constants
v1.2
- Features probability as input parameter
- Custom instruction set
- Parallel hillclimbing parameters
v1.1
- Improved late acceptance hillclimbing
v1.0
- First release
SRBench
"""
.. include:: ../README.md
"""

from .hroch import RegressorMathModel, ClassifierMathModel, Xicor
from .regressor import SymbolicRegressor
from .fuzzy import FuzzyRegressor, FuzzyClassifier
from .classifier import NonlinearLogisticRegressor, SymbolicClassifier
from .version import __version__

__all__ = ["SymbolicRegressor", "NonlinearLogisticRegressor", "SymbolicClassifier", "FuzzyRegressor", "FuzzyClassifier", "RegressorMathModel", "ClassifierMathModel", "Xicor", "__version__"]
class SymbolicRegressor(RegressorMixin, SymbolicSolver):
    """
    SymbolicRegressor class

    Parameters
    ----------
    num_threads : int, default=1
        Number of used threads.

    time_limit : float, default=5.0
        Timeout in seconds. If set to 0 there is no limit and the algorithm runs until iter_limit is met.

    iter_limit : int, default=0
        Iterations limit. If set to 0 there is no limit and the algorithm runs until time_limit is met.

    precision : str, default='f32'
        'f64' or 'f32'. Internal floating number representation.

    problem : str or dict, default='math'
        Predefined instruction set 'math', 'simple' or 'fuzzy', or a custom set of instructions with mutation probabilities.
        ```python
        problem={'add':10.0, 'mul':10.0, 'gt':1.0, 'lt':1.0, 'nop':1.0}
        ```

        |**supported instructions**||
        |-|-|
        |**math**|add, sub, mul, div, pdiv, inv, minv, sq2, pow, exp, log, sqrt, cbrt, aq|
        |**goniometric**|sin, cos, tan, asin, acos, atan, sinh, cosh, tanh|
        |**other**|nop, max, min, abs, floor, ceil, lt, gt, lte, gte|
        |**fuzzy**|f_and, f_or, f_xor, f_impl, f_not, f_nand, f_nor, f_nxor, f_nimpl|

        *nop - no operation*

        *pdiv - protected division*

        *inv - inverse* $(-x)$

        *minv - multiplicative inverse* $(1/x)$

        *lt, gt, lte, gte -* $<, >, <=, >=$

    feature_probs : str or array of shape (n_features,), default='xicor'
        The probability that a mutation will select a feature.
        If None then the features are selected with equal probability.
        If 'xicor' then the probabilities are derived from the xicor correlation coefficient https://doi.org/10.1080/01621459.2020.1758115

    random_state : int, default=0
        Random generator seed. If 0 then the random generator will be initialized by system time.

    verbose : int, default=0
        Controls the verbosity when fitting and predicting.

    metric : str, default='MSE'
        Metric used for evaluating error. Choose from {'MSE', 'MAE', 'MSLE', 'LogLoss'}

    transformation : str, default=None
        Final transformation for computed value. Choose from { None, 'LOGISTIC', 'ORDINAL'}

    algo_settings : dict, default = None
        If not defined SymbolicSolver.ALGO_SETTINGS is used.
        ```python
        algo_settings = {'neighbours_count':15, 'alpha':0.15, 'beta':0.5, 'pretest_size':1, 'sample_size':16}
        ```
        - 'neighbours_count' : (int) Number of tested neighbours in each iteration
        - 'alpha' : (float) Score worsening limit for an iteration
        - 'beta' : (float) Tree breadth-wise expanding factor in a range from 0 to 1
        - 'pretest_size' : (int) Batch count (a batch is a 64-row sample) for fast fitness pre-evaluation
        - 'sample_size' : (int) Number of batches of sample used to calculate the score during training

    code_settings : dict, default = None
        If not defined SymbolicSolver.CODE_SETTINGS is used.
        ```python
        code_settings = {'min_size': 32, 'max_size':32, 'const_size':8}
        ```
        - 'const_size' : (int) Maximum allowed constants in symbolic model, 0 is also accepted.
        - 'min_size': (int) Minimum allowed equation size (as a linear program).
        - 'max_size' : (int) Maximum allowed equation size (as a linear program).

    population_settings : dict, default = None
        If not defined SymbolicSolver.POPULATION_SETTINGS is used.
        ```python
        population_settings = {'size': 64, 'tournament':4}
        ```
        - 'size' : (int) Number of individuals in the population.
        - 'tournament' : (int) Tournament selection.

    init_const_settings : dict, default = None
        If not defined SymbolicSolver.INIT_CONST_SETTINGS is used.
        ```python
        init_const_settings = {'const_min':-1.0, 'const_max':1.0, 'predefined_const_prob':0.0, 'predefined_const_set': []}
        ```
        - 'const_min' : (float) Lower range for initializing constants.
        - 'const_max' : (float) Upper range for initializing constants.
        - 'predefined_const_prob': (float) Probability of selecting one of the predefined constants during initialization.
        - 'predefined_const_set' : (array of floats) Predefined constants used during initialization.

    const_settings : dict, default = None
        If not defined SymbolicSolver.CONST_SETTINGS is used.
        ```python
        const_settings = {'const_min':-LARGE_FLOAT, 'const_max':LARGE_FLOAT, 'predefined_const_prob':0.0, 'predefined_const_set': []}
        ```
        - 'const_min' : (float) Lower range for constants used in equations.
        - 'const_max' : (float) Upper range for constants used in equations.
        - 'predefined_const_prob': (float) Probability of selecting one of the predefined constants during search process (mutation).
        - 'predefined_const_set' : (array of floats) Predefined constants used during search process (mutation).

    target_clip : array of two float values clip_min and clip_max, default None
        ```python
        target_clip=[-1, 1]
        ```

    cv_params : dict, default = None
        If not defined SymbolicSolver.REGRESSION_CV_PARAMS is used
        ```python
        cv_params = {'n':0, 'cv_params':{}, 'select':'mean', 'opt_params':{'method': 'Nelder-Mead'}, 'opt_metric':make_scorer(mean_squared_error, greater_is_better=False)}
        ```
        - 'n' : (int) Cross-validate the top n models
        - 'cv_params' : (dict) Parameters passed to scikit-learn cross_validate method
        - 'select' : (str) Best model selection method, choose from 'mean' or 'median'
        - 'opt_params' : (dict) Parameters passed to scipy.optimize.minimize method
        - 'opt_metric' : (make_scorer) Scoring method

    warm_start : bool, default=False
        If True, then the solver will be reused for the next call of fit.
    """

    def __init__(
        self,
        num_threads: int = 1,
        time_limit: float = 5.0,
        iter_limit: int = 0,
        precision: str = "f32",
        problem="math",
        feature_probs="xicor",
        random_state: int = 0,
        verbose: int = 0,
        metric: str = "MSE",
        transformation: str = None,
        algo_settings=None,
        code_settings=None,
        population_settings=None,
        init_const_settings=None,
        const_settings=None,
        target_clip: Iterable = None,
        cv_params=None,
        warm_start: bool = False,
    ):
        super(SymbolicRegressor, self).__init__(
            num_threads=num_threads,
            time_limit=time_limit,
            iter_limit=iter_limit,
            precision=precision,
            problem=problem,
            feature_probs=feature_probs,
            random_state=random_state,
            verbose=verbose,
            metric=metric,
            transformation=transformation,
            algo_settings=algo_settings,
            code_settings=code_settings,
            population_settings=population_settings,
            init_const_settings=init_const_settings,
            const_settings=const_settings,
            target_clip=target_clip,
            class_weight=None,
            cv_params=cv_params,
            warm_start=warm_start,
        )

    def fit(self, X, y, sample_weight=None, check_input=True):
        """
        Fit the symbolic models according to the given training data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features.

        y : array-like of shape (n_samples,)
            Target vector relative to X.

        sample_weight : array-like of shape (n_samples,) default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        self
            Fitted estimator.
        """

        super(SymbolicRegressor, self).fit(X, y, sample_weight=sample_weight, check_input=check_input)
        return self

    def predict(self, X, id=None, check_input=True, use_parsed_model=True):
        """
        Predict regression target for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The input samples.

        id : int
            Model id, default=None. id can be obtained from the get_models method. If None, the prediction uses the best model.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        y : ndarray of shape (n_samples,) or (n_samples, n_outputs)
            The predicted values.
        """
        return super(SymbolicRegressor, self).predict(X, id=id, check_input=check_input, use_parsed_model=use_parsed_model)

    def __sklearn_tags__(self):
        return super().__sklearn_tags__()
SymbolicRegressor class
Parameters
- num_threads (int, default=1): Number of used threads.
- time_limit (float, default=5.0): Timeout in seconds. If set to 0 there is no limit and the algorithm runs until iter_limit is met.
- iter_limit (int, default=0): Iterations limit. If set to 0 there is no limit and the algorithm runs until time_limit is met.
- precision (str, default='f32'): 'f64' or 'f32'. Internal floating number representation.
problem (str or dict, default='math'): Predefined instruction set 'math', 'simple' or 'fuzzy', or a custom set of instructions with mutation probabilities.
problem={'add':10.0, 'mul':10.0, 'gt':1.0, 'lt':1.0, 'nop':1.0}
supported instructions | |
---|---|
math | add, sub, mul, div, pdiv, inv, minv, sq2, pow, exp, log, sqrt, cbrt, aq |
goniometric | sin, cos, tan, asin, acos, atan, sinh, cosh, tanh |
other | nop, max, min, abs, floor, ceil, lt, gt, lte, gte |
fuzzy | f_and, f_or, f_xor, f_impl, f_not, f_nand, f_nor, f_nxor, f_nimpl |
nop - no operation
pdiv - protected division
inv - inverse $(-x)$
minv - multiplicative inverse $(1/x)$
lt, gt, lte, gte - $<, >, <=, >=$
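The weights in the custom `problem` example sum to 23, so they read most naturally as relative weights rather than normalized probabilities. The sketch below assumes exactly that; how the library handles the values internally is not shown in this document, and `instruction_probabilities` is a hypothetical helper used only for illustration.

```python
import random

# Custom instruction set from the example above: instruction name -> weight.
problem = {'add': 10.0, 'mul': 10.0, 'gt': 1.0, 'lt': 1.0, 'nop': 1.0}


def instruction_probabilities(weights):
    """Normalize relative weights into probabilities (assumed behaviour)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


probs = instruction_probabilities(problem)

# Draw a few instructions the way a weighted mutation might pick them.
rng = random.Random(42)
sample = rng.choices(list(problem), weights=list(problem.values()), k=5)

print(probs)
print(sample)
```

Under this reading, `add` and `mul` would each be chosen in roughly 10 of every 23 mutations, and the remaining three instructions in about 1 of 23 each.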
- feature_probs (str or array of shape (n_features,), default='xicor'): The probability that a mutation will select a feature. If None then the features are selected with equal probability. If 'xicor' then the probabilities are derived from the xicor correlation coefficient https://doi.org/10.1080/01621459.2020.1758115
- random_state (int, default=0): Random generator seed. If 0 then the random generator will be initialized by system time.
- verbose (int, default=0): Controls the verbosity when fitting and predicting.
- metric (str, default='MSE'): Metric used for evaluating error. Choose from {'MSE', 'MAE', 'MSLE', 'LogLoss'}
- transformation (str, default=None): Final transformation for computed value. Choose from { None, 'LOGISTIC', 'ORDINAL'}
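For the `feature_probs='xicor'` option, the coefficient referenced by the DOI above is Chatterjee's xi. For data without ties it is $\xi = 1 - 3\sum_{i=1}^{n-1}|r_{i+1}-r_i| / (n^2-1)$, where $r_i$ are the ranks of $y$ after sorting the pairs by $x$. The sketch below implements that formula in plain Python. The step from xi scores to selection probabilities (`feature_probabilities`, clipping negatives to zero and normalizing) is an assumption made for illustration, not the library's documented behaviour.

```python
def xi_correlation(x, y):
    """Chatterjee's xi coefficient for samples without ties."""
    n = len(x)
    # Order the y values by increasing x.
    y_by_x = [yi for _, yi in sorted(zip(x, y))]
    # Rank of each y value among all y values (1 = smallest).
    order = sorted(range(n), key=lambda i: y_by_x[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


def feature_probabilities(columns, y):
    """Hypothetical mapping from xi scores to selection probabilities."""
    scores = [max(xi_correlation(col, y), 0.0) for col in columns]
    total = sum(scores)
    if total == 0.0:
        # No feature looks related: fall back to equal probabilities.
        return [1.0 / len(columns)] * len(columns)
    return [s / total for s in scores]


x0 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # y depends on this feature
x1 = [3, 9, 1, 7, 5, 10, 2, 8, 4, 6]   # unrelated ordering
y = [v * v for v in x0]

print(xi_correlation(x0, y), xi_correlation(x1, y))
print(feature_probabilities([x0, x1], y))
```

Xi detects any functional dependence of `y` on a feature, not only linear ones, which is why it suits a pre-filter for symbolic regression.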
algo_settings (dict, default = None): If not defined SymbolicSolver.ALGO_SETTINGS is used.
algo_settings = {'neighbours_count':15, 'alpha':0.15, 'beta':0.5, 'pretest_size':1, 'sample_size':16}
- 'neighbours_count' : (int) Number of tested neighbours in each iteration
- 'alpha' : (float) Score worsening limit for an iteration
- 'beta' : (float) Tree breadth-wise expanding factor in a range from 0 to 1
- 'pretest_size' : (int) Batch count (a batch is a 64-row sample) for fast fitness pre-evaluation
- 'sample_size' : (int) Number of batches of sample used to calculate the score during training
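Because `pretest_size` and `sample_size` are expressed in batches, it can help to convert them to row counts. The arithmetic below assumes that a batch is 64 rows for both settings (the text states this only for `pretest_size`) and that the count is capped at the dataset size; `rows_for` and `n_samples` are illustrative names.

```python
BATCH_ROWS = 64  # batch size stated in the 'pretest_size' description

algo_settings = {'neighbours_count': 15, 'alpha': 0.15, 'beta': 0.5,
                 'pretest_size': 1, 'sample_size': 16}


def rows_for(batches, n_samples):
    """Rows covered by `batches` batches, capped at the dataset size (assumed)."""
    return min(batches * BATCH_ROWS, n_samples)


n_samples = 5000  # hypothetical training-set size
pretest_rows = rows_for(algo_settings['pretest_size'], n_samples)
sample_rows = rows_for(algo_settings['sample_size'], n_samples)

print(pretest_rows, sample_rows)
```

With the defaults shown, a candidate would be pre-screened on 64 rows and scored on 1024 rows under these assumptions.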
code_settings (dict, default = None): If not defined SymbolicSolver.CODE_SETTINGS is used.
code_settings = {'min_size': 32, 'max_size':32, 'const_size':8}
- 'const_size' : (int) Maximum allowed constants in symbolic model, 0 is also accepted.
- 'min_size': (int) Minimum allowed equation size (as a linear program).
- 'max_size' : (int) Maximum allowed equation size (as a linear program).
population_settings (dict, default = None): If not defined SymbolicSolver.POPULATION_SETTINGS is used.
population_settings = {'size': 64, 'tournament':4}
- 'size' : (int) Number of individuals in the population.
- 'tournament' : (int) Tournament selection.
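The `tournament` setting suggests classic tournament selection: draw a few individuals at random and keep the best of them. The sketch below shows the generic technique with the default values above. Whether the library samples with or without replacement, and what it does with the winner, is not stated here, so `tournament_select` is an illustrative helper rather than the library's code.

```python
import random


def tournament_select(scores, tournament, rng):
    """Return the index of the lowest-error individual among
    `tournament` individuals drawn at random without replacement."""
    contenders = rng.sample(range(len(scores)), tournament)
    return min(contenders, key=lambda i: scores[i])


population_settings = {'size': 64, 'tournament': 4}

rng = random.Random(0)
# Hypothetical error scores, one per individual (lower is better).
scores = [rng.random() for _ in range(population_settings['size'])]

winner = tournament_select(scores, population_settings['tournament'], rng)
print(winner, scores[winner])
```

A larger tournament increases selection pressure: with `tournament` equal to the population size, the best individual always wins.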
init_const_settings (dict, default = None): If not defined SymbolicSolver.INIT_CONST_SETTINGS is used.
init_const_settings = {'const_min':-1.0, 'const_max':1.0, 'predefined_const_prob':0.0, 'predefined_const_set': []}
- 'const_min' : (float) Lower range for initializing constants.
- 'const_max' : (float) Upper range for initializing constants.
- 'predefined_const_prob': (float) Probability of selecting one of the predefined constants during initialization.
- 'predefined_const_set' : (array of floats) Predefined constants used during initialization.
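Read together, the four `init_const_settings` keys describe a two-way draw: with probability `predefined_const_prob` take a value from `predefined_const_set`, otherwise sample from the range `[const_min, const_max]`. The sketch below follows that reading. The uniform distribution and the helper name `sample_initial_constant` are assumptions, and the non-default values are chosen only to make both branches visible.

```python
import random


def sample_initial_constant(settings, rng):
    """Draw one initial constant according to the settings (assumed logic)."""
    predefined = settings['predefined_const_set']
    if predefined and rng.random() < settings['predefined_const_prob']:
        return rng.choice(predefined)
    return rng.uniform(settings['const_min'], settings['const_max'])


init_const_settings = {'const_min': -1.0, 'const_max': 1.0,
                       'predefined_const_prob': 0.3,
                       'predefined_const_set': [0.5, 2.0, 3.14159]}

rng = random.Random(0)
constants = [sample_initial_constant(init_const_settings, rng) for _ in range(8)]
print(constants)
```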
const_settings (dict, default = None): If not defined SymbolicSolver.CONST_SETTINGS is used.
const_settings = {'const_min':-LARGE_FLOAT, 'const_max':LARGE_FLOAT, 'predefined_const_prob':0.0, 'predefined_const_set': []}
- 'const_min' : (float) Lower range for constants used in equations.
- 'const_max' : (float) Upper range for constants used in equations.
- 'predefined_const_prob': (float) Probability of selecting one of the predefined constants during search process (mutation).
- 'predefined_const_set' : (array of floats) Predefined constants used during search process (mutation).
target_clip (array of two float values clip_min and clip_max, default None):
target_clip=[-1, 1]
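The `transformation` and `target_clip` parameters both post-process the raw value of an equation. The sketch below shows the two operations on their own: the standard logistic function for `'LOGISTIC'` and clipping to `[clip_min, clip_max]`. It does not cover `'ORDINAL'`, and the order in which the library applies these steps is not stated in this document, so they are kept separate here.

```python
import math


def logistic(z):
    """Standard logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clip(value, clip_min, clip_max):
    """Limit value to the closed interval [clip_min, clip_max]."""
    return max(clip_min, min(clip_max, value))


raw_outputs = [-3.0, -0.2, 0.0, 0.7, 5.0]  # hypothetical equation outputs
target_clip = [-1, 1]

clipped = [clip(v, *target_clip) for v in raw_outputs]
probabilities = [logistic(v) for v in raw_outputs]

print(clipped)
print(probabilities)
```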
cv_params (dict, default = None): If not defined SymbolicSolver.REGRESSION_CV_PARAMS is used.
cv_params = {'n':0, 'cv_params':{}, 'select':'mean', 'opt_params':{'method': 'Nelder-Mead'}, 'opt_metric':make_scorer(mean_squared_error, greater_is_better=False)}
- 'n' : (int) Cross-validate the top n models
- 'cv_params' : (dict) Parameters passed to scikit-learn cross_validate method
- 'select' : (str) Best model selection method, choose from 'mean' or 'median'
- 'opt_params' : (dict) Parameters passed to scipy.optimize.minimize method
- 'opt_metric' : (make_scorer) Scoring method
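The `select` option matters when fold scores contain an outlier. The sketch below uses hypothetical per-fold scores for two candidate models and picks the best one by mean and by median. Scores follow the scikit-learn convention for a scorer built with `greater_is_better=False` (negated error, so higher is better). How the library aggregates folds internally is an assumption, and `select_best` is an illustrative helper.

```python
from statistics import mean, median


def select_best(cv_scores, select='mean'):
    """Return the model id whose aggregated fold score is highest."""
    aggregate = {'mean': mean, 'median': median}[select]
    return max(cv_scores, key=lambda model_id: aggregate(cv_scores[model_id]))


# Hypothetical negated-MSE scores per fold; model 0 has one bad fold.
cv_scores = {
    0: [-0.10, -0.12, -0.11, -0.90, -0.10],
    1: [-0.20, -0.21, -0.19, -0.22, -0.20],
}

best_by_mean = select_best(cv_scores, 'mean')
best_by_median = select_best(cv_scores, 'median')
print(best_by_mean, best_by_median)
```

Here the mean penalizes model 0 for its single poor fold, while the median ignores it, so the two settings choose different models.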
- warm_start (bool, default=False): If True, then the solver will be reused for the next call of fit.
    def __init__(
        self,
        num_threads: int = 1,
        time_limit: float = 5.0,
        iter_limit: int = 0,
        precision: str = "f32",
        problem="math",
        feature_probs="xicor",
        random_state: int = 0,
        verbose: int = 0,
        metric: str = "MSE",
        transformation: str = None,
        algo_settings=None,
        code_settings=None,
        population_settings=None,
        init_const_settings=None,
        const_settings=None,
        target_clip: Iterable = None,
        cv_params=None,
        warm_start: bool = False,
    ):
        super(SymbolicRegressor, self).__init__(
            num_threads=num_threads,
            time_limit=time_limit,
            iter_limit=iter_limit,
            precision=precision,
            problem=problem,
            feature_probs=feature_probs,
            random_state=random_state,
            verbose=verbose,
            metric=metric,
            transformation=transformation,
            algo_settings=algo_settings,
            code_settings=code_settings,
            population_settings=population_settings,
            init_const_settings=init_const_settings,
            const_settings=const_settings,
            target_clip=target_clip,
            class_weight=None,
            cv_params=cv_params,
            warm_start=warm_start,
        )
    def fit(self, X, y, sample_weight=None, check_input=True):
        """
        Fit the symbolic models according to the given training data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features.

        y : array-like of shape (n_samples,)
            Target vector relative to X.

        sample_weight : array-like of shape (n_samples,) default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        self
            Fitted estimator.
        """

        super(SymbolicRegressor, self).fit(X, y, sample_weight=sample_weight, check_input=check_input)
        return self
Fit the symbolic models according to the given training data.
Parameters
- X (array-like of shape (n_samples, n_features)): Training vector, where n_samples is the number of samples and n_features is the number of features.
- y (array-like of shape (n_samples,)): Target vector relative to X.
- sample_weight (array-like of shape (n_samples,) default=None): Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- check_input (bool, default=True): Allow to bypass several input checking. Don't use this parameter unless you know what you're doing.
Returns
- self: Fitted estimator.
    def predict(self, X, id=None, check_input=True, use_parsed_model=True):
        """
        Predict regression target for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The input samples.

        id : int
            Model id, default=None. id can be obtained from the get_models method. If None, the prediction uses the best model.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        y : ndarray of shape (n_samples,) or (n_samples, n_outputs)
            The predicted values.
        """
        return super(SymbolicRegressor, self).predict(X, id=id, check_input=check_input, use_parsed_model=use_parsed_model)
Predict regression target for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
- id (int, default=None): Model id, which can be obtained from the get_models method. If None, the prediction uses the best model.
- check_input (bool, default=True): Allow to bypass several input checking. Don't use this parameter unless you know what you're doing.
Returns
- y (ndarray of shape (n_samples,) or (n_samples, n_outputs)): The predicted values.
Descriptor for defining set_{method}_request methods in estimators.
New in version 1.3.
Parameters
- name (str): The name of the method for which the request function should be created, e.g. "fit" would create a set_fit_request function.
- keys (list of str): A list of strings which are accepted parameters by the created function, e.g. ["sample_weight"] if the corresponding method accepts it as a metadata.
- validate_keys (bool, default=True): Whether to check if the requested parameters fit the actual parameters of the method.
Notes
This class is a descriptor [1] and uses PEP-362 to set the signature of the returned function [2].
References
class NonlinearLogisticRegressor(ClassifierMixin, SymbolicSolver):
    """
    Nonlinear Logistic Regressor

    Parameters
    ----------
    num_threads : int, default=1
        Number of used threads.

    time_limit : float, default=5.0
        Timeout in seconds. If set to 0 there is no limit and the algorithm runs until iter_limit is met.

    iter_limit : int, default=0
        Iterations limit. If set to 0 there is no limit and the algorithm runs until time_limit is met.

    precision : str, default='f32'
        'f64' or 'f32'. Internal floating number representation.

    problem : str or dict, default='math'
        Predefined instruction set 'math', 'simple' or 'fuzzy', or a custom set of instructions with mutation probabilities.
        ```python
        problem={'add':10.0, 'mul':10.0, 'gt':1.0, 'lt':1.0, 'nop':1.0}
        ```

        |**supported instructions**||
        |-|-|
        |**math**|add, sub, mul, div, pdiv, inv, minv, sq2, pow, exp, log, sqrt, cbrt, aq|
        |**goniometric**|sin, cos, tan, asin, acos, atan, sinh, cosh, tanh|
        |**other**|nop, max, min, abs, floor, ceil, lt, gt, lte, gte|
        |**fuzzy**|f_and, f_or, f_xor, f_impl, f_not, f_nand, f_nor, f_nxor, f_nimpl|

        *nop - no operation*

        *pdiv - protected division*

        *inv - inverse* $(-x)$

        *minv - multiplicative inverse* $(1/x)$

        *lt, gt, lte, gte -* $<, >, <=, >=$

    feature_probs : str or array of shape (n_features,), default='xicor'
        The probability that a mutation will select a feature.
        If None then the features are selected with equal probability.
        If 'xicor' then the probabilities are derived from the xicor correlation coefficient https://doi.org/10.1080/01621459.2020.1758115

    random_state : int, default=0
        Random generator seed. If 0 then the random generator will be initialized by system time.

    verbose : int, default=0
        Controls the verbosity when fitting and predicting.

    metric : str, default='LogLoss'
        Metric used for evaluating error. Choose from {'MSE', 'MAE', 'MSLE', 'LogLoss'}

    transformation : str, default='LOGISTIC'
        Final transformation for computed value. Choose from { None, 'LOGISTIC', 'ORDINAL'}

    algo_settings : dict, default = None
        If not defined SymbolicSolver.ALGO_SETTINGS is used.
        ```python
        algo_settings = {'neighbours_count':15, 'alpha':0.15, 'beta':0.5, 'pretest_size':1, 'sample_size':16}
        ```
        - 'neighbours_count' : (int) Number of tested neighbours in each iteration
        - 'alpha' : (float) Score worsening limit for an iteration
        - 'beta' : (float) Tree breadth-wise expanding factor in a range from 0 to 1
        - 'pretest_size' : (int) Batch count (a batch is a 64-row sample) for fast fitness pre-evaluation
        - 'sample_size' : (int) Number of batches of sample used to calculate the score during training

    code_settings : dict, default = None
        If not defined SymbolicSolver.CODE_SETTINGS is used.
        ```python
        code_settings = {'min_size': 32, 'max_size':32, 'const_size':8}
        ```
        - 'const_size' : (int) Maximum allowed constants in symbolic model, 0 is also accepted.
        - 'min_size': (int) Minimum allowed equation size (as a linear program).
        - 'max_size' : (int) Maximum allowed equation size (as a linear program).

    population_settings : dict, default = None
        If not defined SymbolicSolver.POPULATION_SETTINGS is used.
        ```python
        population_settings = {'size': 64, 'tournament':4}
        ```
        - 'size' : (int) Number of individuals in the population.
        - 'tournament' : (int) Tournament selection.

    init_const_settings : dict, default = None
        If not defined SymbolicSolver.INIT_CONST_SETTINGS is used.
        ```python
        init_const_settings = {'const_min':-1.0, 'const_max':1.0, 'predefined_const_prob':0.0, 'predefined_const_set': []}
        ```
        - 'const_min' : (float) Lower range for initializing constants.
        - 'const_max' : (float) Upper range for initializing constants.
        - 'predefined_const_prob': (float) Probability of selecting one of the predefined constants during initialization.
        - 'predefined_const_set' : (array of floats) Predefined constants used during initialization.

    const_settings : dict, default = None
        If not defined SymbolicSolver.CONST_SETTINGS is used.
        ```python
        const_settings = {'const_min':-LARGE_FLOAT, 'const_max':LARGE_FLOAT, 'predefined_const_prob':0.0, 'predefined_const_set': []}
        ```
        - 'const_min' : (float) Lower range for constants used in equations.
        - 'const_max' : (float) Upper range for constants used in equations.
        - 'predefined_const_prob': (float) Probability of selecting one of the predefined constants during search process (mutation).
        - 'predefined_const_set' : (array of floats) Predefined constants used during search process (mutation).

    target_clip : array, default = None
        Array of two float values clip_min and clip_max.
        If not defined SymbolicSolver.CLASSIFICATION_TARGET_CLIP is used.
        ```python
        target_clip=[3e-7, 1.0-3e-7]
        ```
    class_weight : dict or 'balanced', default=None
        Weights associated with classes in the form ``{class_label: weight}``.
        If not given, all classes are supposed to have weight one.

        The "balanced" mode uses the values of y to automatically adjust
        weights inversely proportional to class frequencies in the input data
        as ``n_samples / (n_classes * np.bincount(y))``.

        Note that these weights will be multiplied with sample_weight (passed
        through the fit method) if sample_weight is specified.

    cv_params : dict, default = None
        If not defined SymbolicSolver.CLASSIFICATION_CV_PARAMS is used.
        ```python
        cv_params = {'n':0, 'cv_params':{}, 'select':'mean', 'opt_params':{'method': 'Nelder-Mead'}, 'opt_metric':make_scorer(log_loss, greater_is_better=False, response_method='predict_proba')}
        ```
        - 'n' : (int) Cross-validate the top n models
        - 'cv_params' : (dict) Parameters passed to scikit-learn cross_validate method
        - 'select' : (str) Best model selection method, choose from 'mean' or 'median'
        - 'opt_params' : (dict) Parameters passed to scipy.optimize.minimize method
        - 'opt_metric' : (make_scorer) Scoring method

    warm_start : bool, default=False
        If True, then the solver will be reused for the next call of fit.
    """

    def __init__(
        self,
        num_threads: int = 1,
        time_limit: float = 5.0,
        iter_limit: int = 0,
        precision: str = "f32",
        problem="math",
        feature_probs="xicor",
        random_state: int = 0,
        verbose: int = 0,
        metric: str = "LogLoss",
        transformation: str = "LOGISTIC",
        algo_settings=None,
        code_settings=None,
        population_settings=None,
        init_const_settings=None,
        const_settings=None,
        target_clip=None,
        class_weight=None,
        cv_params=None,
        warm_start: bool = False,
    ):

        super(NonlinearLogisticRegressor, self).__init__(
            num_threads=num_threads,
            time_limit=time_limit,
            iter_limit=iter_limit,
            precision=precision,
            problem=problem,
            feature_probs=feature_probs,
            random_state=random_state,
            verbose=verbose,
            metric=metric,
            transformation=transformation,
            algo_settings=algo_settings,
            code_settings=code_settings,
            population_settings=population_settings,
            init_const_settings=init_const_settings,
            const_settings=const_settings,
            target_clip=target_clip,
            class_weight=class_weight,
            cv_params=cv_params,
            warm_start=warm_start,
        )

    def fit(self, X, y, sample_weight=None, check_input=True):
        """
        Fit the symbolic models according to the given training data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features.

        y : array-like of shape (n_samples,)
            Target vector relative to X. Needs samples of 2 classes.

        sample_weight : array-like of shape (n_samples,) default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        self
            Fitted estimator.
        """

        if check_input:
            X, y = validate_data(self, X, y, accept_sparse=False, y_numeric=False, multi_output=False)

        y_type = type_of_target(y, input_name="y", raise_unknown=True)
        if y_type != "binary":
            raise ValueError("Only binary classification is supported. The type of the target " f"is {y_type}.")
        check_classification_targets(y)
        enc = LabelEncoder()
        y_ind = enc.fit_transform(y)
        self.classes_ = enc.classes_
        self.n_classes_ = len(self.classes_)
        if self.n_classes_ != 2:
            raise ValueError("This solver needs samples of 2 classes" " in the data, but the data contains" " %r classes" % self.n_classes_)

        self.class_weight_ = compute_class_weight(self.class_weight, classes=self.classes_, y=y)

        super(NonlinearLogisticRegressor, self).fit(X, y_ind, sample_weight=sample_weight, check_input=False)
        return self

    def predict(self, X, id=None, check_input=True, use_parsed_model=True):
        """
        Predict class for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The input samples.

        id : int
            Model id, default=None. id can be obtained from the get_models method. If None, the prediction uses the best model.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        y : ndarray of shape (n_samples,)
            The predicted classes.
        """
        preds = super(NonlinearLogisticRegressor, self).predict(X, id, check_input=check_input, use_parsed_model=use_parsed_model)
        return self.classes_[(preds > 0.5).astype(int)]

    def predict_proba(self, X, id=None, check_input=True):
        """
        Predict class probabilities for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        T : ndarray of shape (n_samples, n_classes)
            The class probabilities of the input samples. The order of the
            classes corresponds to that in the attribute :term:`classes_`.
        """
        preds = super(NonlinearLogisticRegressor, self).predict(X, id, check_input=check_input)
        proba = numpy.vstack([1 - preds, preds]).T
        return proba

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.classifier_tags = ClassifierTags(multi_class=False)
        return tags
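The `predict` and `predict_proba` methods in the source above both start from the same per-sample value, read here as the probability of the second class. The plain-Python restatement below mirrors the two numpy expressions, `classes_[(preds > 0.5).astype(int)]` and `vstack([1 - preds, preds]).T`. The class labels and probabilities are made up for illustration, and the function names are stand-ins rather than the library's methods.

```python
def proba_from_preds(preds):
    """Two-column probabilities [P(classes[0]), P(classes[1])] per sample."""
    return [[1.0 - p, p] for p in preds]


def labels_from_preds(preds, classes):
    """Map each probability to a label using the 0.5 threshold from the source."""
    return [classes[int(p > 0.5)] for p in preds]


classes = ['ham', 'spam']   # hypothetical labels, sorted as LabelEncoder would
preds = [0.1, 0.5, 0.9]     # hypothetical model outputs

labels = labels_from_preds(preds, classes)
proba = proba_from_preds(preds)
print(labels)
print(proba)
```

Note that the comparison is strict, so a value of exactly 0.5 is assigned to the first class.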
Nonlinear Logistic Regressor
Parameters
- num_threads (int, default=1): Number of used threads.
- time_limit (float, default=5.0): Timeout in seconds. If set to 0, there is no limit and the algorithm runs until iter_limit is met.
- iter_limit (int, default=0): Iteration limit. If set to 0, there is no limit and the algorithm runs until time_limit is met.
- precision (str, default='f32'): 'f64' or 'f32'. Internal floating-point representation.
- problem (str or dict, default='math'): Predefined instruction set 'math', 'simple' or 'fuzzy', or a custom dict that defines a set of instructions with their mutation probabilities.
  problem={'add':10.0, 'mul':10.0, 'gt':1.0, 'lt':1.0, 'nop':1.0}
  supported instructions | |
  ---|---|
  math | add, sub, mul, div, pdiv, inv, minv, sq2, pow, exp, log, sqrt, cbrt, aq |
  goniometric | sin, cos, tan, asin, acos, atan, sinh, cosh, tanh |
  other | nop, max, min, abs, floor, ceil, lt, gt, lte, gte |
  fuzzy | f_and, f_or, f_xor, f_impl, f_not, f_nand, f_nor, f_nxor, f_nimpl |
  - nop - no operation
  - pdiv - protected division
  - inv - additive inverse $(-x)$
  - minv - multiplicative inverse $(1/x)$
  - lt, gt, lte, gte - $<, >, \le, \ge$
- feature_probs (str or array of shape (n_features,), default='xicor'): The probability that a mutation will select a feature. If None, the features are selected with equal probability. If 'xicor', the probabilities are derived from the xicor correlation coefficient https://doi.org/10.1080/01621459.2020.1758115
- random_state (int, default=0): Random generator seed. If 0, the random generator is initialized from the system time.
- verbose (int, default=0): Controls the verbosity when fitting and predicting.
- metric (str, default='LogLoss'): Metric used for evaluating error. Choose from {'MSE', 'MAE', 'MSLE', 'LogLoss'}
- transformation (str, default='LOGISTIC'): Final transformation for computed value. Choose from { None, 'LOGISTIC', 'ORDINAL'}
- algo_settings (dict, default=None): If not defined, SymbolicSolver.ALGO_SETTINGS is used.
  algo_settings = {'neighbours_count':15, 'alpha':0.15, 'beta':0.5, 'pretest_size':1, 'sample_size':16}
  - 'neighbours_count' : (int) Number of neighbours tested in each iteration.
  - 'alpha' : (float) Score worsening limit for an iteration.
  - 'beta' : (float) Tree breadth-wise expanding factor in the range from 0 to 1.
  - 'pretest_size' : (int) Number of batches (one batch is a sample of 64 rows) used for fast fitness pre-evaluation.
  - 'sample_size' : (int) Number of sample batches used to calculate the score during training.
- code_settings (dict, default=None): If not defined, SymbolicSolver.CODE_SETTINGS is used.
  code_settings = {'min_size': 32, 'max_size':32, 'const_size':8}
  - 'const_size' : (int) Maximum number of constants allowed in the symbolic model; 0 is also accepted.
  - 'min_size' : (int) Minimum allowed equation size (as a linear program).
  - 'max_size' : (int) Maximum allowed equation size (as a linear program).
- population_settings (dict, default=None): If not defined, SymbolicSolver.POPULATION_SETTINGS is used.
  population_settings = {'size': 64, 'tournament':4}
  - 'size' : (int) Number of individuals in the population.
  - 'tournament' : (int) Tournament selection.
- init_const_settings (dict, default=None): If not defined, SymbolicSolver.INIT_CONST_SETTINGS is used.
  init_const_settings = {'const_min':-1.0, 'const_max':1.0, 'predefined_const_prob':0.0, 'predefined_const_set': []}
  - 'const_min' : (float) Lower range for initializing constants.
  - 'const_max' : (float) Upper range for initializing constants.
  - 'predefined_const_prob' : (float) Probability of selecting one of the predefined constants during initialization.
  - 'predefined_const_set' : (array of floats) Predefined constants used during initialization.
- const_settings (dict, default=None): If not defined, SymbolicSolver.CONST_SETTINGS is used.
  const_settings = {'const_min':-LARGE_FLOAT, 'const_max':LARGE_FLOAT, 'predefined_const_prob':0.0, 'predefined_const_set': []}
  - 'const_min' : (float) Lower range for constants used in equations.
  - 'const_max' : (float) Upper range for constants used in equations.
  - 'predefined_const_prob' : (float) Probability of selecting one of the predefined constants during the search process (mutation).
  - 'predefined_const_set' : (array of floats) Predefined constants used during the search process (mutation).
- target_clip (array, default=None): Array of two float values, clip_min and clip_max. If not defined, SymbolicSolver.CLASSIFICATION_TARGET_CLIP is used.
  target_clip=[3e-7, 1.0-3e-7]
- class_weight (dict or 'balanced', default=None): Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
- cv_params (dict, default=None): If not defined, SymbolicSolver.CLASSIFICATION_CV_PARAMS is used.
  cv_params = {'n':0, 'cv_params':{}, 'select':'mean', 'opt_params':{'method': 'Nelder-Mead'}, 'opt_metric':make_scorer(log_loss, greater_is_better=False, response_method='predict_proba')}
  - 'n' : (int) Number of top models to cross-validate.
  - 'cv_params' : (dict) Parameters passed to the scikit-learn cross_validate method.
  - 'select' : (str) Best model selection method; choose from 'mean' or 'median'.
  - 'opt_params' : (dict) Parameters passed to the scipy.optimize.minimize method.
  - 'opt_metric' : (make_scorer) Scoring method.
- warm_start (bool, default=False): If True, then the solver will be reused for the next call of fit.
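The "balanced" mode of class_weight can be illustrated with a small stand-alone sketch. It only reproduces the formula given in the parameter description, n_samples / (n_classes * np.bincount(y)), and the multiplication with sample_weight. The helper names `balanced_class_weights` and `effective_sample_weights` are hypothetical and are not part of HROCH.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight per class: n_samples / (n_classes * count_of_class)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def effective_sample_weights(y, class_weight, sample_weight=None):
    """Multiply each sample's own weight by the weight of its class."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)  # unit weight when nothing is passed
    return [w * class_weight[label] for label, w in zip(y, sample_weight)]


y = [0, 0, 0, 1]  # three samples of class 0, one sample of class 1
class_weight = balanced_class_weights(y)
weights = effective_sample_weights(y, class_weight, sample_weight=[1.0, 1.0, 2.0, 1.0])
print(class_weight)
print(weights)
```

The rare class receives the larger weight, so both classes contribute comparably to the training score.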
```python
    def __init__(
        self,
        num_threads: int = 1,
        time_limit: float = 5.0,
        iter_limit: int = 0,
        precision: str = "f32",
        problem="math",
        feature_probs="xicor",
        random_state: int = 0,
        verbose: int = 0,
        metric: str = "LogLoss",
        transformation: str = "LOGISTIC",
        algo_settings=None,
        code_settings=None,
        population_settings=None,
        init_const_settings=None,
        const_settings=None,
        target_clip=None,
        class_weight=None,
        cv_params=None,
        warm_start: bool = False,
    ):

        super(NonlinearLogisticRegressor, self).__init__(
            num_threads=num_threads,
            time_limit=time_limit,
            iter_limit=iter_limit,
            precision=precision,
            problem=problem,
            feature_probs=feature_probs,
            random_state=random_state,
            verbose=verbose,
            metric=metric,
            transformation=transformation,
            algo_settings=algo_settings,
            code_settings=code_settings,
            population_settings=population_settings,
            init_const_settings=init_const_settings,
            const_settings=const_settings,
            target_clip=target_clip,
            class_weight=class_weight,
            cv_params=cv_params,
            warm_start=warm_start,
        )
```
```python
    def fit(self, X, y, sample_weight=None, check_input=True):
        """
        Fit the symbolic models according to the given training data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features.

        y : array-like of shape (n_samples,)
            Target vector relative to X. Needs samples of 2 classes.

        sample_weight : array-like of shape (n_samples,) default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        self
            Fitted estimator.
        """

        if check_input:
            X, y = validate_data(self, X, y, accept_sparse=False, y_numeric=False, multi_output=False)

        y_type = type_of_target(y, input_name="y", raise_unknown=True)
        if y_type != "binary":
            raise ValueError("Only binary classification is supported. The type of the target " f"is {y_type}.")
        check_classification_targets(y)
        enc = LabelEncoder()
        y_ind = enc.fit_transform(y)
        self.classes_ = enc.classes_
        self.n_classes_ = len(self.classes_)
        if self.n_classes_ != 2:
            raise ValueError("This solver needs samples of 2 classes" " in the data, but the data contains" " %r classes" % self.n_classes_)

        self.class_weight_ = compute_class_weight(self.class_weight, classes=self.classes_, y=y)

        super(NonlinearLogisticRegressor, self).fit(X, y_ind, sample_weight=sample_weight, check_input=False)
        return self
```
Fit the symbolic models according to the given training data.
Parameters
- X (array-like of shape (n_samples, n_features)): Training vector, where n_samples is the number of samples and n_features is the number of features.
- y (array-like of shape (n_samples,)): Target vector relative to X. Needs samples of 2 classes.
- sample_weight (array-like of shape (n_samples,), default=None): Array of weights that are assigned to individual samples. If not provided, each sample is given unit weight.
- check_input (bool, default=True): Allows bypassing several input checks. Don't use this parameter unless you know what you're doing.
Returns
- self: Fitted estimator.
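The source listing of fit shows that the labels are encoded to indices with scikit-learn's LabelEncoder and that anything other than exactly two classes raises a ValueError. A minimal pure-Python sketch of that step, assuming sorted class labels as LabelEncoder produces, could look like this. The function name `encode_binary_labels` is hypothetical and is not part of HROCH.

```python
def encode_binary_labels(y):
    """Return (classes, indices): classes are sorted, indices are 0 or 1."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError(
            f"This sketch needs samples of 2 classes, but the data contains {len(classes)} classes"
        )
    index_of = {label: i for i, label in enumerate(classes)}
    return classes, [index_of[label] for label in y]


classes, y_ind = encode_binary_labels(["spam", "ham", "spam", "ham"])
print(classes)  # sorted class labels
print(y_ind)    # position of each sample's label in classes
```

The solver is then trained on the 0/1 indices, and the stored classes are used to translate predictions back to the original labels.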
```python
    def predict(self, X, id=None, check_input=True, use_parsed_model=True):
        """
        Predict class for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The input samples.

        id : int
            Model id, default=None. id can be obtained from get_models method. If its None prediction use the best model.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        y : ndarray of shape (n_samples,)
            The predicted classes.
        """
        preds = super(NonlinearLogisticRegressor, self).predict(X, id, check_input=check_input, use_parsed_model=use_parsed_model)
        return self.classes_[(preds > 0.5).astype(int)]
```
Predict class for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
- id (int, default=None): Model id. The id can be obtained from the get_models method. If None, the prediction uses the best model.
- check_input (bool, default=True): Allows bypassing several input checks. Don't use this parameter unless you know what you're doing.
Returns
- y (ndarray of shape (n_samples,)): The predicted classes.
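The last line of the predict source maps the model output to a class label with a 0.5 threshold: `self.classes_[(preds > 0.5).astype(int)]`. The following sketch reproduces that rule in plain Python. The name `labels_from_scores` and the sample values are illustrative assumptions, not HROCH code.

```python
def labels_from_scores(preds, classes, threshold=0.5):
    """Pick classes[1] when the score is strictly above the threshold, else classes[0]."""
    return [classes[int(p > threshold)] for p in preds]


classes = ["ham", "spam"]   # sorted labels, as stored in classes_
preds = [0.1, 0.5, 0.9]     # model outputs after the final transformation
labels = labels_from_scores(preds, classes)
print(labels)
```

Because the comparison is strict, a score of exactly 0.5 is assigned to the first class.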
```python
    def predict_proba(self, X, id=None, check_input=True):
        """
        Predict class probabilities for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        T : ndarray of shape (n_samples, n_classes)
            The class probabilities of the input samples. The order of the
            classes corresponds to that in the attribute :term:`classes_`.
        """
        preds = super(NonlinearLogisticRegressor, self).predict(X, id, check_input=check_input)
        proba = numpy.vstack([1 - preds, preds]).T
        return proba
```
Predict class probabilities for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
- check_input (bool, default=True): Allows bypassing several input checks. Don't use this parameter unless you know what you're doing.
Returns
- T (ndarray of shape (n_samples, n_classes)): The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
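The predict_proba source builds its result as `numpy.vstack([1 - preds, preds]).T`, that is, one row per sample with the probability of the first class followed by the probability of the second class. A dependency-free sketch of the same layout is shown here. The name `two_column_proba` is hypothetical.

```python
def two_column_proba(preds):
    """Turn P(second class) per sample into rows [P(first class), P(second class)]."""
    return [[1.0 - p, p] for p in preds]


proba = two_column_proba([0.25, 0.5, 1.0])
for row in proba:
    print(row)
```

Each row sums to one, and column order follows classes_.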
Descriptor for defining set_{method}_request methods in estimators.
New in version 1.3.
Parameters
- name (str): The name of the method for which the request function should be created, e.g. "fit" would create a set_fit_request function.
- keys (list of str): A list of strings which are accepted parameters by the created function, e.g. ["sample_weight"] if the corresponding method accepts it as a metadata.
- validate_keys (bool, default=True): Whether to check if the requested parameters fit the actual parameters of the method.
Notes
This class is a descriptor [1] and uses PEP-362 to set the signature of the returned function [2].
References
```python
class SymbolicClassifier(OneVsRestClassifier):
    """
    OVR multiclass symbolic classificator

    Parameters
    ----------
    estimator : NonlinearLogisticRegressor
        Instance of NonlinearLogisticRegressor class.
    """

    def __init__(self, estimator: NonlinearLogisticRegressor):
        super().__init__(estimator=estimator)

    def fit(self, X, y):
        """
        Fit the symbolic models according to the given training data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features. Should be in the range [0, 1].

        y : array-like of shape (n_samples,)
            Target vector relative to X.

        Returns
        -------
        self
            Fitted estimator.
        """

        super().fit(X, y)
        return self

    def predict(self, X):
        """
        Predict class for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The input samples.

        Returns
        -------
        y : ndarray of shape (n_samples,)
            The predicted classes.
        """
        return super().predict(X)

    def predict_proba(self, X):
        """
        Predict class probabilities for X.

        Parameters
        ----------
        X : narray-like of shape (n_samples, n_features)

        Returns
        -------
        T : ndarray of shape (n_samples, n_classes)
            The class probabilities of the input samples. The order of the
            classes corresponds to that in the attribute :term:`classes_`.
        """
        return super().predict_proba(X)

    def __sklearn_tags__(self):
        return super().__sklearn_tags__()
```
One-vs-rest (OVR) multiclass symbolic classifier
Parameters
- estimator (NonlinearLogisticRegressor): Instance of NonlinearLogisticRegressor class.
```python
    def fit(self, X, y):
        """
        Fit the symbolic models according to the given training data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features. Should be in the range [0, 1].

        y : array-like of shape (n_samples,)
            Target vector relative to X.

        Returns
        -------
        self
            Fitted estimator.
        """

        super().fit(X, y)
        return self
```
Fit the symbolic models according to the given training data.
Parameters
- X (array-like of shape (n_samples, n_features)): Training vector, where n_samples is the number of samples and n_features is the number of features. Should be in the range [0, 1].
- y (array-like of shape (n_samples,)): Target vector relative to X.
Returns
- self: Fitted estimator.
```python
    def predict(self, X):
        """
        Predict class for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The input samples.

        Returns
        -------
        y : ndarray of shape (n_samples,)
            The predicted classes.
        """
        return super().predict(X)
```
Predict class for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
Returns
- y (ndarray of shape (n_samples,)): The predicted classes.
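SymbolicClassifier derives from scikit-learn's OneVsRestClassifier, so prediction is delegated to the one-vs-rest scheme. The sketch below shows the usual decision rule of that scheme under stated assumptions: one binary model per class, and the predicted label is the class whose model reports the highest score. The lambda scorers are hypothetical stand-ins for fitted NonlinearLogisticRegressor models, and `ovr_predict` is not an HROCH function.

```python
def ovr_predict(x, scorers):
    """scorers maps a class label to a callable returning a score for 'x is this class'."""
    return max(scorers, key=lambda label: scorers[label](x))


# Hypothetical per-class binary models over a single numeric feature.
scorers = {
    "low": lambda x: 1.0 if x < 1.0 else 0.1,
    "mid": lambda x: 0.8 if 1.0 <= x < 2.0 else 0.2,
    "high": lambda x: 0.9 if x >= 2.0 else 0.05,
}

predictions = [ovr_predict(x, scorers) for x in (0.5, 1.5, 2.5)]
print(predictions)
```

Each binary model yields its own symbolic expression, so a fitted multiclass model consists of one equation per class.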
```python
    def predict_proba(self, X):
        """
        Predict class probabilities for X.

        Parameters
        ----------
        X : narray-like of shape (n_samples, n_features)

        Returns
        -------
        T : ndarray of shape (n_samples, n_classes)
            The class probabilities of the input samples. The order of the
            classes corresponds to that in the attribute :term:`classes_`.
        """
        return super().predict_proba(X)
```
Predict class probabilities for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
Returns
- T (ndarray of shape (n_samples, n_classes)): The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
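In a one-vs-rest setting each binary model scores its own class independently, so the raw scores of one sample need not sum to one. A common way to report them as class probabilities, assumed here for illustration rather than taken from the HROCH source, is to divide each row by its sum. The helper name `normalize_rows` is hypothetical.

```python
def normalize_rows(scores):
    """Rescale every row of per-class scores so that it sums to one."""
    normalized = []
    for row in scores:
        total = sum(row)  # assumed to be positive in this sketch
        normalized.append([s / total for s in row])
    return normalized


# One row per sample, one column per class (order follows classes_).
scores = [[0.5, 0.5, 1.0], [0.25, 0.25, 0.5]]
proba = normalize_rows(scores)
for row in proba:
    print(row)
```

A row whose scores are all zero would need separate handling, which this sketch leaves out.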
Descriptor for defining set_{method}_request methods in estimators.
New in version 1.3.
Parameters
- name (str): The name of the method for which the request function should be created, e.g. "fit" would create a set_fit_request function.
- keys (list of str): A list of strings which are accepted parameters by the created function, e.g. ["sample_weight"] if the corresponding method accepts it as a metadata.
- validate_keys (bool, default=True): Whether to check if the requested parameters fit the actual parameters of the method.
Notes
This class is a descriptor [1] and uses PEP-362 to set the signature of the returned function [2].
References
````python
class FuzzyRegressor(ClassifierMixin, SymbolicSolver):
    """
    Fuzzy Regressor

    Parameters
    ----------
    num_threads : int, default=1
        Number of used threads.

    time_limit : float, default=5.0
        Timeout in seconds. If is set to 0 there is no limit and the algorithm runs until iter_limit is met.

    iter_limit : int, default=0
        Iterations limit. If is set to 0 there is no limit and the algorithm runs until time_limit is met.

    precision : str, default='f32'
        'f64' or 'f32'. Internal floating number representation.

    problem : str or dict, default='fuzzy'
        Predefined instructions sets 'fuzzy' or custom defines set of instructions with mutation probability.
        ```python
        problem={'f_and':10.0, 'f_or':10.0, 'f_xor':1.0, 'f_not':1.0, 'nop':1.0}
        ```

        |**supported instructions**||
        |-|-|
        |**other**|nop|
        |**fuzzy**|f_and, f_or, f_xor, f_impl, f_not, f_nand, f_nor, f_nxor, f_nimpl|

    feature_probs : str or array of shape (n_features,), default='xicor'
        The probability that a mutation will select a feature.
        If None then the features are selected with equal probability.
        If 'xicor' then the probabilities are deriveded from xicor corelation coefficient https://doi.org/10.1080/01621459.2020.1758115

    random_state : int, default=0
        Random generator seed. If 0 then random generator will be initialized by system time.

    verbose : int, default=0
        Controls the verbosity when fitting and predicting.

    metric : str, default='LogLoss'
        Metric used for evaluating error. Choose from {'MSE', 'MAE', 'MSLE', 'LogLoss'}

    transformation : str, default=None
        Final transformation for computed value. Choose from { None, 'LOGISTIC', 'ORDINAL'}

    algo_settings : dict, default = None
        If not defined SymbolicSolver.ALGO_SETTINGS is used.
        ```python
        algo_settings = {'neighbours_count':15, 'alpha':0.15, 'beta':0.5, 'pretest_size':1, 'sample_size':16}
        ```
        - 'neighbours_count' : (int) Number tested neighbours in each iteration
        - 'alpha' : (float) Score worsening limit for a iteration
        - 'beta' : (float) Tree breadth-wise expanding factor in a range from 0 to 1
        - 'pretest_size' : (int) Batch count(batch is 64 rows sample) for fast fitness preevaluating
        - 'sample_size : (int) Number of batches of sample used to calculate the score during training

    code_settings : dict, default = None
        If not defined SymbolicSolver.CODE_SETTINGS is used.
        ```python
        code_settings = {'min_size': 32, 'max_size':32, 'const_size':8}
        ```
        - 'const_size' : (int) Maximum alloved constants in symbolic model, accept also 0.
        - 'min_size': (int) Minimum allowed equation size(as a linear program).
        - 'max_size' : (int) Maximum allowed equation size(as a linear program).

    population_settings : dict, default = None
        If not defined SymbolicSolver.POPULATION_SETTINGS is used.
        ```python
        population_settings = {'size': 64, 'tournament':4}
        ```
        - 'size' : (int) Number of individuals in the population.
        - 'tournament' : (int) Tournament selection.

    init_const_settings : dict, default = None
        If not defined FuzzyRegressor.INIT_CONST_SETTINGS is used.
        ```python
        init_const_settings = {'const_min':0.0, 'const_max':1.0, 'predefined_const_prob':0.0, 'predefined_const_set': []}
        ```
        - 'const_min' : (float) Lower range for initializing constants.
        - 'const_max' : (float) Upper range for initializing constants.
        - 'predefined_const_prob': (float) Probability of selecting one of the predefined constants during initialization.
        - 'predefined_const_set' : (array of floats) Predefined constants used during initialization.

    const_settings : dict, default = None
        If not defined FuzzyRegressor.CONST_SETTINGS is used.
        ```python
        const_settings = {'const_min':0.0, 'const_max':1.0, 'predefined_const_prob':0.0, 'predefined_const_set': []}
        ```
        - 'const_min' : (float) Lower range for constants used in equations.
        - 'const_max' : (float) Upper range for constants used in equations.
        - 'predefined_const_prob': (float) Probability of selecting one of the predefined constants during search process(mutation).
        - 'predefined_const_set' : (array of floats) Predefined constants used during search process(mutation).

    target_clip : array, default = None
        Array of two float values clip_min and clip_max.
        If not defined SymbolicSolver.CLASSIFICATION_TARGET_CLIP is used.
        ```python
        target_clip=[3e-7, 1.0-3e-7]
        ```
    class_weight : dict or 'balanced', default=None
        Weights associated with classes in the form ``{class_label: weight}``.
        If not given, all classes are supposed to have weight one.

        The "balanced" mode uses the values of y to automatically adjust
        weights inversely proportional to class frequencies in the input data
        as ``n_samples / (n_classes * np.bincount(y))``.

        Note that these weights will be multiplied with sample_weight (passed
        through the fit method) if sample_weight is specified.

    cv_params : dict, default = None
        If not defined SymbolicSolver.CLASSIFICATION_CV_PARAMS is used.
        ```python
        cv_params = {'n':0, 'cv_params':{}, 'select':'mean', 'opt_params':{'method': 'Nelder-Mead'}, 'opt_metric':make_scorer(log_loss, greater_is_better=False, response_method='predict_proba')}
        ```
        - 'n' : (int) Crossvalidate n top models
        - 'cv_params' : (dict) Parameters passed to scikit-learn cross_validate method
        - select : (str) Best model selection method choose from 'mean'or 'median'
        - opt_params : (dict) Parameters passed to scipy.optimize.minimize method
        - opt_metric : (make_scorer) Scoring method

    warm_start : bool, default=False
        If True, then the solver will be reused for the next call of fit.
    """

    INIT_CONST_SETTINGS = {
        "const_min": 0.0,
        "const_max": 1.0,
        "predefined_const_prob": 0.0,
        "predefined_const_set": [],
    }
    CONST_SETTINGS = {
        "const_min": 0.0,
        "const_max": 1.0,
        "predefined_const_prob": 0.0,
        "predefined_const_set": [],
    }

    def __init__(
        self,
        num_threads: int = 1,
        time_limit: float = 5.0,
        iter_limit: int = 0,
        precision: str = "f32",
        problem="fuzzy",
        feature_probs="xicor",
        random_state: int = 0,
        verbose: int = 0,
        metric: str = "LogLoss",
        transformation: str = None,
        algo_settings=None,
        code_settings=None,
        population_settings=None,
        init_const_settings=None,
        const_settings=None,
        target_clip=None,
        class_weight=None,
        cv_params=None,
        warm_start: bool = False,
    ):
        super(FuzzyRegressor, self).__init__(
            num_threads=num_threads,
            time_limit=time_limit,
            iter_limit=iter_limit,
            precision=precision,
            problem=problem,
            feature_probs=feature_probs,
            random_state=random_state,
            verbose=verbose,
            metric=metric,
            algo_settings=algo_settings,
            transformation=transformation,
            code_settings=code_settings,
            population_settings=population_settings,
            init_const_settings=init_const_settings,
            const_settings=const_settings,
            target_clip=target_clip,
            class_weight=class_weight,
            cv_params=cv_params,
            warm_start=warm_start,
        )

    def fit(self, X, y, sample_weight=None, check_input=True):
        """
        Fit the symbolic models according to the given training data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features. Should be in the range [0, 1].

        y : array-like of shape (n_samples,)
            Target vector relative to X. Needs samples of 2 classes.

        sample_weight : array-like of shape (n_samples,) default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        self
            Fitted estimator.
        """
        if check_input:
            X, y = validate_data(self, X, y, accept_sparse=False, y_numeric=False, multi_output=False)

        y_type = type_of_target(y, input_name="y", raise_unknown=True)
        if y_type != "binary":
            raise ValueError("Only binary classification is supported. The type of the target " f"is {y_type}.")
        check_classification_targets(y)
        enc = LabelEncoder()
        y_ind = enc.fit_transform(y)
        self.classes_ = enc.classes_
        self.n_classes_ = len(self.classes_)
        if self.n_classes_ != 2:
            raise ValueError("This solver needs samples of 2 classes" " in the data, but the data contains" " %r classes" % self.n_classes_)

        self.class_weight_ = compute_class_weight(self.class_weight, classes=self.classes_, y=y)

        super(FuzzyRegressor, self).fit(X, y_ind, sample_weight=sample_weight, check_input=check_input)
        return self

    def predict(self, X, id=None, check_input=True, use_parsed_model=True):
        """
        Predict class for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The input samples.

        id : int
            Model id, default=None. id can be obtained from get_models method. If its None prediction use the best model.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        y : ndarray of shape (n_samples,)
            The predicted classes.
        """
        preds = super(FuzzyRegressor, self).predict(X, id, check_input=check_input, use_parsed_model=use_parsed_model)
        return self.classes_[(preds > 0.5).astype(int)]

    def predict_proba(self, X, id=None, check_input=True):
        """
        Predict class probabilities for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        T : ndarray of shape (n_samples, n_classes)
            The class probabilities of the input samples. The order of the
            classes corresponds to that in the attribute :term:`classes_`.
        """
        preds = super(FuzzyRegressor, self).predict(X, id, check_input=check_input)
        proba = numpy.vstack([1 - preds, preds]).T
        return proba

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.estimator_type = "classifier"
        tags.classifier_tags = ClassifierTags(multi_class=False, poor_score=True)
        return tags
````
Fuzzy Regressor
Parameters
- num_threads (int, default=1): Number of used threads.
- time_limit (float, default=5.0): Timeout in seconds. If set to 0, there is no limit and the algorithm runs until iter_limit is met.
- iter_limit (int, default=0): Iteration limit. If set to 0, there is no limit and the algorithm runs until time_limit is met.
- precision (str, default='f32'): 'f64' or 'f32'. Internal floating-point representation.
- problem (str or dict, default='fuzzy'): Predefined instruction set 'fuzzy', or a custom dict that defines a set of instructions with their mutation probabilities.
  problem={'f_and':10.0, 'f_or':10.0, 'f_xor':1.0, 'f_not':1.0, 'nop':1.0}
  supported instructions | |
  ---|---|
  other | nop |
  fuzzy | f_and, f_or, f_xor, f_impl, f_not, f_nand, f_nor, f_nxor, f_nimpl |
- feature_probs (str or array of shape (n_features,), default='xicor'): The probability that a mutation will select a feature. If None, the features are selected with equal probability. If 'xicor', the probabilities are derived from the xicor correlation coefficient https://doi.org/10.1080/01621459.2020.1758115
- random_state (int, default=0): Random generator seed. If 0 then random generator will be initialized by system time.
- verbose (int, default=0): Controls the verbosity when fitting and predicting.
- metric (str, default='LogLoss'): Metric used for evaluating error. Choose from {'MSE', 'MAE', 'MSLE', 'LogLoss'}
- transformation (str, default=None): Final transformation for computed value. Choose from { None, 'LOGISTIC', 'ORDINAL'}
algo_settings (dict, default = None): If not defined SymbolicSolver.ALGO_SETTINGS is used.
algo_settings = {'neighbours_count':15, 'alpha':0.15, 'beta':0.5, 'pretest_size':1, 'sample_size':16}
- 'neighbours_count' : (int) Number tested neighbours in each iteration
- 'alpha' : (float) Score worsening limit for a iteration
- 'beta' : (float) Tree breadth-wise expanding factor in a range from 0 to 1
- 'pretest_size' : (int) Batch count(batch is 64 rows sample) for fast fitness preevaluating
- 'sample_size : (int) Number of batches of sample used to calculate the score during training
- code_settings (dict, default=None): If not defined, SymbolicSolver.CODE_SETTINGS is used.
code_settings = {'min_size': 32, 'max_size':32, 'const_size':8}
  - 'const_size' : (int) Maximum number of constants allowed in the symbolic model; 0 is also accepted.
  - 'min_size' : (int) Minimum allowed equation size (as a linear program).
  - 'max_size' : (int) Maximum allowed equation size (as a linear program).
- population_settings (dict, default=None): If not defined, SymbolicSolver.POPULATION_SETTINGS is used.
population_settings = {'size': 64, 'tournament':4}
  - 'size' : (int) Number of individuals in the population.
  - 'tournament' : (int) Tournament selection.
- init_const_settings (dict, default=None): If not defined, FuzzyRegressor.INIT_CONST_SETTINGS is used.
init_const_settings = {'const_min':0.0, 'const_max':1.0, 'predefined_const_prob':0.0, 'predefined_const_set': []}
  - 'const_min' : (float) Lower range for initializing constants.
  - 'const_max' : (float) Upper range for initializing constants.
  - 'predefined_const_prob' : (float) Probability of selecting one of the predefined constants during initialization.
  - 'predefined_const_set' : (array of floats) Predefined constants used during initialization.
- const_settings (dict, default=None): If not defined, FuzzyRegressor.CONST_SETTINGS is used.
const_settings = {'const_min':0.0, 'const_max':1.0, 'predefined_const_prob':0.0, 'predefined_const_set': []}
  - 'const_min' : (float) Lower range for constants used in equations.
  - 'const_max' : (float) Upper range for constants used in equations.
  - 'predefined_const_prob' : (float) Probability of selecting one of the predefined constants during the search process (mutation).
  - 'predefined_const_set' : (array of floats) Predefined constants used during the search process (mutation).
- target_clip (array, default=None): Array of two float values, clip_min and clip_max. If not defined, SymbolicSolver.CLASSIFICATION_TARGET_CLIP is used.
target_clip=[3e-7, 1.0-3e-7]
- class_weight (dict or 'balanced', default=None): Weights associated with classes in the form {class_label: weight}. If not given, all classes are assumed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
- cv_params (dict, default=None): If not defined, SymbolicSolver.CLASSIFICATION_CV_PARAMS is used.
cv_params = {'n':0, 'cv_params':{}, 'select':'mean', 'opt_params':{'method': 'Nelder-Mead'}, 'opt_metric':make_scorer(log_loss, greater_is_better=False, response_method='predict_proba')}
  - 'n' : (int) Cross-validate the n top models.
  - 'cv_params' : (dict) Parameters passed to the scikit-learn cross_validate method.
  - 'select' : (str) Best model selection method, either 'mean' or 'median'.
  - 'opt_params' : (dict) Parameters passed to the scipy.optimize.minimize method.
  - 'opt_metric' : (make_scorer) Scoring method.
- warm_start (bool, default=False): If True, the solver is reused for the next call of fit.
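The "balanced" formula quoted for class_weight can be checked by hand. The following is a minimal sketch in plain Python, not the library's code: `balanced_class_weights` is a hypothetical helper that reproduces n_samples / (n_classes * np.bincount(y)) with `collections.Counter` instead of NumPy.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {class_label: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(y)          # occurrences of each class label
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Three samples of class 0 and one sample of class 1:
# the rare class receives the larger weight.
y = [0, 0, 0, 1]
weights = balanced_class_weights(y)
print(weights)
```

The rarer a class is, the larger its weight, so both classes contribute equally to the weighted score.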
    def __init__(
        self,
        num_threads: int = 1,
        time_limit: float = 5.0,
        iter_limit: int = 0,
        precision: str = "f32",
        problem="fuzzy",
        feature_probs="xicor",
        random_state: int = 0,
        verbose: int = 0,
        metric: str = "LogLoss",
        transformation: str = None,
        algo_settings=None,
        code_settings=None,
        population_settings=None,
        init_const_settings=None,
        const_settings=None,
        target_clip=None,
        class_weight=None,
        cv_params=None,
        warm_start: bool = False,
    ):
        super(FuzzyRegressor, self).__init__(
            num_threads=num_threads,
            time_limit=time_limit,
            iter_limit=iter_limit,
            precision=precision,
            problem=problem,
            feature_probs=feature_probs,
            random_state=random_state,
            verbose=verbose,
            metric=metric,
            algo_settings=algo_settings,
            transformation=transformation,
            code_settings=code_settings,
            population_settings=population_settings,
            init_const_settings=init_const_settings,
            const_settings=const_settings,
            target_clip=target_clip,
            class_weight=class_weight,
            cv_params=cv_params,
            warm_start=warm_start,
        )
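The documented target_clip value of [3e-7, 1.0-3e-7] suggests why clipping matters for the 'LogLoss' metric: a logarithm of exactly 0 is undefined. The sketch below is an illustration only. It assumes the bounds are applied to predicted probabilities before taking the logarithm, which the documentation does not state, and `clipped_log_loss` is a hypothetical helper, not part of HROCH.

```python
import math


def clipped_log_loss(y_true, p_pred, clip_min=3e-7, clip_max=1.0 - 3e-7):
    """Mean binary log loss with probabilities clipped to [clip_min, clip_max].

    Assumption: clipping is applied to the predictions, not to the targets.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, clip_min), clip_max)  # keep p strictly inside (0, 1)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Without clipping, the last sample (y=1, p=0.0) would require log(0).
loss = clipped_log_loss([1, 0, 1], [1.0, 0.0, 0.0])
print(loss)
```

With clipping, a completely wrong but fully confident prediction costs a large finite penalty instead of raising an error.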
    def fit(self, X, y, sample_weight=None, check_input=True):
        """
        Fit the symbolic models according to the given training data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features. Should be in the range [0, 1].

        y : array-like of shape (n_samples,)
            Target vector relative to X. Needs samples of 2 classes.

        sample_weight : array-like of shape (n_samples,) default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        self
            Fitted estimator.
        """
        if check_input:
            X, y = validate_data(self, X, y, accept_sparse=False, y_numeric=False, multi_output=False)

        y_type = type_of_target(y, input_name="y", raise_unknown=True)
        if y_type != "binary":
            raise ValueError("Only binary classification is supported. The type of the target " f"is {y_type}.")
        check_classification_targets(y)
        enc = LabelEncoder()
        y_ind = enc.fit_transform(y)
        self.classes_ = enc.classes_
        self.n_classes_ = len(self.classes_)
        if self.n_classes_ != 2:
            raise ValueError("This solver needs samples of 2 classes" " in the data, but the data contains" " %r classes" % self.n_classes_)

        self.class_weight_ = compute_class_weight(self.class_weight, classes=self.classes_, y=y)

        super(FuzzyRegressor, self).fit(X, y_ind, sample_weight=sample_weight, check_input=check_input)
        return self
Fit the symbolic models according to the given training data.
Parameters
- X (array-like of shape (n_samples, n_features)): Training vector, where n_samples is the number of samples and n_features is the number of features. Should be in the range [0, 1].
- y (array-like of shape (n_samples,)): Target vector relative to X. Needs samples of 2 classes.
- sample_weight (array-like of shape (n_samples,), default=None): Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- check_input (bool, default=True): Allows bypassing several input checks. Don't use this parameter unless you know what you're doing.
Returns
- self: Fitted estimator.
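Two preconditions stand out in the description of fit: features should lie in [0, 1] and the target must contain exactly two classes. The sketch below shows one way to prepare data accordingly in plain Python. Both helpers (`min_max_scale`, `encode_binary`) are hypothetical names used for illustration; min-max scaling is one possible choice of rescaling, not something the library prescribes.

```python
def min_max_scale(rows):
    """Rescale every column of a list-of-rows matrix to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        # A constant column has span 0; map it to 0.0 to avoid division by zero.
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


def encode_binary(y):
    """Map two arbitrary labels to indices 0 and 1 (sorted label order)."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError(f"expected samples of 2 classes, got {len(classes)}")
    index = {label: i for i, label in enumerate(classes)}
    return classes, [index[label] for label in y]


X = [[10.0, 1.0], [20.0, 3.0], [30.0, 2.0]]
X01 = min_max_scale(X)
classes, y_ind = encode_binary(["no", "yes", "no"])
print(X01, classes, y_ind)
```

The sorted label order mirrors what a label encoder produces, so index 1 corresponds to the second label in sorted order.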
    def predict(self, X, id=None, check_input=True, use_parsed_model=True):
        """
        Predict class for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The input samples.

        id : int
            Model id, default=None. id can be obtained from get_models method. If its None prediction use the best model.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        y : ndarray of shape (n_samples,)
            The predicted classes.
        """
        preds = super(FuzzyRegressor, self).predict(X, id, check_input=check_input, use_parsed_model=use_parsed_model)
        return self.classes_[(preds > 0.5).astype(int)]
Predict class for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
- id (int, default=None): Model id, which can be obtained from the get_models method. If None, the prediction uses the best model.
- check_input (bool, default=True): Allows bypassing several input checks. Don't use this parameter unless you know what you're doing.
Returns
- y (ndarray of shape (n_samples,)): The predicted classes.
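The source of predict maps raw model outputs to labels with `self.classes_[(preds > 0.5).astype(int)]`. A plain-Python equivalent of that indexing step, written as a sketch with a hypothetical helper name `to_labels`, looks like this:

```python
def to_labels(preds, classes, threshold=0.5):
    """Pick classes[1] when the score exceeds the threshold, else classes[0]."""
    return [classes[int(p > threshold)] for p in preds]


# A score of exactly 0.5 is not greater than the threshold, so it maps to classes[0].
labels = to_labels([0.1, 0.5, 0.9], classes=["neg", "pos"])
print(labels)
```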
    def predict_proba(self, X, id=None, check_input=True):
        """
        Predict class probabilities for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        T : ndarray of shape (n_samples, n_classes)
            The class probabilities of the input samples. The order of the
            classes corresponds to that in the attribute :term:`classes_`.
        """
        preds = super(FuzzyRegressor, self).predict(X, id, check_input=check_input)
        proba = numpy.vstack([1 - preds, preds]).T
        return proba
Predict class probabilities for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
- check_input (bool, default=True): Allows bypassing several input checks. Don't use this parameter unless you know what you're doing.
Returns
- T (ndarray of shape (n_samples, n_classes)): The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
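The returned array has one column per class because the source stacks `1 - preds` and `preds` and transposes the result. The same layout can be sketched without NumPy; `to_proba` is a hypothetical helper used only for illustration.

```python
def to_proba(preds):
    """Turn scores for the positive class into rows [P(class 0), P(class 1)]."""
    return [[1.0 - p, p] for p in preds]


proba = to_proba([0.25, 0.75])
print(proba)
```

Column 0 belongs to classes_[0] and column 1 to classes_[1], and each row sums to one.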
Descriptor for defining set_{method}_request methods in estimators.
New in version 1.3.
Parameters
- name (str): The name of the method for which the request function should be created, e.g. "fit" would create a set_fit_request function.
- keys (list of str): A list of strings which are accepted parameters by the created function, e.g. ["sample_weight"] if the corresponding method accepts it as a metadata.
- validate_keys (bool, default=True): Whether to check if the requested parameters fit the actual parameters of the method.
Notes
This class is a descriptor and uses PEP-362 to set the signature of the returned function.
class FuzzyClassifier(OneVsRestClassifier):
    """
    Fuzzy multiclass symbolic classificator

    Parameters
    ----------
    estimator : FuzzyRegressor
        Instance of FuzzyRegressor class.
    """

    def __init__(self, estimator: FuzzyRegressor):
        super().__init__(estimator=estimator)

    def fit(self, X, y):
        """
        Fit the symbolic models according to the given training data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features. Should be in the range [0, 1].

        y : array-like of shape (n_samples,)
            Target vector relative to X.

        Returns
        -------
        self
            Fitted estimator.
        """

        super().fit(X, y)
        return self

    def predict(self, X):
        """
        Predict class for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The input samples.

        Returns
        -------
        y : ndarray of shape (n_samples,)
            The predicted classes.
        """
        return super().predict(X)

    def predict_proba(self, X):
        """
        Predict class probabilities for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)

        Returns
        -------
        T : ndarray of shape (n_samples, n_classes)
            The class probabilities of the input samples. The order of the
            classes corresponds to that in the attribute :term:`classes_`.
        """
        return super().predict_proba(X)

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.classifier_tags = ClassifierTags(poor_score=True)
        return tags
Fuzzy multiclass symbolic classifier.
Parameters
- estimator (FuzzyRegressor): Instance of the FuzzyRegressor class.
def fit(self, X, y):
Fit the symbolic models according to the given training data.
Parameters
- X (array-like of shape (n_samples, n_features)): Training vector, where n_samples is the number of samples and n_features is the number of features. Should be in the range [0, 1].
- y (array-like of shape (n_samples,)): Target vector relative to X.
Returns
- self: Fitted estimator.
def predict(self, X):
Predict class for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
Returns
- y (ndarray of shape (n_samples,)): The predicted classes.
def predict_proba(self, X):
Predict class probabilities for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
Returns
- T (ndarray of shape (n_samples, n_classes)): The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
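FuzzyClassifier derives from OneVsRestClassifier, so a multiclass problem is split into one binary model per class. The following sketch illustrates the general one-vs-rest idea in plain Python: per-class scores are normalized so each row sums to one, and the predicted label is the class with the highest score. The helper names and the input layout are assumptions made for this illustration, not the scikit-learn implementation.

```python
def one_vs_rest_proba(scores_per_class):
    """scores_per_class[k][i] is the binary score of sample i for class k."""
    n_samples = len(scores_per_class[0])
    rows = []
    for i in range(n_samples):
        row = [scores[i] for scores in scores_per_class]
        total = sum(row)
        # Normalize so the scores of one sample sum to one (guard against zero).
        rows.append([s / total if total else 1.0 / len(row) for s in row])
    return rows


def one_vs_rest_predict(scores_per_class, classes):
    """Return the label whose binary model gives the highest score."""
    proba = one_vs_rest_proba(scores_per_class)
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in proba]


# Scores from three hypothetical binary models for two samples.
scores = [[0.5, 0.1], [0.25, 0.1], [0.25, 0.8]]
ovr_proba = one_vs_rest_proba(scores)
ovr_labels = one_vs_rest_predict(scores, classes=["a", "b", "c"])
print(ovr_proba, ovr_labels)
```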
class RegressorMathModel(RegressorMixin, MathModelBase):
    """
    A regressor class for the symbolic model.
    """

    def __init__(self, m: ParsedMathModel, opt_metric, opt_params, transformation, target_clip) -> None:
        super().__init__(m, opt_metric, opt_params, transformation, target_clip, None, None)

    def fit(self, X, y, sample_weight=None, check_input=True):
        """
        Fit the model according to the given training data.

        That means find a optimal values for constants in a symbolic equation.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features.

        y : array-like of shape (n_samples,)
            Target vector relative to X.

        sample_weight : array-like of shape (n_samples,) default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        self
            Fitted estimator.
        """

        def objective(c):
            return self.__eval(X, y, metric=self.opt_metric, c=c, sample_weight=sample_weight)

        if len(self.m.coeffs) > 0:
            result = opt.minimize(objective, self.m.coeffs, **self.opt_params)

            for i in range(len(self.m.coeffs)):
                self.m.coeffs[i] = result.x[i]

        self.is_fitted_ = True
        return self

    def predict(self, X, check_input=True):
        """
        Predict regression target for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The input samples.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        y : ndarray of shape (n_samples,) or (n_samples, n_outputs)
            The predicted values.
        """
        return self._predict(X)

    def __eval(self, X, y, metric, c=None, sample_weight=None):
        if c is not None:
            self.m.coeffs = c
        try:
            return -metric(self, X, y, sample_weight=sample_weight)
        except Exception:
            return SymbolicSolver.LARGE_FLOAT

    def __str__(self):
        return f"RegressorMathModel({self.m.str_representation})"

    def __repr__(self):
        return f"RegressorMathModel({self.m.str_representation})"

    def __sklearn_tags__(self):
        return super().__sklearn_tags__()
A regressor class for the symbolic model.
def fit(self, X, y, sample_weight=None, check_input=True):
Fit the model according to the given training data.
That is, find optimal values for the constants in the symbolic equation.
Parameters
- X (array-like of shape (n_samples, n_features)): Training vector, where n_samples is the number of samples and n_features is the number of features.
- y (array-like of shape (n_samples,)): Target vector relative to X.
- sample_weight (array-like of shape (n_samples,), default=None): Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- check_input (bool, default=True): Allows bypassing several input checks. Don't use this parameter unless you know what you're doing.
Returns
- self: Fitted estimator.
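Fitting here means tuning the constants of an already discovered equation. The source hands an objective to scipy.optimize.minimize, negates the scorer value (scorers are greater-is-better) and returns a large number when evaluation fails. The sketch below mirrors that structure but swaps in a simple compass search so that it runs without SciPy. The model y = c0*x + c1, the data, `fit_constants` and `LARGE_FLOAT` are all assumptions made for illustration.

```python
LARGE_FLOAT = 1e30  # hypothetical stand-in for SymbolicSolver.LARGE_FLOAT


def fit_constants(objective, start, step=1.0, tol=1e-8):
    """Minimize objective(c) by compass search (not Nelder-Mead)."""
    c = list(start)
    best = objective(c)
    while step > tol:
        improved = False
        for i in range(len(c)):
            for delta in (step, -step):
                trial = list(c)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    c, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # no neighbour is better: refine the step size
    return c, best


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2*x + 1


def score(c):
    """Greater-is-better score: negative mean squared error of c0*x + c1."""
    return -sum((c[0] * x + c[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def objective(c):
    """Negate the score so that minimizing it maximizes the score."""
    try:
        return -score(c)
    except Exception:
        return LARGE_FLOAT  # failed evaluations count as very bad candidates


coeffs, loss = fit_constants(objective, [0.0, 0.0])
print(coeffs, loss)
```

Only the constants change during this step; the structure of the equation stays as the symbolic search found it.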
def predict(self, X, check_input=True):
Predict regression target for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
- check_input (bool, default=True): Allows bypassing several input checks. Don't use this parameter unless you know what you're doing.
Returns
- y (ndarray of shape (n_samples,) or (n_samples, n_outputs)): The predicted values.
class ClassifierMathModel(ClassifierMixin, MathModelBase):
    """
    A classifier class for the symbolic model.
    """

    def __init__(
        self,
        m: ParsedMathModel,
        opt_metric,
        opt_params,
        transformation,
        target_clip,
        class_weight_,
        classes_,
    ) -> None:
        super().__init__(
            m,
            opt_metric,
            opt_params,
            transformation,
            target_clip,
            class_weight_,
            classes_,
        )

    def fit(self, X, y, sample_weight=None, check_input=True):
        """
        Fit the model according to the given training data.

        That means find a optimal values for constants in a symbolic equation.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features.

        y : array-like of shape (n_samples,)
            Target vector relative to X. Needs samples of 2 classes.

        sample_weight : array-like of shape (n_samples,) default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        self
            Fitted estimator.
        """

        check_classification_targets(y)
        enc = LabelEncoder()
        y_ind = enc.fit_transform(y)
        self.classes_ = enc.classes_
        self.n_classes_ = len(self.classes_)
        if self.n_classes_ != 2:
            raise ValueError("This solver needs samples of 2 classes" " in the data, but the data contains" " %r classes" % self.n_classes_)

        cw = self.class_weight_
        cw_sample_weight = numpy.array(cw)[y_ind] if len(cw) == 2 and cw[0] != cw[1] else None
        if sample_weight is None:
            sample_weight = cw_sample_weight
        elif cw_sample_weight is not None:
            sample_weight = sample_weight * cw_sample_weight

        def objective(c):
            return self.__eval(X, y, metric=self.opt_metric, c=c, sample_weight=sample_weight)

        if len(self.m.coeffs) > 0:
            result = opt.minimize(objective, self.m.coeffs, **self.opt_params)

            for i in range(len(self.m.coeffs)):
                self.m.coeffs[i] = result.x[i]

        self.is_fitted_ = True
        return self

    def predict(self, X, check_input=True):
        """
        Predict class for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The input samples.

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        y : ndarray of shape (n_samples,)
            The predicted classes.
        """
        preds = self._predict(X, check_input=check_input)
        return self.classes_[(preds > 0.5).astype(int)]

    def predict_proba(self, X, check_input=True):
        """
        Predict class probabilities for X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)

        check_input : bool, default=True
            Allow to bypass several input checking.
            Don't use this parameter unless you know what you're doing.

        Returns
        -------
        T : ndarray of shape (n_samples, n_classes)
            The class probabilities of the input samples. The order of the
            classes corresponds to that in the attribute :term:`classes_`.
        """
        preds = self._predict(X, check_input=check_input)
        proba = numpy.vstack([1 - preds, preds]).T
        return proba

    def __eval(self, X, y, metric, c=None, sample_weight=None):
        if c is not None:
            self.m.coeffs = c
        try:
            return -metric(self, X, y, sample_weight=sample_weight)
        except Exception:
            return SymbolicSolver.LARGE_FLOAT

    def __str__(self):
        return f"ClassifierMathModel({self.m.str_representation})"

    def __repr__(self):
        return f"ClassifierMathModel({self.m.str_representation})"

    def __sklearn_tags__(self):
        return super().__sklearn_tags__()
A classifier class for the symbolic model.
def fit(self, X, y, sample_weight=None, check_input=True):
Fit the model according to the given training data.
That is, find optimal values for the constants in the symbolic equation.
Parameters
- X (array-like of shape (n_samples, n_features)): Training vector, where n_samples is the number of samples and n_features is the number of features.
- y (array-like of shape (n_samples,)): Target vector relative to X. Needs samples of 2 classes.
- sample_weight (array-like of shape (n_samples,), default=None): Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- check_input (bool, default=True): Allows bypassing several input checks. Don't use this parameter unless you know what you're doing.
Returns
- self: Fitted estimator.
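Before optimizing the constants, the source of this fit method merges class weights with any user-supplied sample weights. The sketch below restates that branch logic in plain Python for readability. `effective_sample_weight` is a hypothetical helper name, and lists stand in for the NumPy arrays used in the source.

```python
def effective_sample_weight(y_ind, class_weight, sample_weight=None):
    """Combine per-class weights with optional per-sample weights.

    y_ind holds class indices (0 or 1); class_weight holds one weight per class.
    """
    # Per-sample class weights are only built when the two class weights differ.
    if len(class_weight) == 2 and class_weight[0] != class_weight[1]:
        cw_sample_weight = [class_weight[i] for i in y_ind]
    else:
        cw_sample_weight = None

    if sample_weight is None:
        return cw_sample_weight          # may be None: every sample has unit weight
    if cw_sample_weight is None:
        return list(sample_weight)       # equal class weights: keep user weights
    return [s * w for s, w in zip(sample_weight, cw_sample_weight)]


equal = effective_sample_weight([0, 1, 1], [1.0, 1.0])
by_class = effective_sample_weight([0, 1, 1], [0.5, 2.0])
combined = effective_sample_weight([0, 1, 1], [0.5, 2.0], sample_weight=[2.0, 1.0, 3.0])
print(equal, by_class, combined)
```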
def predict(self, X, check_input=True):
Predict class for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
- check_input (bool, default=True): Allows bypassing several input checks. Don't use this parameter unless you know what you're doing.
Returns
- y (ndarray of shape (n_samples,)): The predicted classes.
def predict_proba(self, X, check_input=True):
Predict class probabilities for X.
Parameters
- X (array-like of shape (n_samples, n_features)): The input samples.
- check_input (bool, default=True): Allows bypassing several input checks. Don't use this parameter unless you know what you're doing.
Returns
- T (ndarray of shape (n_samples, n_classes)): The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
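To make the layout of the returned array concrete, here is a minimal NumPy sketch of the stacking step used in `predict_proba`. The `preds` values are invented, and the sketch assumes they are probabilities in [0, 1] for the second class.

```python
import numpy as np

# Hypothetical scores for the second class, assumed to lie in [0, 1].
preds = np.array([0.10, 0.50, 0.97])

# Stack the complementary scores as two rows, then transpose so that each
# row of the result describes one sample.
proba = np.vstack([1 - preds, preds]).T
print(proba.shape)
```

Each row sums to one, and column `i` lines up with `classes_[i]`.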
Descriptor for defining set_{method}_request methods in estimators.
New in version 1.3.
Parameters
- name (str): The name of the method for which the request function should be created, e.g. "fit" would create a set_fit_request function.
- keys (list of str): A list of strings which are accepted parameters by the created function, e.g. ["sample_weight"] if the corresponding method accepts it as metadata.
- validate_keys (bool, default=True): Whether to check if the requested parameters fit the actual parameters of the method.
Notes
This class is a descriptor [1] and uses PEP-362 to set the signature of the returned function [2].
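The two mechanisms named in the note can be illustrated with a small standalone sketch. `RequestSetter` and `Demo` below are hypothetical names invented for this example. This is not the scikit-learn implementation, only a simplified picture of a descriptor that builds a `set_{method}_request` function and gives it an explicit signature through `inspect.Signature`.

```python
import inspect


class RequestSetter:
    """Hypothetical descriptor; a simplified illustration only."""

    def __init__(self, name, keys):
        self.name = name  # method name, e.g. "fit"
        self.keys = keys  # accepted metadata names, e.g. ["sample_weight"]

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class, not on an instance

        def func(**kwargs):
            unknown = set(kwargs) - set(self.keys)
            if unknown:
                raise TypeError(f"Unexpected arguments: {sorted(unknown)}")
            # Store the requests on the instance, grouped by method name.
            requests = instance.__dict__.setdefault("_requests", {})
            requests.setdefault(self.name, {}).update(kwargs)
            return instance

        func.__name__ = f"set_{self.name}_request"
        # PEP 362: attach an explicit signature listing the accepted keys.
        params = [
            inspect.Parameter(k, inspect.Parameter.KEYWORD_ONLY, default=None)
            for k in self.keys
        ]
        func.__signature__ = inspect.Signature(params)
        return func


class Demo:
    set_fit_request = RequestSetter("fit", ["sample_weight"])


demo = Demo()
demo.set_fit_request(sample_weight=True)
print(inspect.signature(demo.set_fit_request))
```

The generated function accepts only the listed keys, and tools that read signatures see them as keyword-only parameters.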
References
```python
def Xicor(X: numpy.ndarray, Y: numpy.ndarray):
    """
    Xicor correlation coefficient.

    This function computes the xi coefficient between two vectors x and y.
    https://doi.org/10.1080/01621459.2020.1758115

    Parameters
    ----------
    X : array-like input vector x

    Y : array-like input vector y

    Returns
    -------
    xi : float
    """
    # Flatten both inputs to 1-D vectors.
    if X.ndim != 1:
        X = numpy.ravel(X)
    if Y.ndim != 1:
        Y = numpy.ravel(Y)
    if len(X) != len(Y):
        raise ValueError("X and Y must be same size")
    # Use float64 only when both inputs are float64; otherwise use float32.
    precision = numpy.float32
    if X.dtype == Y.dtype and X.dtype == numpy.float64:
        precision = numpy.float64
    # Convert to C-contiguous arrays of the chosen precision if needed.
    if not X.flags["C_CONTIGUOUS"] or X.dtype != precision:
        X = numpy.ascontiguousarray(X.astype(precision))
    if not Y.flags["C_CONTIGUOUS"] or Y.dtype != precision:
        Y = numpy.ascontiguousarray(Y.astype(precision))
    # Xicor32 and Xicor64 are defined elsewhere in this module.
    if precision == numpy.float32:
        return Xicor32(X, Y, len(X))
    return Xicor64(X, Y, len(X))
```
Xicor correlation coefficient.
This function computes the xi coefficient between two vectors x and y. https://doi.org/10.1080/01621459.2020.1758115
Parameters
- X (array-like): Input vector x.
- Y (array-like): Input vector y.
Returns
- xi (float): The xi coefficient between x and y.
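For readers who want to see what the xi coefficient measures, the following is a pure-NumPy sketch of the textbook formula from the cited paper for data without ties: sort the pairs by x, rank the reordered y values, and compute 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1). The function name `xi_coefficient` is an assumption made for this example. It does not show how `Xicor32` and `Xicor64` are implemented, and its results may differ from the library's, for instance when ties are present.

```python
import numpy as np


def xi_coefficient(x, y):
    """Illustrative xi coefficient for 1-D data, assuming no ties."""
    x = np.ravel(np.asarray(x, dtype=np.float64))
    y = np.ravel(np.asarray(y, dtype=np.float64))
    if len(x) != len(y):
        raise ValueError("x and y must be same size")
    n = len(x)
    # Reorder y so that the corresponding x values are increasing.
    y_by_x = y[np.argsort(x, kind="stable")]
    # Rank (1..n) of each y value within the reordered sequence.
    ranks = np.argsort(np.argsort(y_by_x, kind="stable")) + 1
    # Sum of absolute rank changes between neighbouring samples.
    total = np.abs(np.diff(ranks)).sum()
    return 1.0 - 3.0 * total / (n * n - 1)


x = np.arange(100, dtype=np.float64)
xi_monotone = xi_coefficient(x, x ** 2)  # y is an increasing function of x
```

For a strictly monotone relationship with n samples the formula reduces to 1 - 3 / (n + 1), so the value approaches 1 as n grows.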