This is a personal study on a credit dataset. The problem is binary classification: classifying bank customers according to their lending ratings.
First, I analysed the dataset by reading through the CSV and checked it for unknown values; the data had no empty cells. I then used label encoding.
I examined the correlation matrix, which helped in tweaking the Bayesian network nodes.
The target class is imbalanced: there are 700 'good' credit samples but only 300 'bad' credit samples.
The data didn't contain any null, NaN, or '?' values.
All of the data were categorical, and I took that into account during evaluation.
The dataset lacked any single female customers. Such samples must be gathered before the models are used for predictions about single females.
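
The checks described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the small DataFrame, its column names (`checking_status`, `personal_status`, `credit_class`) and its category strings are hypothetical stand-ins for the real CSV, and `count_missing` and `label_encode` are helper names made up for this example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the credit CSV. The real file would be loaded
# with pd.read_csv(...) and has different columns and many more rows.
df = pd.DataFrame({
    "checking_status": ["low", "none", "high", "low", "none", "high"],
    "personal_status": ["male single", "female div/dep/mar", "male mar/wid",
                        "male single", "female div/dep/mar", "male single"],
    "credit_class": ["good", "bad", "good", "good", "bad", "good"],
})


def count_missing(frame):
    """Count cells that are null/NaN or hold the '?' placeholder."""
    return int(frame.isna().sum().sum() + (frame == "?").sum().sum())


def label_encode(frame):
    """Return an integer-coded copy plus the fitted encoder for each column."""
    encoded = frame.copy()
    encoders = {}
    for col in frame.columns:
        enc = LabelEncoder()
        encoded[col] = enc.fit_transform(frame[col])
        encoders[col] = enc  # kept so codes can be mapped back to labels
    return encoded, encoders


missing = count_missing(df)
encoded, encoders = label_encode(df)

# Class balance of the target and a check for an absent subgroup.
class_counts = df["credit_class"].value_counts().to_dict()
has_single_female = df["personal_status"].str.contains("female single").any()

print("missing cells:", missing)
print("class counts:", class_counts)
print("any single females:", bool(has_single_female))
```

Keeping the fitted encoders around makes it possible to translate integer codes back to the original category names with `inverse_transform` when inspecting predictions.
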
Models used:
- Categorical Naive Bayes
- DecisionTreeClassifier
- GaussianNB
- KNeighborsClassifier
- LGBMClassifier
- LinearDiscriminantAnalysis
- LogisticRegression
- RandomForestClassifier
- Support Vector Classifier
- XGBClassifier
- Bayesian Network
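
One possible way to compare several of the models above on an imbalanced, integer-coded categorical dataset is sketched below. Everything about the setup is an assumption made for illustration: the data are synthetic (a 70/30 class split mimicking the target imbalance), and the 5-fold stratified cross-validation and balanced-accuracy metric are choices made here, not necessarily the protocol used in the study. LGBMClassifier, XGBClassifier and the Bayesian network are left out because they need packages beyond scikit-learn.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, label-encoded categorical data (NOT the credit dataset).
rng = np.random.default_rng(0)
n_good, n_bad, n_features, n_categories = 210, 90, 5, 4
y = np.array([0] * n_good + [1] * n_bad)  # 0 = good, 1 = bad (70/30 split)
X = rng.integers(0, n_categories, size=(y.size, n_features))
# Make the first feature informative so the models have a signal to learn.
informative = rng.random(y.size) < 0.7
X[:, 0] = np.where(informative, y * (n_categories - 1), X[:, 0])

models = {
    # min_categories guards against a category missing from a training fold.
    "Categorical Naive Bayes": CategoricalNB(min_categories=n_categories),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "GaussianNB": GaussianNB(),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "LinearDiscriminantAnalysis": LinearDiscriminantAnalysis(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "Support Vector Classifier": SVC(),
}

# Stratified folds keep the good/bad ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=cv,
                                  scoring="balanced_accuracy")
    scores[name] = float(fold_scores.mean())

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:28s} balanced accuracy = {score:.3f}")
```

Balanced accuracy is used here because plain accuracy can look good on a 700/300 split simply by favouring the majority class; other imbalance-aware metrics such as F1 or ROC AUC could be swapped in through the `scoring` argument.
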