Credit Analysis Using Data Mining: Application in the Case of a Credit Union

Marcos de Moraes Sousa, Reginaldo Santana Figueiredo


The search for efficiency in the cooperative credit sector has led cooperatives to adopt new technology and managerial know-how. Among the tools that support efficiency, data mining has stood out in recent years as a sophisticated methodology for uncovering knowledge that is “hidden” in organizations' databases. Granting credit is one of the central functions of a credit union; instruments that support this process are therefore desirable and may become a key factor in credit management. The goal of this study is to develop models that assess the capacity of a credit union's members to settle their commitments, using a decision tree (C4.5 algorithm) and an artificial neural network (multilayer perceptron algorithm). The knowledge discovery process in this case study comprised five steps: data selection, data pre-processing and cleaning, data transformation, data mining, and the interpretation and evaluation of results. The results were evaluated using ten-fold cross-validation, repeated in ten simulations. The study concludes that, for the problem at hand, the two models produce statistically similar results and may aid a cooperative's decision-making process.
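The evaluation protocol named in the abstract (cross-validation over ten sets, repeated ten times, comparing a decision tree with a multilayer perceptron) can be sketched as follows. This is only an illustration under several assumptions, not the authors' implementation: the data are synthetic (`make_classification` stands in for the credit union's member records), the folds are stratified, scikit-learn's `DecisionTreeClassifier` with the entropy criterion is a CART-based stand-in for C4.5 (which scikit-learn does not provide), and every hyperparameter is a placeholder. The paired t-test is one simple way to compare the two score lists; the abstract does not say which statistical test the study used, and a plain paired test on repeated cross-validation scores tends to be optimistic.

```python
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary data standing in for member records
# (e.g. label 1 = settled commitments, 0 = did not).
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)

# Ten folds, repeated ten times -> 100 accuracy scores per model.
# A fixed random_state makes both models see exactly the same splits.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)

# Assumption: entropy-based CART tree as a stand-in for C4.5.
tree = DecisionTreeClassifier(
    criterion="entropy", min_samples_leaf=5, random_state=0
)

# Multilayer perceptron; inputs are standardized inside each training fold.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(8,), solver="lbfgs", max_iter=500, random_state=0
    ),
)

tree_scores = cross_val_score(tree, X, y, cv=cv, scoring="accuracy")
mlp_scores = cross_val_score(mlp, X, y, cv=cv, scoring="accuracy")

# Paired comparison over the shared folds (illustrative only).
t_stat, p_value = ttest_rel(tree_scores, mlp_scores)

print(f"tree accuracy: {tree_scores.mean():.3f} +/- {tree_scores.std():.3f}")
print(f"MLP accuracy:  {mlp_scores.mean():.3f} +/- {mlp_scores.std():.3f}")
print(f"paired t-test p-value: {p_value:.3f}")
```

A large p-value in such a comparison would be consistent with "statistically similar" models, although the numbers printed here depend entirely on the synthetic data and say nothing about the study's actual results.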


Credit Unionism; Data Mining; Decision Tree; Artificial Neural Network.



