Loan Status Analysis

Project Description

The main goal of this project is to identify and predict the loan status of borrowers. To check whether the dataset has time-series characteristics, two cross-validation methods (K-fold and TimeSplit) are compared. RandomForest is the main model used to predict and analyze the loan status. (The dataset for model training includes 916567 rows and 10 columns from 2007 to 2017.)

Data can be found in my GitHub Repository.

Response Variable

Loan_Stat -> One of 3 statuses: Fully Paid, Charged Off, or Default

Explanatory Variables

Annual_Inc -> Annual income
Emp_Length -> Employment length
Dti -> The debt-to-income ratio of the borrower
Delinq_2yrs -> The number of times the borrower had been 30+ days past due on a payment in the past 2 years
Term -> Borrowing term
Grade -> Credit history grade
Inq_Last_6mths -> The borrower's number of inquiries by creditors in the last 6 months
Purpose -> Purpose for borrowing

Feature Engineering

loan_stat: Fully_Paid -> 0; Default, Charged-Off -> 1
Grade: A, B, C, D, E, F, G -> 1, 2, 3, 4, 5, 6, 7
Purpose: Debt_Consolidation -> 1; Other -> 0

#getting dummies for loan_status
data_df['loan_status'] = data_df['loan_status']....
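The encodings listed under Feature Engineering, and the comparison between K-fold and time-ordered cross-validation, can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The data is synthetic, the lowercase column names and purpose values are hypothetical, scikit-learn's `TimeSeriesSplit` is assumed to be what "TimeSplit" refers to, and the rows are assumed to be sorted by issue date.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

# Synthetic stand-in for the loan data; all values are made up for illustration.
rng = np.random.default_rng(0)
n = 200
data_df = pd.DataFrame({
    "loan_status": rng.choice(["Fully Paid", "Charged Off", "Default"], size=n),
    "grade": rng.choice(list("ABCDEFG"), size=n),
    "purpose": rng.choice(["debt_consolidation", "credit_card", "car"], size=n),
    "annual_inc": rng.normal(60_000, 15_000, size=n),
    "dti": rng.uniform(0, 40, size=n),
})

# Encodings described in the Feature Engineering list.
status_map = {"Fully Paid": 0, "Charged Off": 1, "Default": 1}
grade_map = {g: i for i, g in enumerate("ABCDEFG", start=1)}  # A -> 1, ..., G -> 7

data_df["loan_status"] = data_df["loan_status"].map(status_map)
data_df["grade"] = data_df["grade"].map(grade_map)
data_df["purpose"] = (data_df["purpose"] == "debt_consolidation").astype(int)

X = data_df.drop(columns="loan_status")
y = data_df["loan_status"]

# Hyperparameters here are arbitrary choices for a quick demo.
model = RandomForestClassifier(n_estimators=50, random_state=0)

# K-fold shuffles rows, so training folds can contain loans issued after the test loans.
kfold_scores = cross_val_score(
    model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)
)

# TimeSeriesSplit always trains on earlier rows and tests on later ones
# (assumes the rows are already in chronological order).
time_scores = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5))

print("K-fold mean accuracy:    ", kfold_scores.mean())
print("Time-split mean accuracy:", time_scores.mean())
```

If the two mean scores differ noticeably on the real data, that would suggest the loans have some time dependence. Similar scores would suggest the ordering matters little for this model.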

Restaurant Type Prediction (NLP)

Project Description

Kaggle Competition Page

The dataset contains details about restaurants and their reviews. The goal of this Kaggle competition is to design data mining models to predict the restaurant type using the observed variables. This is a challenge designed for Master of Business Analytics students at the Rady School of Management, University of California, San Diego. This competition is also part of the course requirements for MGTA 415 Analyzing Unstructured Data....

The Effect of CSR Decoupling on Corporations' Financial Performance

Abstract

This research project focuses on identifying CSR decoupling behavior in multinational firms and using decoupling indicators to test whether firms' CSR decoupling is related to their financial indicators.

Data Sources

The sample used in this study is composed of listed and OTC companies in Taiwan. The data sources include:

Company performance indicators and operating profiles of related companies are obtained from the Taiwan Economic Journal (TEJ) database....