My Personal Action Plan ✍🏻

HOW CAN THESE TOOLS BENEFIT MY LIFE? While taking MGTA459, I came across several methods and tools that I can put to use and that may well change my life. I would like to develop and apply these tools further in a few ways. First, I want to be more influential in my future career. Additionally, I aim to enhance my personal productivity and overall well-being by incorporating these tools into my daily routines....

Identify News Category Based on News Headlines (NLP)

Project Description This project focuses on fine-tuning a DistilBERT model to predict news categories using only news headlines. For a model demo and to download the model, please check my HuggingFace Repo🤗. HuggingFace Repository HuggingFace Demo Data Description The data is from Kaggle. There are 200,000 rows and 42 categories in the column we predict. Model Training Input preprocessing To transform text data into vectors, I first applied TfidfVectorizer to preprocess the text data....
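The TF-IDF step mentioned above can be illustrated with a small hand-rolled sketch. It is a simplified stand-in written with the standard library, not scikit-learn's TfidfVectorizer itself. The tokenizer, the smoothed IDF formula, the L2 normalization, and the sample headlines are all assumptions made for illustration, not details taken from the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep word tokens of two or more characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_vectors(headlines):
    """Return (vocab, vectors): one L2-normalized TF-IDF row per headline."""
    docs = [Counter(tokenize(h)) for h in headlines]
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)

    # Document frequency: number of headlines that contain each term.
    df = {term: sum(1 for doc in docs if term in doc) for term in vocab}

    # Smoothed inverse document frequency (assumed variant).
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}

    vectors = []
    for doc in docs:
        row = [doc[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        vectors.append([v / norm for v in row])
    return vocab, vectors


# Hypothetical headlines, not rows from the Kaggle dataset.
headlines = [
    "Stocks rally as markets open",
    "Local team wins championship game",
    "Markets slide after rate decision",
]
vocab, vectors = tfidf_vectors(headlines)
```

Terms that appear in many headlines (here "markets") receive a lower weight than terms unique to one headline, which is the property that makes TF-IDF useful as a text representation.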

Loan Status Analysis

Project Description The main goal of this project is to identify and predict the loan status of borrowers. To check whether the dataset has time-series characteristics, two cross-validation methods (K-fold, TimeSplit) are also used. In this project, RandomForest is the main model used to predict and analyze the loan status. (The dataset for model training includes 916,567 rows and 10 columns from 2007 to 2017.) Data can be found in my GitHub Repository
Response Variable
Loan_Stat -> One of 3 statuses: Fully Paid, Charged Off, and Default
Explanatory Variables
Annual_Inc -> Annual income
Emp_Length -> Employment length
Dti -> The debt-to-income ratio of the borrower
Delinq_2yrs -> The number of times the borrower had been 30+ days past due on a payment in the past 2 years
Term -> Borrowing term
Grade -> Historical credit grade
Inq_Last_6mths -> The borrower’s number of inquiries by creditors in the last 6 months
Purpose -> Purpose for borrowing
Feature Engineering
loan_stat: Fully_Paid -> 0; Default, Charged-Off -> 1
Grade: A, B, C, D, E, F, G -> 1, 2, 3, 4, 5, 6, 7
Purpose: Debt_Consolidation -> 1; Other -> 0
#getting dummies for loan_status
data_df['loan_status'] = data_df['loan_status']....
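The encoding listed under Feature Engineering can be sketched in plain Python as follows. This is a minimal illustration, not the project's pandas code: the dictionary keys, the raw string spellings ("Fully Paid", "debt_consolidation", and so on), and the assumption that grades run from A through G are all hypothetical.

```python
# Assumed grade scale: A is the best grade (1), G the worst (7).
GRADE_ORDER = "ABCDEFG"

# Assumed raw spellings of the three loan statuses.
STATUS_CODES = {"Fully Paid": 0, "Default": 1, "Charged Off": 1}


def encode_loan(record):
    """Encode one raw loan record (a dict) into numeric features."""
    status = record["loan_status"]
    if status not in STATUS_CODES:
        raise ValueError(f"unexpected loan status: {status!r}")
    return {
        # Fully Paid -> 0; Default and Charged Off -> 1.
        "loan_status": STATUS_CODES[status],
        # Ordinal encoding of the credit grade: A..G -> 1..7.
        "grade": GRADE_ORDER.index(record["grade"]) + 1,
        # Binary flag: debt consolidation -> 1, any other purpose -> 0.
        "purpose": 1 if record["purpose"] == "debt_consolidation" else 0,
    }


# Hypothetical example record.
example = encode_loan(
    {"loan_status": "Charged Off", "grade": "C", "purpose": "car"}
)
```

Raising an error on an unknown status, rather than silently mapping it to 1, makes unexpected label values visible before they reach model training.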

Recommendation System: Cloth Rating Prediction

Project Description The task in this project is to predict a user's rating for a given item_id from the user's features. Using this model, we recommend a set of items to each customer by ranking the predicted ratings. The dataset contains several interesting features, such as age, body size, bust size, height, and review text, that are worth exploring in the model training process. However, some features are not easy to obtain before the user posts the rating....
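The ranking step described above, recommending items by sorting predicted ratings, might look like the sketch below. The rating model itself is out of scope here, so the predicted ratings are supplied as a plain dictionary; the function name, the tie-breaking rule, and the sample item ids are hypothetical.

```python
def recommend_top_n(predicted_ratings, n=3):
    """Return the n item ids with the highest predicted ratings.

    predicted_ratings maps item_id -> predicted rating for one user.
    Ties are broken by item_id so the output is deterministic.
    """
    ranked = sorted(predicted_ratings.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ranked[:n]]


# Hypothetical predictions for a single user.
predicted = {
    "dress_101": 4.6,
    "jacket_17": 3.9,
    "gown_250": 4.9,
    "top_08": 4.6,
}
top_items = recommend_top_n(predicted, n=3)
# top_items == ["gown_250", "dress_101", "top_08"]
```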

Restaurant Type Prediction (NLP)

Project Description Kaggle Competition Page The dataset contains details about restaurants and their reviews. The goal of this Kaggle competition is to design data mining models that predict the restaurant type from the observed variables. This is a challenge designed for Master of Business Analytics students at the Rady School of Management, University of California, San Diego. This competition is also part of the course requirements for MGTA 415 Analyzing Unstructured Data....

Stock Prediction System with Telegram Bot 🤖

Abstract Our topic focuses on constructing a stock forecasting system. This system provides the user with basic information about a stock, a stock price prediction, a K-line chart, and news on the individual stock. For prediction, our system uses four machine learning models: Random Forest, XGBoost, LightGBM, and LSTM. To train these models, we use 250 different technical indicators as input variables. To make our models more accurate, we also used SHAP and Skater to observe the reasoning process of the machine learning models and improved them by analyzing these observations and tuning parameters....

Tableau Project: Disaster Analysis

Project Goals The goal of this project is to identify trends and patterns in the occurrence of disasters by analyzing historical disaster data. The deliverable of this project focuses on the magnitude of damage and the frequency of occurrence within certain time periods and regions. End User We can assume several end users to whom we will provide this dashboard. One is NGOs like the Red Cross or Médecins Sans Frontières (MSF), which need to allocate staff to specific areas....

The Effect of CSR Decoupling on Corporations' Financial Performance

Abstract This research project mainly focuses on identifying CSR decoupling behavior in multinational firms and using the decoupling indicators to test whether there is a relationship between firms’ CSR decoupling and their financial indicators. Data Sources The sample used in this study is composed of listed and OTC companies in Taiwan. The data sources include: Company performance indicators and operating profiles of related companies are obtained from the Taiwan Economic Journal (TEJ) database....