Julia Du

About Me

ccjulia@email.unc.edu

Research Interests: Computational and Data Science, Big Data Analytics, Machine Learning, Artificial Intelligence, Natural Language Processing, Time Series Analysis and Forecasting

I am a senior at the University of North Carolina at Chapel Hill, pursuing a double major in Statistics and Analytics and in Mathematics, with a minor in Data Science. My studies are rooted in a passion for quantitative analysis and for applying mathematical concepts to complex, practical problems.

Alongside my coursework, I have been doing research at the UNC AMP Lab under the supervision of Dr. Alana Campbell since August 2022. I am part of the data team that uses MNE-Python to preprocess and analyze EEG (electroencephalogram) data, focusing on identifying indicators in infants with autism. This experience has let me apply my analytical skills to meaningful medical research.

Since May 2023, I have also been working with Dr. Zhengwu Zhang on "Financial Time Series Data Forecasting Using News and Social Media Messages with Large Language Models." The project sits at the intersection of data science and financial analysis and lets me explore how contemporary media affects financial markets.

As an active member of the Carolina Analytics and Data Science (CADS) club, I am part of a community committed to exploring and advancing data science. Through guest speaker series, data case competitions, and interactive workshops, I have learned from seasoned professionals and applied my academic skills to real-world challenges. These experiences have been pivotal in my professional development, letting me contribute to community-centric projects and strengthening my analytical and problem-solving skills.

On this website, I share my academic and professional experiences, insights, and the projects I have completed in data science. It is a place for me to connect with like-minded individuals, professionals, and academics, and to share knowledge and grow professionally together.

Projects

Image Classification Machine Learning Project

(Jan. - May 2023)

This project applies transfer learning to classifying dogs' emotions, using a combined dataset of 19,921 dog images from Kaggle (Dataset1 & Dataset2). Key methods include experimenting with pretrained models such as ResNet50, classifier modifications (e.g., SVM, KNN), multi-task learning, fine-tuning, and unsupervised domain adaptation. The study aims to improve the understanding of dog emotions for better pet care while addressing challenges such as data imbalance and subjective labeling. Preliminary results show that transfer learning improves training accuracy, with SVM emerging as a particularly effective classifier; overfitting and limited testing accuracy remain areas for further research. The approach could benefit dog owners by giving them deeper insight into their pets' emotional states.
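As a rough illustration of the "pretrained features plus a classical classifier" idea mentioned above, the following is a minimal sketch of a k-nearest-neighbors vote over feature vectors. It assumes the feature vectors have already been extracted by a pretrained network such as ResNet50. The function name `knn_predict` and the emotion labels are hypothetical, and this is not the project's actual code.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    training vectors (Euclidean distance).

    `train_features` is assumed to hold embeddings produced by a
    pretrained network; extracting them is outside this sketch.
    """
    if len(train_features) != len(train_labels):
        raise ValueError("features and labels must have the same length")
    if k < 1 or k > len(train_features):
        raise ValueError("k must be between 1 and the number of training samples")

    # Pair every training vector's distance to the query with its label.
    distances = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_features, train_labels)
    )
    # Count the labels of the k closest vectors and return the most frequent.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

In practice a library implementation (for example scikit-learn's nearest-neighbor or SVM classifiers) would replace this hand-written vote; the sketch only shows where the pretrained features enter.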

Topics: Machine Learning, Deep Learning, Transfer Learning, Image Classification, Multi-task Learning, Fine-tuning, Domain Adaptation

Link to Final Report

Link to Presentation Slides

Comparative Sentiment Analysis in Student and Vehicle Loans Project

(Jan. - May 2023)

This project analyzes the relationship between complaint sentiment and loan type, focusing on student and vehicle loans. Using data from the Consumer Financial Protection Bureau (Consumer Complaint Database), the team applies sentiment analysis, hypothesis testing, and classification models in R. The goals are to identify differences in the proportion of negative words in loan-related complaints and to build predictive models that categorize complaints accurately. The work involves data cleaning, exploratory analysis, quantifying narrative sentiment, and applying statistical tests and Naive Bayes classifiers. The findings indicate subtle differences in sentiment between the two loan types, offering insight into customer experiences and perceptions in the financial sector.
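The comparison of negative-word proportions described above can be sketched as a two-proportion z-test. The project itself was carried out in R, so this Python version is only an illustration. The tiny `NEGATIVE_WORDS` lexicon and all function names are assumptions, and treating every word as an independent trial is a simplification.

```python
import math
import re

# Hypothetical mini-lexicon; a real analysis would use a full sentiment lexicon.
NEGATIVE_WORDS = {"unfair", "late", "wrong", "denied", "fraud", "error"}


def count_negative(text):
    """Return (negative word count, total word count) for one complaint."""
    words = re.findall(r"[a-z']+", text.lower())
    negative = sum(1 for w in words if w in NEGATIVE_WORDS)
    return negative, len(words)


def pooled_counts(complaints):
    """Sum negative and total word counts over a list of complaint texts."""
    neg = total = 0
    for text in complaints:
        n, t = count_negative(text)
        neg += n
        total += t
    return neg, total


def two_proportion_z(neg_a, total_a, neg_b, total_b):
    """Two-sided z-test for equal proportions, using the pooled estimate."""
    p_a, p_b = neg_a / total_a, neg_b / total_b
    pooled = (neg_a + neg_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value
```

Calling `pooled_counts` once on the student-loan narratives and once on the vehicle-loan narratives yields the four counts that `two_proportion_z` expects.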

Topics: Natural Language Processing, Data and Text Mining, Predictive Sentiment Analysis, Machine Learning

Link to Final Report

Link to Presentation Slides

Legal Urgency Assessment and Prioritization System Project

(Mar. 2023)

This project aims to improve how legal consulting requests are processed by introducing a method to evaluate and prioritize cases by urgency. Combining machine learning for case categorization with sentiment analysis, the project develops a dual-factor urgency assessment model that assigns each case an urgency level based on its category and the emotional tone of its text. Categories such as health and housing are given higher urgency, while sentiment scores provide a more nuanced view of client emotions. Multiplying these two urgency factors identifies the cases that require immediate attention. The approach is intended to optimize resource allocation among ABA lawyers and to improve client satisfaction and trust in legal services. The project also suggests automatic response systems and clear communication strategies that help clients articulate their issues more effectively, further refining the urgency assessment.
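The dual-factor scoring described above can be sketched as follows. The numeric category weights, the mapping from sentiment score to urgency factor, and all names are illustrative assumptions; the description only states that health and housing rank higher and that the two factors are multiplied.

```python
# Hypothetical category weights; only the relative ordering
# (health and housing higher) comes from the project description.
CATEGORY_URGENCY = {"health": 3.0, "housing": 3.0, "other": 1.0}


def sentiment_urgency(sentiment_score):
    """Map a sentiment score in [-1, 1] to a factor in [1, 2].

    Assumed convention: more negative text means higher urgency.
    """
    clipped = max(-1.0, min(1.0, sentiment_score))
    return 1.5 - 0.5 * clipped


def urgency(category, sentiment_score):
    """Multiply the category factor by the sentiment factor."""
    base = CATEGORY_URGENCY.get(category, CATEGORY_URGENCY["other"])
    return base * sentiment_urgency(sentiment_score)


def prioritize(cases):
    """Sort (case_id, category, sentiment_score) tuples, most urgent first."""
    return sorted(cases, key=lambda c: urgency(c[1], c[2]), reverse=True)
```

Because the factors are multiplied rather than added, a strongly negative message in a high-priority category rises well above either signal alone, which matches the stated goal of surfacing cases that need immediate attention.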

Topics: Natural Language Processing, Data and Text Mining, Predictive Sentiment Analysis, Machine Learning, Big Data Analysis

Link to Final Report

Link to Presentation Slides

Honey Production Analysis Project

(Jun. - Aug. 2022)

This project addresses the challenge of improving honey bee colony health and maximizing honey production through data-driven analysis. Using datasets from the National Agricultural Statistics Service (NASS) (Honey Production dataset & Bee Colony and Stressors dataset) together with environmental and economic variables, the study seeks the most effective ways to reduce bee colony losses and increase honey yield per colony. It examines correlations between colony loss percentages, yield efficiencies, and influencing factors including climate data, state GDP, and specific stressors such as pesticides. The modeling highlights key factors such as Varroa mites, temperature variations, and other environmental stressors, providing insights for beekeepers and policymakers. The findings point toward healthier bee colonies and more efficient honey production, and contribute to understanding the complex dynamics affecting these vital pollinators.
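The correlation step mentioned above can be illustrated with a plain Pearson correlation coefficient, for example between a stressor measure and colony loss percentage across states. This is a generic sketch and not the project's modeling code; the function name `pearson_r` is an assumption, and no project data is reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return cov / math.sqrt(var_x * var_y)
```

A coefficient near +1 or -1 between a stressor and colony loss would flag that stressor as a candidate predictor for a regression model, while a value near 0 would suggest little linear association.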

Topics: Multi-Regression Modeling, Data Visualization and Mapping, Statistical Analysis, Predictive Modeling

Link to Final Report

Link to Presentation Slides