Frank (Haoyang) Ling

Master’s Student @ UMICH

University of Michigan

Biography

Haoyang Ling is a second-year master’s student at the University of Michigan, pursuing a degree in Information Science with a focus on Big Data Analytics. His interests include artificial intelligence, natural language processing, information retrieval, and programmable matter. With a strong foundation in both computer science and data science, he has worked on many data science course projects with real-life applications and is eager to apply his knowledge and expertise to make meaningful contributions to the field.

Interests
  • Artificial Intelligence and Information Retrieval
  • Machine Learning and Statistics
  • Big Data Analytics
Education
  • MSc in Information Science (GPA 4.0/4.0), 2024

    University of Michigan

  • BSc in Electrical and Computer Engineering (GPA 3.92/4.0), 2023

    Shanghai Jiao Tong University

Projects

Human Brain Activity Detection (Ongoing)
Electroencephalography (EEG) is a non-invasive method of monitoring and recording electrical activity in the brain, which plays a crucial role in diagnosing and treating various brain-related disorders, particularly in critically ill patients. However, the manual interpretation of EEG data remains a major bottleneck in neurocritical care, as it is time-consuming, expensive, and prone to fatigue-related errors and inter-rater reliability issues. To address these challenges, there is a pressing need to develop automated methods for EEG analysis. The objective of this research is to develop a robust model trained on EEG signals to automatically detect and classify seizures and other types of harmful brain activity, aiming to assist doctors and brain researchers in providing faster and more accurate diagnoses and treatments. This work has significant implications for improving patient care in neurocritical settings, advancing epilepsy research, and supporting the development of new therapeutic interventions.
React Simple Summarizer based on ChatGPT

Simple Summarizer

A React-based summarizer built with ChatGPT. Its primary purpose is to generate summaries of paragraphs and highlight the relevant keywords, making the original text easier to comprehend and enhancing user trust in the generated summary. I also protect personally identifiable information (PII) with Presidio.

  • Tools: Docker, OpenAI, Presidio, ReactJS, Flask (gunicorn + gevent), Celery (Redis + MongoDB), Nginx, JMeter
Movie IR System
In the modern entertainment landscape, finding the perfect movie across numerous streaming platforms has become a challenging task. Current recommendation systems often fall short, relying on generic algorithms or user ratings that overlook individual preferences. Our project addresses this challenge by redefining the movie discovery experience. Unlike traditional approaches, our innovative pipeline combines exact matching with semantic understanding and refined re-ranking. This approach not only achieves a notable increase of 0.11 in nDCG@10 over the baseline BM25 but also aligns more closely with user intent and content relevance.
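For readers unfamiliar with the metric, the following is a minimal sketch of how nDCG@10 can be computed for one query. It is not the project's evaluation code: the function names are hypothetical, it uses the linear-gain form of DCG, and it assumes the ideal ranking is a reordering of the same list of graded relevance labels.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results (linear gain)."""
    # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """nDCG@k for one ranked list of graded relevance labels.

    Assumption: the ideal ordering is obtained by sorting this same list,
    so relevant documents that were never retrieved are not counted.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(relevances, k) / ideal


# Relevance labels of the top results for a hypothetical query.
score = ndcg_at_k([3, 0, 2, 1, 0], k=10)
```

Averaging this score over all test queries gives the system-level value, so a gain of 0.11 means the re-ranked lists place relevant movies noticeably closer to the top than BM25 alone.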
Diamond Prices Prediction
This project aims to build accurate predictive models for diamond prices using regression. Accurate price estimates help diamond sellers set competitive prices and help buyers make informed decisions. Factors such as carat weight, cut quality, color grade, clarity, and physical dimensions impact a diamond’s price. The project uses a dataset from Kaggle, with data on these characteristics for 53,909 diamonds. The analysis covers three regression models, handles multicollinearity among predictors, and guards against overfitting. A range of techniques, including visualization, exploratory data analysis, model diagnostics, and a train-test split, is used to ensure the models’ reliability. The selected model achieves an R-squared of 0.9848.
Machine-Generated Text Detection for ChatGPT
  • Identified discriminative statistical features of machine-generated text and proposed an explainable classifier with predictive performance comparable to BERT-based models.
  • Performed a thorough analysis of data augmentations on response length, finding that truncated sentences can decrease model performance by around 5%.
Livable Residence Evaluation System
With a growing population moving into New York City, finding a livable neighborhood has become a significant concern for many people. However, selecting an ideal place to live is not straightforward. Many factors can affect the choice, such as housing prices, crime rates, population, and education.
Yelp User Analysis
With the rise of deep learning and machine learning, many retailers have adopted recommendation systems to stay competitive in the market. Yelp has published an extensive dataset of its user and business profiles (around 9 GB). Many researchers have explored the dataset, but few focus on friend recommendations for users. In this project, the k-hop sub-graph, or ego-net, of a specific user is analyzed to provide diverse recommendations.
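As an illustration of the k-hop ego-net idea, here is a minimal sketch that extracts the nodes within k hops of one user with breadth-first search. It assumes the friendship graph fits in memory as a plain adjacency dictionary; the function names and the toy graph are hypothetical and are not taken from the project.

```python
from collections import deque


def k_hop_ego_net(adjacency, center, k):
    """Return the set of nodes within k hops of `center`, including `center`."""
    distances = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distances[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return set(distances)


def induced_subgraph(adjacency, nodes):
    """Keep only the edges whose endpoints both lie in `nodes`."""
    return {n: [m for m in adjacency.get(n, ()) if m in nodes] for n in nodes}


# Toy friendship graph (hypothetical user ids).
friends = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
ego_nodes = k_hop_ego_net(friends, "a", 2)
ego_net = induced_subgraph(friends, ego_nodes)
```

Users at distance two or more in the ego-net are natural friend-recommendation candidates, since they are connected to the center only through mutual friends.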
Contrastive Learning on Graph Representation

Augmented graph data with random masking and self-attention, chosen after comparative analyses, to enhance the model’s robustness, surpassing baseline models on 5 out of 9 datasets.

Music Recommendation with Spark

This is a project for music recommendations with Spark.

  • Accelerated parallel breadth-first search with Spark, running 10x faster than the MapReduce version on a self-deployed cluster, after compressing the dataset at a 2% ratio with Apache Avro.
  • Employed the PageRank algorithm within ego-nets to generate divers.
Housing Price Prediction in Illinois

This project involves working with a dataset from the Cook County Assessor’s Office in Illinois, which contains over 500,000 records describing houses sold in the area in recent years. The dataset is split into training and test sets.

In Part 1, Exploratory Data Analysis (EDA) is performed. In Part 2, the focus is on advanced prediction with machine learning. The criterion for evaluation is L2 loss, and the baseline model is ridge regression.
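To make the evaluation setup concrete, here is a minimal sketch of a ridge regression baseline and an L2 loss for a single feature. It is an illustration under stated assumptions, not the project's code: the function names and the toy numbers are hypothetical, the intercept is left unpenalized, and L2 loss is taken to be the sum of squared errors.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit of y = slope * x + intercept for one feature.

    Minimizes sum((y - slope * x - intercept) ** 2) + lam * slope ** 2.
    Assumes lam >= 0 and that the xs are not all identical when lam == 0.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)  # the penalty shrinks the slope toward zero
    intercept = y_mean - slope * x_mean
    return slope, intercept


def l2_loss(xs, ys, slope, intercept):
    """Sum of squared errors between predictions and targets."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Toy data: a hypothetical size feature and sale price, in arbitrary units.
sizes = [0.0, 1.0, 2.0, 3.0]
prices = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_ridge_1d(sizes, prices, lam=5.0)
test_loss = l2_loss(sizes, prices, slope, intercept)
```

With `lam=0` the fit reduces to ordinary least squares; larger values trade a higher training loss for a smaller slope, which is the regularization a stronger model would need to beat on the held-out test set.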


Experience

School of Information, University of Michigan
Graduate Student Instructor
August 2023 – Present Michigan

Responsibilities include:

  • SI 671 Data Mining Discussion

Michigan Traffic Lab
Research Intern
April 2023 – August 2023 Michigan

Responsibilities include:

  • Deploying McityGPT with LangChain to introduce Mcity to newcomers using information from the website.
  • Understanding the basics of importance sampling and the naturalistic driving environment.
  • Researching autonomous vehicle safety testing with SUMO's TraCI interface by modelling traffic accidents in the naturalistic and adversarial driving environment (NADE).
  • Calibrating the distribution of traffic accidents with experiments on Great Lakes.

Shanghai Jiao Tong University
Research Assistant
March 2022 – August 2022 Shanghai

Responsibilities include:

  • Learned the basics of graph neural networks (GNNs), including GCN and GAT, and the usage of PyTorch and PyTorch Geometric.
  • Compared data augmentation in GNNs with that in computer vision and natural language processing and looked for similarities.
  • Investigated contrastive learning methods in graph representation learning and fine-tuned models with random masking and attention masking.

UM-SJTU Joint Institute
Teaching Assistant
May 2021 – December 2021 Shanghai

Responsibilities include:

  • Taught physics and circuit theory through recitation classes (RC) and office hours (OH) for around 150 students.
  • Graded homework and exams and prepared exam questions.
  • Obtained praise as a teaching assistant.

Recent Work

(2023). NPHardEval: Benchmarking Reasoning Ability of Large Language Models via Complexity Classes.


Contact

My research interests include artificial intelligence, information retrieval, and programmable matter. If you are interested in working together or have any questions, please feel free to contact me using the information below. I would love to hear about any opportunities that may be a good fit for my skills and experience.