Frank (Haoyang) Ling

Master’s Student @ UMICH

University of Michigan

Biography

Haoyang Ling is a second-year master’s student at the University of Michigan, pursuing a degree in Information Science with a focus on Big Data Analytics. His interests include artificial intelligence, natural language processing, information retrieval, and programmable matter. With a strong foundation in both computer science and data science, he has completed many data science course projects with real-life applications and is eager to apply his knowledge and expertise to make meaningful contributions to the field.

Interests

Artificial Intelligence and Information Retrieval
Machine Learning and Statistics
Big Data Analytics

Education

MSc in Information Science (GPA 4.0/4.0), 2024
University of Michigan
BSc in Electrical and Computer Engineering (GPA 3.92/4.0), 2023
Shanghai Jiao Tong University

Featured Work

Lizhou Fan, Wenyue Hua, Lingyao Li, Frank (Haoyang) Ling, Yongfeng Zhang

December, 2023

NPHardEval: Benchmarking Reasoning Ability of Large Language Models via Complexity Classes

NPHardEval serves as a comprehensive benchmark for assessing the reasoning abilities of large language models (LLMs) through the lens of computational complexity classes. This repository contains datasets, data generation scripts, and experimental procedures designed to evaluate LLMs in various reasoning tasks. The benchmark offers several advantages compared with current benchmarks:

Data construction grounded in the established computational complexity hierarchy
Automatic checking mechanisms
Automatic generation of datapoints
Complete focus on reasoning, excluding numerical computation

Projects

Human Brain Activity Detection (Ongoing)

Electroencephalography (EEG) is a non-invasive method of monitoring and recording electrical activity in the brain, which plays a crucial role in diagnosing and treating various brain-related disorders, particularly in critically ill patients. However, the manual interpretation of EEG data remains a major bottleneck in neurocritical care, as it is time-consuming, expensive, and prone to fatigue-related errors and inter-rater reliability issues. To address these challenges, there is a pressing need to develop automated methods for EEG analysis. The objective of this research is to develop a robust model trained on EEG signals to automatically detect and classify seizures and other types of harmful brain activity, aiming to assist doctors and brain researchers in providing faster and more accurate diagnoses and treatments. This work has significant implications for improving patient care in neurocritical settings, advancing epilepsy research, and supporting the development of new therapeutic interventions.

React Simple Summarizer based on ChatGPT

Simple Summarizer

This is a React-based summarizer built with ChatGPT. Its primary purpose is to generate a summary of paragraphs with the relevant keywords highlighted. It aims to make the original text easier to understand and to increase user trust in the generated summary. I also use Presidio to protect personally identifiable information (PII).

Tools: Docker, OpenAI, Presidio, ReactJS, Flask (gunicorn + gevent), Celery (Redis + MongoDB), Nginx, JMeter


Movie IR System

In the modern entertainment landscape, finding the perfect movie across numerous streaming platforms has become a challenging task. Current recommendation systems often fall short, relying on generic algorithms or user ratings that overlook individual preferences. Our project addresses this challenge by redefining the movie discovery experience. Unlike traditional approaches, our innovative pipeline combines exact matching with semantic understanding and refined re-ranking. This approach not only achieves a notable increase of 0.11 in nDCG@10 over the baseline BM25 but also aligns more closely with user intent and content relevance.
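For readers unfamiliar with the metric, the following is a minimal sketch of how nDCG@10 can be computed. It assumes the linear-gain variant of DCG and builds the ideal ranking from the same judged list, which may differ from the evaluation code used in the project. The relevance grades are hypothetical, not project data.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance grades."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades (not project data), listed in ranked order.
baseline_ranking = [1, 0, 3, 0, 2]
reranked = [3, 2, 1, 0, 0]

baseline_score = ndcg_at_k(baseline_ranking)
reranked_score = ndcg_at_k(reranked)
```

A ranking that places the most relevant items first scores closer to 1, so a re-ranking stage improves nDCG@10 when it moves relevant movies toward the top of the list.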

Diamond Prices Prediction

This project aims to build accurate predictive models for diamond prices using regression. Accurate price estimates help sellers set competitive prices and help buyers make informed decisions. Factors such as carat weight, cut quality, color grade, clarity, and physical dimensions affect a diamond’s price. The project uses a dataset from Kaggle with these characteristics for 53,909 diamonds. The analysis covers three regression models, handles multicollinearity among predictors, and guards against overfitting. Visualization, exploratory data analysis, model diagnostics, and a train-test split are used to ensure the models’ reliability. The selected model achieves an R-squared of 0.9848.
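As a reminder of what the reported metric measures, here is a minimal sketch of the R-squared computation. The prices and predictions are hypothetical values chosen for illustration, not the project’s data or model output.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out prices and model predictions (illustration only).
actual = [326.0, 554.0, 2757.0, 5000.0]
predicted = [350.0, 500.0, 2800.0, 4900.0]

score = r_squared(actual, predicted)
```

A value near 1 means the model explains most of the variance in price, while a model that always predicts the mean price scores 0.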

Machine-Generated Text Detection for ChatGPT

Used distinguishing statistical features of machine-generated text to propose an explainable classifier, achieving predictive performance comparable to BERT-based models.
Performed a thorough analysis of data augmentations on response length, identifying that truncated sentences can decrease model performance by around 5%.

Livable Residence Evaluation System

With a growing population moving into New York City, finding a livable neighborhood has become a significant concern for many people. However, selecting an ideal place to live is not straightforward. Many factors can affect the choice, such as housing prices, crime rates, population, and education.

Yelp User Analysis

With the rise of deep learning and machine learning, many retailers have adopted recommendation systems to stay competitive in the market. Yelp has published an extensive dataset of its user and business profiles (around 9 GB). Many researchers have explored the dataset, but few have focused on friend recommendations for users. In this project, the k-hop sub-graph, or ego-net, of a specific user is analyzed to provide diverse recommendations.
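To illustrate the idea of a k-hop ego-net, here is a minimal sketch that extracts one with breadth-first search. The function name `k_hop_ego_net` and the small friendship graph are hypothetical. The actual project works on the Yelp dataset and may use a different approach.

```python
from collections import deque


def k_hop_ego_net(adjacency, ego, k):
    """Return the nodes within k hops of `ego` and the edges among them."""
    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    nodes = set(distance)
    # Keep each undirected edge once, as a sorted pair.
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes
    }
    return nodes, edges


# Hypothetical undirected friendship graph (not Yelp data).
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}

nodes_1, edges_1 = k_hop_ego_net(friends, "ann", 1)
nodes_2, edges_2 = k_hop_ego_net(friends, "ann", 2)
```

Raising k widens the neighborhood around the user, which trades locality for a larger pool of candidate friends.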

Contrastive Learning on Graph Representation

Augmented graph data with random masking and self-attention after comparative analyses to enhance the model’s robustness, surpassing baseline models on 5 out of 9 datasets.

Music Recommendation with Spark

This is a project for music recommendations with Spark.

Accelerated parallel breadth-first search with Spark to run 10x faster than with MapReduce on a self-deployed cluster, after compressing the dataset with a 2% ratio using Apache Avro.
Employed the PageRank algorithm within ego-nets to generate diverse recommendations.
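The PageRank step can be sketched as a plain power iteration over a small graph. This is an illustrative single-machine version, not the project’s Spark implementation. The damping factor, iteration count, and example ego-net are assumptions made for the example.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank. Every link target must also be a key."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = adjacency[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical ego-net with directed links (illustration only).
ego_net = {
    "ego": ["a", "b", "c"],
    "a": ["ego"],
    "b": ["ego"],
    "c": ["ego", "a"],
}

scores = pagerank(ego_net)
top_node = max(scores, key=scores.get)
```

Nodes with higher scores are the ones most strongly linked within the ego-net, which makes them natural candidates to surface first.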

Housing Price Prediction in Illinois

This project involves working with a dataset from the Cook County Assessor’s Office in Illinois, which contains over 500,000 records describing houses sold in the area in recent years. The dataset is split into training and test sets.

In Part 1, Exploratory Data Analysis (EDA) is performed. In Part 2, the focus is on advanced prediction with machine learning. The criterion for evaluation is L2 loss, and the baseline model is ridge regression.

Posts

Naive Search Engine

It is a naive search engine built from scratch with document preprocessing, indexing, retrieval, and relevance evaluation based on Wikipedia pages. It includes BasicInvertedIndex, BM25, Learn-to-Rank, Doc2Query, Bi-Encoder, CrossEncoder, and NDCG.

Frank (Haoyang) Ling

Nov 7, 2023 2 min read

Hadoop Cluster

This lab builds a Hadoop cluster with Spark, Drill, and ZooKeeper. I implemented it with docker-compose and published it at frankling2021/shadrik. It is meant for studying the basics of big data, and readers can use it as a starting point to explore further.

Frank (Haoyang) Ling

Last updated on Dec 13, 2020 1 min read

Luna Tennis Club

Request: I am an instructor at LUNA, a newly opened tennis club. Many students come to learn basic tennis skills and want personalized training. Professional tennis matches would be good examples to help my students understand these skills. So, could you help me by providing some practical suggestions for my students?

Frank (Haoyang) Ling

Last updated on Sep 5, 2019 8 min read

Experience

Graduate Student Instructor

School of Information, University of Michigan

August 2023 – Present Michigan

Responsibilities include:

SI 671 Data Mining Discussion

Research Intern

Michigan Traffic Lab

April 2023 – August 2023 Michigan

Responsibilities include:

Deploying McityGPT with LangChain to introduce Mcity to newcomers using information from the website.
Understanding the basics of importance sampling and the naturalistic driving environment.
Researching autonomous vehicle safety testing with SUMO and TraCI by modeling traffic accidents in the naturalistic and adversarial driving environment (NADE).
Calibrating the distribution of traffic accidents with experiments on Great Lakes.

Research Assistant

Shanghai Jiao Tong University

March 2022 – August 2022 Shanghai

Responsibilities include:

Learned the basics of GNNs, including GCN and GAT, and the usage of PyTorch and PyTorch Geometric.
Compared GNN with Computer Vision and Natural Language Processing in data augmentation and searched for similarities.
Investigated contrastive learning methods in graph representation learning and fine-tuned models with random masking and attention masking.

Teaching Assistant

UM-SJTU Joint Institute

May 2021 – December 2021 Shanghai

Responsibilities include:

Taught Physics and Circuit Theory by holding recitation classes (RC) and office hours (OH) for around 150 students.
Graded homework and exams and prepared exam questions.
Received praise as a teaching assistant.

Recent Work


Lizhou Fan, Wenyue Hua, Lingyao Li, Frank (Haoyang) Ling, Yongfeng Zhang (2023). NPHardEval: Benchmarking Reasoning Ability of Large Language Models via Complexity Classes.


Accomplishments

Advanced Computer Vision with TensorFlow

Coursera Jan 2022 – Jan 2022

See certificate

Neural Networks and Deep Learning

Coursera Aug 2021 – Aug 2021

See certificate

Contact

My research interests include artificial intelligence, information retrieval, and programmable matter. If you are interested in working together or have any questions, please feel free to contact me using the information below. I would love to hear about any opportunities that may be a good fit for my skills and experience.

carofrank2000@gmail.com
