July 31, 2024
By Patricia Waldron
Eighteen students from across the country got a crash course in computer vision, machine learning, and how to secure a research career in tech during the 2024 SoNIC Summer Research Workshop, held June 24-28 on Cornell’s Ithaca campus.
SoNIC participants developed models to help people with impaired vision identify objects around them and, in collaboration with the Cornell Lab of Ornithology, to help citizen scientists identify bird species more accurately.
Founded in 2010 and hosted by the Cornell Ann S. Bowers College of Computing and Information Science, SoNIC is an annual workshop designed to increase the number of underrepresented minorities in higher-level computer science programs. Students from colleges and universities nationwide gather at Cornell to gain hands-on research experience and to learn more about pursuing advanced degrees and research careers in tech. Any student can apply and no previous research experience is needed, but students must be majoring in a STEM field.
"SoNIC is not just about algorithms and running models. It's about empowerment and opportunity," said LeeAnn Roberts, director of the Cornell Bowers CIS Office of Diversity, Equity, Inclusion, and Belonging, and SoNIC’s lead organizer. "Our summer program aims to equip these students with the confidence, knowledge, and connections to begin forging successful careers in academia and the tech industry."
This year's class delved into computer vision, a type of artificial intelligence that uses machine learning and neural networks to analyze images, videos, and other visual inputs. Specifically, they found new applications for the Large Language and Vision Assistant (LLaVA), a multimodal language model that processes text and images.
"This is a relatively new development," said Bharath Hariharan, assistant professor of computer science and the faculty lead for this year's workshop. Unlike the large language model ChatGPT, which is text-based, multimodal language models like Google's Gemini, “can actually understand both images and text, so, you can give them an image and ask a question,” he said
On the first day, students stress-tested LLaVA to find its weak spots: it struggles with blurry images, can't count reliably, and is prone to hallucination. For the rest of the week, students worked in small groups to improve the model's results for a range of applications.
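The kind of probe the students ran can be reproduced with the openly released LLaVA checkpoints. The following is a minimal sketch of a counting-style query, assuming the llava-hf/llava-1.5-7b-hf checkpoint and the Hugging Face transformers library; the image filename is hypothetical and the workshop's exact setup isn't described here.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumed checkpoint; the article doesn't say which LLaVA release the workshop used.
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# A counting question of the kind used to expose the model's weak spots.
image = Image.open("backyard_feeder.jpg")  # hypothetical test image
prompt = "USER: <image>\nHow many birds are in this photo? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```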
Participants also heard from various faculty about their research and got insider tips on how to apply to graduate school.
Kayla Hom, who recently graduated from the University of California, Davis, with a degree in computer science and engineering, worked with her group to test whether applying different image manipulation techniques to blurry images could produce more accurate results from LLaVA.
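One way to attempt that kind of pre-processing is to sharpen a blurry photo before handing it to the model. Here is a minimal sketch using Pillow's unsharp-mask filter; the specific manipulation techniques the group tried aren't named in this article, so treat the filter choice and filenames as illustrative.

```python
from PIL import Image, ImageFilter

# Hypothetical blurry test image; the group's actual data isn't described.
blurry = Image.open("blurry_cardinal.jpg")

# Unsharp masking boosts edges by subtracting a blurred copy of the image,
# one common cleanup step before re-running it through a vision-language model.
sharpened = blurry.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))
sharpened.save("sharpened_cardinal.jpg")

# The sharpened file can then be passed to LLaVA exactly as in the earlier sketch,
# and the two answers compared to see whether pre-processing helps.
```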
“This experience at SoNIC gave me a wider view of everything that you can do with research, and especially in the different fields of computer science,” Hom said. “I was able to learn a lot about the different faculty here at Cornell. It interested me a lot to hear about their different research because I'm still deciding what I want to do.”
This fall, Hom plans to pursue a master's degree in computer science at the University of California, San Diego, with an emphasis in human-computer interaction. She also hopes to get involved with DEI initiatives to expose more women and people of color to computer science.
As part of the workshop, the group took a field trip to the Cornell Lab of Ornithology to learn more about its popular Merlin Bird ID app. Merlin, one of the most successful applications of computer vision for citizen science, identifies birds from photos or even recorded audio.
Inspired by the app, one group attempted to reduce erroneous results from LLaVA by merging it with a second model, BioCLIP, which identifies images of organisms using information from the universal tree of life.
“One of the problems we were having with LLaVA is that it tends to hallucinate. Sometimes it'll miscategorize an image or it'll latch onto the prompt, and it'll give you results that aren't really there," said Paul Santana, a computer engineering student from the University of Texas at Arlington. "We introduced BioCLIP to try and help reduce those errors and see if we can get more accurate results when trying to identify animals or any kind of wildlife.”
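BioCLIP's weights are published for use with the open_clip library, so one plausible way to combine the two models is to have LLaVA propose candidate species and let BioCLIP score them against the photo. The sketch below assumes the imageomics/bioclip checkpoint on the Hugging Face Hub; the group's actual merging strategy isn't detailed here, and the candidate list and image are hypothetical.

```python
import torch
import open_clip
from PIL import Image

# Assumed checkpoint name; BioCLIP distributes weights compatible with open_clip.
model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:imageomics/bioclip")
tokenizer = open_clip.get_tokenizer("hf-hub:imageomics/bioclip")
model.eval()

# Hypothetical candidates, e.g. species names pulled from a LLaVA answer.
candidates = ["American Robin", "Northern Cardinal", "Blue Jay"]

image = preprocess(Image.open("backyard_feeder.jpg")).unsqueeze(0)
text = tokenizer([f"a photo of a {name}" for name in candidates])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Zero-shot scoring: higher probability means BioCLIP agrees with that label.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

best = candidates[probs.argmax().item()]
print(f"BioCLIP's pick among LLaVA's candidates: {best}")
```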
The experience added machine learning to the list of possible research interests Santana hopes to pursue, either through a Ph.D. program or in industry.
Students also had the opportunity to experience Ithaca and get to know their fellow participants. SoNIC included a canoeing trip around Beebe Lake and ziplining at the Cornell Challenge Course.
“I've really enjoyed both learning about stuff in class, and also the events at night where we get to hang out,” said Morgan Cobb, a computer engineering major at the University of Florida, who plans to earn a Ph.D. in computer science. She said the program taught her a lot about graduate school and even possibilities beyond computer science, such as information science.
“We're all engineers. We all are able to code programs, but the idea of it being interdisciplinary, I thought was really cool," Cobb said.
Funding for SoNIC comes, in part, from the Hopper-Dean Foundation, the National Science Foundation, and LinkedIn.
Patricia Waldron is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.