CaRDS seminar:
Graph Neural Networks and Transformers: Positional Encodings as Node Embeddings
Bright Kwaku Manu, Department of Mathematics and Statistics, ETSU
Graph Neural Networks (GNNs) and Transformers are powerful frameworks for machine learning tasks. For example, Transformers are the architecture behind GPT large language models (the “T” in GPT stands for “Transformer”). Although the two were developed separately in different fields, recent research has revealed similarities and connections between them. In this presentation, we bridge the gap between GNNs and Transformers by offering a unified framework that highlights their similarities and differences. To do so, we compute positional encodings and identify key properties, including expressiveness, efficiency, and interpretability, that allow positional encodings to be interpreted as node embeddings. We show that positional encodings (a component of Transformer networks) can serve as node embeddings (a component of graph neural networks). These encodings behave almost like node embeddings and can be used for machine learning tasks such as node classification, graph classification, and link prediction. We discuss some challenges and provide future directions.
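The abstract does not say which positional encoding the talk uses. Laplacian eigenvector positional encodings are one common choice: each node gets a vector taken from the eigenvectors of the graph Laplacian, and that vector can be concatenated with (or used in place of) the node's features. The NumPy sketch below assumes an undirected graph given as a dense adjacency matrix. The function name `laplacian_pe` and the use of the symmetric normalized Laplacian are illustrative assumptions, not the presenter's method.

```python
import numpy as np


def laplacian_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Laplacian eigenvector positional encodings (illustrative sketch).

    adj: symmetric (n, n) adjacency matrix of an undirected graph.
    k:   number of eigenvectors to keep per node (k <= n - 1).
    Returns an (n, k) array whose row i is the encoding of node i.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    # D^{-1/2}, leaving isolated nodes (degree 0) at zero to avoid division by zero
    d_inv_sqrt = np.zeros(n)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(lap)
    # Skip the trivial eigenvector (eigenvalue 0) and keep the next k
    return eigvecs[:, 1 : k + 1]


# Usage: a 6-node cycle graph with 3-dimensional (random) node features
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

pe = laplacian_pe(adj, k=2)
features = np.random.default_rng(0).normal(size=(n, 3))
# Node embeddings = original features + positional encodings, shape (6, 5)
node_embeddings = np.hstack([features, pe])
```

Eigenvectors are only defined up to sign, and up to rotation when eigenvalues repeat, as they do on this cycle graph. Implementations therefore often flip signs randomly during training. This sketch does not.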
When? Thursday, November 30 at 2.00 pm
Where? 306 Gilbreath Hall
CaRDS seminar:
Understanding Missing Data: Mechanisms, Descriptions, and Patterns
Dr. Mostafa Zahed, Department of Mathematics and Statistics, ETSU
Effective handling of missing data is crucial for robust statistical analysis. This presentation provides a comprehensive overview of missing data analysis, covering missing data mechanisms, descriptions, and patterns, along with the fundamental concepts, methods, and techniques for addressing missing data in statistical analyses. The presentation will begin with the different missing data mechanisms: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). The emphasis will be on the implications of each mechanism for data analysis and on strategies to account for each type of missingness. Next, the discussion will cover how to describe missingness in data, highlighting key measures such as Amount Missing, Percent Complete, and Percent Observed, and their role in evaluating the extent of missing data within a dataset. The presentation will then examine various missing data patterns and displays, including Pattern Value Reporting, Dot Chart, Data Matrix Plot, and Aggregate Plot, and explain why identifying and interpreting these patterns matters for effective data handling and interpretation. Finally, the session will touch on Linkage Pattern Statistics, including the Proportion of Usable Cases, the Influx Coefficient, the Outflux Coefficient, and the Fluxplot that displays them. These tools are valuable for understanding the flow and patterns of missing data within a dataset, enabling a more thorough analysis of missing data mechanisms.
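The descriptors and linkage statistics named above can be computed from the response indicator matrix, where r[i, j] = 1 if cell (i, j) is observed. Below is a minimal NumPy sketch that follows the influx/outflux and usable-case definitions implemented in the R package mice (`md.pairs`, `flux`, `fluxplot`). It is not the presenter's code. Reading "Percent Complete" as the percentage of rows with no missing values is an assumption. The function names are hypothetical.

```python
import numpy as np


def describe_missing(X: np.ndarray):
    """Missingness descriptors for a 2-D array with np.nan marking missing cells."""
    observed = ~np.isnan(X)                               # response indicator (True = observed)
    amount_missing = (~observed).sum(axis=0)              # missing cells per variable
    percent_observed = 100 * observed.mean(axis=0)        # observed cells per variable, in %
    percent_complete = 100 * observed.all(axis=1).mean()  # assumed: rows with no missing value, in %
    return amount_missing, percent_observed, percent_complete


def flux(X: np.ndarray):
    """Influx and outflux coefficients per variable."""
    r = (~np.isnan(X)).astype(float)  # r[i, j] = 1 if observed
    m = 1.0 - r                       # m[i, j] = 1 if missing
    # Influx_j: pairs (j missing, k observed) summed over rows and k, / all observed cells
    influx = (m.T @ r).sum(axis=1) / r.sum()
    # Outflux_j: pairs (j observed, k missing) summed over rows and k, / all missing cells
    outflux = (r.T @ m).sum(axis=1) / m.sum()
    return influx, outflux


def usable_cases(X: np.ndarray) -> np.ndarray:
    """Entry (j, k): share of rows missing on variable j that are observed on k.

    Rows for fully observed variables are nan (no missing cases to impute).
    """
    r = (~np.isnan(X)).astype(float)
    m = 1.0 - r
    n_missing = m.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (m.T @ r) / n_missing[:, None]


# Usage: variable 0 is complete, variables 1 and 2 have increasing missingness
X = np.array([
    [1.0, np.nan, np.nan],
    [2.0, 5.0, np.nan],
    [3.0, 6.0, 9.0],
    [4.0, 7.0, 10.0],
])
amount, pct_obs, pct_complete = describe_missing(X)
# amount -> [0, 1, 2]; pct_obs -> [100., 75., 50.]; pct_complete -> 50.0
influx, outflux = flux(X)
# influx -> [0, 1/9, 1/3]; outflux -> [1, 1/3, 0]
```

The complete variable has influx 0 and outflux 1. It cannot receive information from the other variables, but it can supply information for imputing them. A fluxplot shows this by placing each variable at its (influx, outflux) coordinates.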
When? Thursday, November 16 at 2.00 pm
Where? Gilbreath Hall, Room 304
CaRDS seminar:
Visualizing Critical Health Data: A Collaboration between Public Health and Computer Sciences
Randy Wykoff, Dean, College of Public Health
Phil Pfeiffer, Department of Computing
East Tennessee State University
Making informed decisions about public health challenges requires access to timely and accurate data. Many existing data sets report county-level data but only compare counties within the same state. Knowing, for example, that the people of Washington County, Tennessee, have the 21st-longest life expectancy of Tennessee’s 95 counties (top quartile) might lead one to believe that the county is relatively “healthy.” However, when you compare Washington County with the other 3,141 counties in the United States, you see that it fares worse than over two-thirds of them, placing it just outside the bottom quartile nationally. The College of Public Health worked with faculty and students from the Department of Computing to develop a visual display that graphically shows where any county (or group of counties) ranks compared with all other counties in the country.
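The contrast between a county's state and national standing is a rank-to-quartile calculation. The short Python sketch below uses only the within-state figure quoted above (21st of 95). The national rank is not given in the abstract, so it is not computed here. The helper `standing` is hypothetical. It assumes rank 1 is best and that quartile 1 is the top 25%.

```python
import math


def standing(rank: int, n: int):
    """Position of a unit ranked `rank` (1 = best) among `n` units.

    Returns (quartile, share_outperformed): quartile 1 is the top 25%,
    and share_outperformed is the fraction of the other n - 1 units ranked below it.
    """
    if not 1 <= rank <= n:
        raise ValueError("rank must be between 1 and n")
    quartile = math.ceil(4 * rank / n)
    share_outperformed = (n - rank) / (n - 1) if n > 1 else 0.0
    return quartile, share_outperformed


# Washington County's within-state life expectancy rank, from the abstract
quartile, share = standing(21, 95)
print(quartile, round(share, 3))  # 1 0.787
```

Calling the same function with a national rank among 3,142 counties would place the county on the national scale, which is the comparison the visual display makes.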
When? Thursday, October 19 at 2.00 pm
Where? D.P. Culp University Center, Room 311 (“The Forum”)
First CaRDS seminar of the 2023/24 academic year:
The AI Tennessee Initiative
Lynne E. Parker, Director, AI Tennessee Initiative, University of Tennessee, Knoxville
The AI Tennessee Initiative is a new research and education initiative led by the University of Tennessee, Knoxville, with the goal of enabling the State of Tennessee to become a leader in the data-driven knowledge economy. Working with academic, industry, non-profit, and community organizations across the state, the initiative focuses on Tennessee’s unique AI strengths and opportunities. It is strongly transdisciplinary and aims to leverage the benefits of AI across all disciplines and economic sectors, including smart manufacturing, climate-smart agriculture and forestry, precision health and environment, future mobility, and AI for science. This talk will give an overview of the AI Tennessee Initiative and its actions to date, with the goal of spurring conversation on how ETSU can engage with, contribute to, collaborate with, and benefit from this effort.
When? Thursday, September 14 at 2.00 pm
Where? D.P. Culp University Center, Room 311
News from the South Big Data Hub, along with recent programs and projects relevant to CaRDS:
The Data Sharing and Cyberinfrastructure Working Group is a collaboration across all four NSF Big Data Hubs that invites presenters from across the nation to share the latest technology and receive feedback on it. The group works to increase awareness and availability of cyberinfrastructure (CI) and big data innovations for the national community. The group also hopes to identify gaps in CI, capabilities, and data sources in order to foster further improvement of the national data research infrastructure, content, and capabilities.
https://southbigdatahub.org/newsblog/data-and-cyberinfrastructure-working-group
DataBytes: AI Ethics Through the Lens of Causality — A Theory of Fairness | Virtual | June 20 at 4 PM ET
To understand fairness, one must unify central ideas from the social sciences and humanities with those of mathematics and computer science. Join Christopher Lam, CEO of Epistamai, as he shows how to model a principal cause of algorithmic bias and map it directly to the two fundamental laws of causal inference. He will also show how to bridge causal inference and machine learning, providing a novel way to visualize the different ways a supervised machine learning model can discriminate. These causal models may help policymakers on both sides of the aisle modernize AI regulations so that they align with society’s values. Learn more on their website.
PEARC 2023 | Portland, OR | July 23 - 27
Registration is open for ACM PEARC23, which will be held from July 23 to July 27, 2023 at the prestigious Oregon Convention Center in Portland, Oregon. The conference theme will be “Computing for the Common Good,” and the conference will provide a forum for discussing challenges, opportunities, and solutions among the broad range of participants in the research computing community. Technical tracks include Applications and Software, Systems and Systems Software, and Workforce Development, Training, Diversity, and Education. Learn more on their website.
Data Matters
The annual Data Matters short-course series, a week-long series of one- or two-day courses aimed at students and professionals in business, research, and government, will return August 7 - 11, 2023 in partnership with RENCI and the Odum Institute. If possible, please circulate this opportunity to your staff, faculty, and students, and share the information with relevant internal listservs and newsletters. Additionally, if you are interested in promoting this opportunity on social media, please use the hashtag #DataMatters and feel free to tag us @TheNCDS on Twitter or The National Consortium for Data Science on LinkedIn.
Smoky Mountain Conference 2023: Data Challenge
ORNL announces this year's Smoky Mountain Conference Data Challenge. Registration is open until June 20. The SMC Data Challenge 2023 consists of six data analytics challenges based on data sets provided by ORNL. Researchers at all career stages, including students, are encouraged to participate. Winners will be given a DOI. Selected teams will be invited to the conference to give a 3-minute lightning talk and present a poster describing their solution at SMC2023.