
Lessons learned about benchmarking, adversarial testing, the dangers of over- and under-claiming, and AI alignment.
Transcript: https://web.stanford.edu/class/cs224u/podcast/bowman/
Sam's website
Sam on Twitter
NYU Linguistics
NYU Data Science
NYU Computer Science
Anthropic
SNLI paper: A large annotated corpus for learning natural language inference
SNLI leaderboard
FraCaS
SICK
A SICK cure for the evaluation of compositional distributional semantic models
SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment
RTE Knowledge Resources
Richard Socher
Chris Manning
Andrew Ng
Ray Kurzweil
SQuAD
Gabor Angeli
Adina Williams
Adina Williams podcast episode
MultiNLI paper: A broad-coverage challenge corpus for sentence understanding through inference
MultiNLI leaderboards
Twitter discussion of LLMs and negation
GLUE
SuperGLUE
DecaNLP
GPT-3 paper: Language Models are Few-Shot Learners
FLAN
Winograd schema challenges
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
JSALT: General-Purpose Sentence Representation Learning
Ellie Pavlick
Ellie Pavlick podcast episode
Tal Linzen
Ian Tenney
Dipanjan Das
Yoav Goldberg
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
BIG-bench
Upwork
Surge AI
Dynabench
Douwe Kiela
Douwe Kiela podcast episode
Ethan Perez
NYU Alignment Research Group
Eliezer Shlomo Yudkowsky
Alignment Research Center
Redwood Research
Percy Liang podcast episode
Richard Socher podcast episode
Feb 23, 2023
1 hr 26 min

AI and social science, the causal revolution in economics, predictions about the impact of AI, teaching MBAs, productizing AI, and a journey from Tel Aviv to Princeton to Stanford.
Transcript: https://web.stanford.edu/class/cs224u/podcast/goldberg/
Amir's website
Amir on Twitter
Computational Culture Lab
ChatGPT
Laura Nelson
Bart Bonikowski
Chris Winship
Bernie Koch
Treebanks
BIG-bench
Guido Imbens
Endogeneity
Susan Athey
Cambridge Analytica
Prediction Machines
Speech and Language Processing
DALL-E 2
Midjourney
Stable Diffusion
Postmodernism, or, the Cultural Logic of Late Capitalism
Turing test
Matt Salganik
Paul DiMaggio
Jan 27, 2023
1 hr 28 min

Leaving Ohio, being back in Belgium, organizing NAACL 2022, reviewing at NLP-scale, universal dependencies, and doing NLU before it was cool.
Transcript: https://web.stanford.edu/class/cs224u/podcast/demarneffe/
Marie's website
Generating Typed Dependency Parses from Phrase Structure Parses
Universal Dependencies project
OSU Linguistics
NAACL 2022
Dan Jurafsky
Dan Roth
Chris Manning
ARR
Priscilla Rasmussen
Transactions of the ACL
Finding Contradictions in Text
Not a simple yes or no: Uncertainty in indirect answers
Recognizing Textual Entailment
Anna Rafferty
Scott Grimm
"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
Did It Happen? The Pragmatic Complexity of Veridicality Assessment
Yejin Choi
Yejin Choi's ACL 2022 talk
Barbara Plank
Linguistically debatable or just plain wrong?
Jesse Dodge
Reproducibility badges at NAACL 2022
Stanford Sentiment Treebank
Judith Tonhauser
Nan-Jiang Jiang
Lauri Karttunen
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
Microsoft DeBERTa surpasses human performance on the SuperGLUE benchmark
Daniel Zeman
Marta Recasens
Nov 7, 2022
1 hr 8 min

Coding puzzles, practices, and education, structured prediction, the culture of Hugging Face, large models, and the energy of New York.
Transcript: https://web.stanford.edu/class/cs224u/podcast/rush/
Sasha's website
Sasha on Twitter
Sasha on the Humans of AI podcast
Sasha on The Thesis Review Podcast with Sean Welleck
Sasha on the Talking Machines Podcast
Sasha interviewed by Sayak Paul
Hugging Face
PyTorch
The Annotated Transformer
The Annotated Alice
The Annotated S4
Sasha and Dan Oneață's declarative graphics library Chalk
Drawing Big Ben in Chalk
OpenNMT
Ken Shan
Blog post by Ken and Dylan Thurston
Edward Z. Yang
Stuart Shieber
Literate programming
Soumith Chintala
Lua Torch
TensorFlow
Graham Neubig
Chris Dyer
DyNet
JAX
jax.vmap
Matt Johnson
Finale Doshi-Velez, whose undergrad ML course inspired and informed Sasha's
Tensor Puzzles
GPU Puzzles
A tweet that Chris added to his CV
Adam Paszke
Dougal MacLaurin
Dex
Named Tensor notation
Named Tensors in PyTorch
TorchDim
MiniTorch
Torch-Struct
Sara Hooker's paper 'The hardware lottery'
Jacob Andreas
Kevin Ellis
Hugging Face transformers library
Hugging Face datasets library
Hugging Face diffusers library
Hugging Face evaluate library
scikit-learn
Big Science blog
BLOOM
The Technology Behind BLOOM Training
CRFM
Eleuther
T0 and PromptSource
Washington Post: Big Tech builds AI with bad data. So scientists sought better data
The bet: Is Attention All You Need?
Democratizing access to large-scale language models with OPT-175B
Epic OPT-175 Logbook
Google's PaLM
United's shares plunge 76% on bogus bankruptcy report
Imagen
Albert Gu
Bell Labs
Oct 4, 2022
1 hr 22 min

Moving to Stanford, linguistic and social variation, interventional studies, and shared stories and lessons learned from an ACL Young Rising Star.
Transcript: https://web.stanford.edu/class/cs224u/podcast/yang/
Diyi's website
Diyi on Twitter
Dan Jurafsky
The Stanford NLP Group
Buford Highway in Atlanta
Sweet tea
VALUE paper
AAE
GLUE
Negative concord
Exploring the role of grammar and word choice in bias toward African American English (AAE) in hate speech classification
Inducing positive perspectives with text reframing
Dynabench
Datasheets for datasets
MTurk
Upwork
Prolific
Seekers, Providers, Welcomers, and Storytellers: Modeling Social Roles in Online Health Communities
ToTTo: A controlled table-to-text generation dataset
Six questions for socially aware language technologies
The importance of modeling social factors of language: Theory and practice
Dirk Hovy
Workshop on Shared Stories and Lessons Learned EMNLP 2022
Workshop on Shared Stories and Lessons Learned ICCV 2021
Jeff Hancock
Aug 1, 2022
1 hr 21 min

Birth narratives, stable static representations, NLP for everyone, AI2 and Semantic Scholar, the mission of Ukrainian Catholic University, and books books books.
Transcript: https://web.stanford.edu/class/cs224u/podcast/antoniak/
Maria's website
Maria on Twitter
Semantic Scholar
Elliott Ash
ETH Zurich Center for Law and Economics
Text As Data (TADA) 2022
David Mimno
A computational reading of a birth stories community
r/BabyBumps
Roger Schank
Nate Chambers
ICWSM 2022 workshop: BERT for Social Sciences and Humanities
Measuring Word Similarity with BERT (Sephora Makeup Reviews)
Melanie Walsh
word2vec
BERT
Nick Vincent's Twitter thread on Meta's OPT-175B filtering strategies
Stemming
Alexandra Schofield
LDA
LSA
GloVe
Evaluating the stability of embedding-based word similarities
Narrative datasets through the lenses of NLP and HCI
Belmont report
Casey Fiesler
Naive Bayes
Allen Institute
CORD-19 dataset, which appeared March 16, 2020!
Books books books
Pushkin Press
New York Review Books
Posthumous Memoirs of Brás Cubas
And Then There Were None
Stanisław Lem
Jeff VanderMeer
Italo Calvino
Jorge Luis Borges
xkcd
War and Peace
Middlemarch
Beloved
Novelist Cormac McCarthy's tips on how to write a great science paper
Blood Meridian
No Country for Old Men (book)
No Country for Old Men (movie)
The Road
Taking a visual walk through Burnt Norton
Ukrainian Catholic University
Support Ukraine Now: Real Ways You can Help Ukraine
Let Ukraine Speak: Integrating Scholarship on Ukraine into Classroom Syllabi
Ukraine Trust Chain
spilka
World Central Kitchen
Caritas Ukraine
Science for Ukraine
Data Science Crash Course: Interview Prep
Jun 27, 2022
1 hr 26 min

Realizing that Foundation Models are a big deal, scaling, why Percy founded CRFM, Stanford's position in the field, benchmarking, privacy, and CRFM's first and next 30 years.
Transcript: https://web.stanford.edu/class/cs224u/podcast/liang/
Percy's website
Percy on Twitter
CRFM
On the opportunities and risks of foundation models
ELMo: Deep contextualized word representations
BERT: Pre-training of deep bidirectional Transformers for language understanding
Sam Bowman
GPT-2
Adversarial examples for evaluating reading comprehension systems
System 1 and System 2
The Unreasonable Effectiveness of Data
Chinchilla: Training Compute-Optimal Large Language Models
GitHub Copilot
LaMDA: Language models for dialog applications
AI Test Kitchen
DALL-E 2
Richard Socher on the CS224U podcast
you.com
Chris Ré
Fei-Fei Li
Chris Manning
HAI
Rob Reich
Erik Brynjolfsson
Dan Ho
Russ Altman
Jeff Hancock
The time is now to develop community norms for the release of foundation models
Twitter Spaces event
Best practices for deploying language models
Model Cards for model reporting
Datasheets for datasets
Strathern's law
Jun 13, 2022
1 hr 27 min

From genes to memes, evidence in linguistics, central questions of computational psycholinguistics, academic publishing woes, and the benefits of urban density.
Transcript: https://web.stanford.edu/class/cs224u/podcast/levy/
Roger's website
Roger on Twitter
Roger's courses
The Selfish Gene
Joan Bresnan
John Rickford
Chris Manning
Noah Goodman
Thomas Clark
Ted Gibson
Ethan Wilcox
Critical period
Yevgeni Berzak
Heritage language
How many words do kids hear each year? See footnote 10.
W.E.I.R.D
Kristina Gulordava
Poverty of stimulus hypothesis
Formal grammar and information theory: together again?
Expectation-based syntactic comprehension
Google Ngram viewer
Google Ngram data files
Geoff Hinton's 2001 Rumelhart Prize from the Cognitive Science Society
Center embedding
Mark Johnson
Stuart Shieber
Ivan Sag
Cognitive constraints and island effects
The Chicken or the Egg? A Probabilistic Analysis of English Binomials
Sarah Bunin Benor
Roger's pinned tweet
Eric Baković
MIT's committee on the library system
Project DEAL
Diamond open access
Fernanda Ferreira
Brian Dillon
Glossa Psycholinguistics
Glossa
Johan Rooryck
La Jolla Cove
Jun 10, 2022
1 hr 28 min

Giving a TED talk, linguistic diversity, code switching and large language models, the Indian NLP scene, empowering women with language consultation work, Wordle, and "once a linguist, always a linguist".
Transcript: https://web.stanford.edu/class/cs224u/podcast/bali/
Kalika's website
Kalika on Twitter
Kalika's TED talk
Microsoft Research India
HAL
IndicBERT
AI4Bharat
mBERT
Hindi
Bangla
English
Gondi
Adivasi radio
Oriya
Karya crowdsourcing platform
Sandy Chung
Language processing experiments in the field
Tamil
Telugu
Idu Mishmi
COMPASS 2022
Digital Green
Everwell
Wordle
Information-theoretic analysis of Wordle
May 24, 2022
1 hr 29 min

Coast-to-coast professional journeys, multilingual NLP, teaching in a fast-changing field, the history of hate speech detection in NLP, ethics review of NLP research, research on sensitive topics, mentoring researchers, and optimizing for your own passions.
Transcript: https://web.stanford.edu/class/cs224u/podcast/tsvetkov/
Yulia's website
TsvetShop
Shuly Wintner
Just when I thought I was out ...
Algorithms for NLP
HMMs
Kneser–Ney smoothing
Noah Smith
Demoting racial bias in hate speech detection
The risk of racial bias in hate speech detection
Fortifying toxic speech detectors against veiled toxicity
This is The Daily Stormer's playbook
Microaggressions.com
Finding microaggressions in the wild: A case for locating elusive phenomena in social media posts
https://delphi.allenai.org
Delphi: Towards Machine Ethics and Norms
Yejin Choi
May 16, 2022
1 hr 23 min
