Wednesday, April 22, 2026

Datapages for reusable (and pretty!) data sharing

[This post is joint with Mika Braginsky, a long-time collaborator on data-sharing and data-viz.]

Data sharing is both a critical scientific need and, increasingly, a mandate by many research funders. The FAIR principles – that data should be findable, accessible, interoperable, and reusable – are a critical guide to how data are shared. Yet even FAIR-compliant datasets in approved repositories are often shared in ad-hoc formats that are hard to reuse or to integrate with other data. In contrast, the most impactful datasets tend to be disseminated thoughtfully through dataset-specific or community-specific platforms. These "domain-specific data repositories" (our term from a previous blogpost!) create opportunities to develop data standards and ontologies that fit the needs of a particular community, research problem, or instrument type. They also allow for engagement through interactive visualizations. But custom repositories and pretty websites with nice visualizations are costly and complicated to create.

We are introducing a set of open-source tools and templates for easily creating datapages: interactive websites that disseminate data for broad reuse. Datapages are easy to deploy for a single project, but extensible enough to host large collections of related datasets. You can learn more and get started at https://datapages.github.io/.

Monday, April 20, 2026

Using AI to improve (not automate away) academic research

Everyone seems to be consumed with AI anxiety. Graduate students are wondering if they will be replaced by AI assistants, or if they themselves are using AI enough or using it "right". Researchers are wondering what it means to produce research if agents can write whole papers. Everyone is wondering how we will keep up with a literature that is moving ever faster.

Everyone is feeling the pressure to do *more*: do more projects, produce more papers, review more papers. This has already had negative impacts on the research space – for example, the difficulty conferences now face in securing quality, non-automated reviews for the huge volume of submissions they receive.

We should think about what we can do that is *different.* We should try to use automation to be more efficient at the annoying parts of our jobs while leaving more time for discovering new knowledge. The key (fast-evolving, unresolved) issue is how AI models will change the frontier of what is scientifically possible. This varies from field to field and changes day by day, but my sense is that the rise of semi-autonomous agents will be very interesting for scaling up social and behavioral science.

Monday, February 16, 2026

An LLM-backed "Socratic tutor" to replace reading responses

My hot take on college-level teaching is that reading responses are mostly a terrible assignment, and they're even worse in the age of AI. I'm piloting something a bit different with my co-instructor right now: a "Socratic tutor" bot that asks students to answer open-ended Socratic questions about a specific text and "passes" them when they show sufficient comprehension. Initial feedback from students in a first trial has been extremely positive, so I am thinking more about how this could be useful in the future, as well as some of the potential problems. LLMs are far from a panacea for education – they cause way more problems than they solve, at the moment! – but this might be an interesting use case.
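To make the idea concrete, here's a minimal sketch of what such a tutor loop could look like. This is not our actual implementation – every name here (`tutor_session`, `call_llm`, the pass threshold, the judging prompt) is a hypothetical stand-in for whatever chat-completion API and rubric you'd actually use.

```python
# Hypothetical sketch of a Socratic-tutor loop. `call_llm` stands in for
# any chat-completion API; `get_answer` collects the student's free text.

PASS_THRESHOLD = 2  # hypothetical: answers judged sufficient before "passing"

def grade_answer(judgment: str) -> bool:
    """Map the judge model's reply to pass/fail. Assumes the judge is
    prompted to begin its reply with SUFFICIENT or INSUFFICIENT."""
    return judgment.strip().upper().startswith("SUFFICIENT")

def tutor_session(questions, get_answer, call_llm) -> bool:
    """Ask open-ended questions about a text; 'pass' the student once
    enough answers show comprehension."""
    passed = 0
    for question in questions:
        answer = get_answer(question)  # student's free-text response
        judgment = call_llm(
            f"Question: {question}\nStudent answer: {answer}\n"
            "Does this answer show comprehension of the reading? "
            "Reply SUFFICIENT or INSUFFICIENT, then explain."
        )
        if grade_answer(judgment):
            passed += 1
        if passed >= PASS_THRESHOLD:
            return True  # student has demonstrated comprehension
    return False  # questions exhausted without enough sufficient answers
```

The design choice worth noting is the separation between the conversational questioning and a simple, auditable pass rule: the LLM judges individual answers, but the decision to "pass" a student is deterministic, which makes the assignment's behavior easier to explain and debug.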

As an instructor, one major challenge is that you want people to read the assigned reading and engage with it so that what you do in class can build on this content in a meaningful way – but some students would prefer not to (or just don't have time, or whatever). How do you solve this problem? Weekly quizzes are possible, but they're time-consuming to make and give and annoying to grade; plus they reinforce a memorization mindset, rather than inviting students to engage.

The humble reading response is a frequent alternative: you ask students to respond to, critique, or build on their readings, usually in a short response ranging from a paragraph to a page. At their best in a well-prepared seminar, the instructor reads these beforehand, synthesizes them, and calls on individual students to share their reactions. But in a larger course, often this synthesis is impossible – and so the reading response becomes an assignment that no one wants to write and that can be tedious to read at the level they deserve. Even worse, if you're not getting called out on your reaction, it's possible to "respond" to a reading without having read it. And that's even before you can ask an AI to write a response to a text that it has ingested at some point (or that you've pasted into its chat window). What do we do?