I Couldn't Continue Marking AI-Generated Essays

Trying to turn an AI crisis into an opportunity by building a conversational assessment platform: and what 140 students had to say about it

The Problem: An Existential Crisis in Higher Education?

"We are facing an existential crisis." You may have read Stanford's vice provost for academic affairs, and others, raising the alarm since ChatGPT arrived in late 2022. For those of us in the trenches, marking essays, running seminars, watching students struggle to articulate what they've supposedly learned—it really does make you think "what's the point!"

I've been teaching social work at Ulster University since 2017. Like most academics, I've spent my time refining my assessment methods, developing essay prompts that push students to think critically, and providing detailed feedback that I hoped would transform them into competent professionals. Then, practically overnight, that entire pedagogical approach was undermined.

A 2024 survey found that 43% of college students have used ChatGPT or similar AI tools, with 89% using it for homework and 53% specifically for essays. A 2025 survey found 88% of UK students were using generative AI for assessments. As one professor was quoted last year: student use of AI has "gone totally mainstream this year". I did my own little back-of-a-cigarette-packet analysis. I compared the grammar mistakes and choice of font used in my 2024 essay submissions with submissions I was getting in 2022. There was a stark difference. No grammar mistakes at all in 8 out of 10 of the later submissions and the font they mostly used, Aptos, was the same one churned out by ChatGPT, but this wasn't used at all back in 2022.

But it's not just the prevalence, it's what it represents. As one philosophy professor wrote recently, "I once believed my students and I were in this together, engaged in a shared intellectual pursuit. That faith has been obliterated over the past few semesters." That resonates. When you're marking an essay and you recognise the telltale signs—the word "delve" appearing three times, bullet points summarising themes that were never discussed in class, a jarring conversational style—aaaaaaaargh, something breaks!

It's the students who cheat who need the learning most. As that same professor noted, the essay isn't the point—"the transformation of the author into an educated person" is the point. And that transformation requires struggle, requires the frustration of organising thoughts, requires the cognitive effort of turning ideas into coherent prose. ChatGPT short-circuits all of that.

For social work education, the stakes are particularly high. My students will sit across from vulnerable people in crisis: families on the edge, children at risk, individuals contemplating suicide. The ethical dilemmas they'll face won't arrive "neatly packaged like an essay prompt." They won't have time to feed the situation into an AI. They need to have genuinely internalised ethical reasoning, communication skills, and professional judgment. You cannot fake your way through a safeguarding conversation.

I found myself at a crossroads. One path: an arms race creating AI-resistant assignments, detection tools, hand-written blue book exams. That felt like a losing battle. Australia's higher education regulator had already concluded that AI-assisted cheating is "all but impossible" to detect consistently. Detection tools were flagging innocent students, particularly non-native English speakers, while missing sophisticated cheaters.

The other path? Exams—but most Unis, including mine, want very few exams. They are resource intensive, students don't like them, and cramming for exams does not lead to sustained learning. What about just talking to the students, individually, about what they have learned? They would have to know their stuff, there would be no hiding place. You could keep their anxiety levels low if you let them practice a lot and you keep it friendly! I had tried it before and students had liked it. But it took more time from me and my colleagues, and we don't really have any more time. Things are tight in UK/Irish Unis at the moment.

A Journey: From Frustration to Vibe-Coding

I'd been experimenting with conversational assessment before ChatGPT made it urgent.

In 2023, my colleagues and I published research on using peer-led learning and assessed conversations to teach evidence-based practice. The idea: social work is fundamentally a talking profession. We assess clients through conversation. We build relationships through conversation. We conduct interventions through conversation. So why were we assessing students almost exclusively through written work?

Students initially resisted. They argued that "a conversation is a very unpredictable thing" and lobbied for traditional presentations instead. But by week four of the semester, after practising mock conversations, something shifted. They began to see that articulating knowledge verbally, in real-time, was a different skill from writing about it. And it was a skill that would actually matter in practice.

Our study concluded that we needed to review assessment methods "in line with contemporary challenges"—particularly the rise of AI and essay mills. That was prescient, though I didn't fully appreciate how quickly those challenges would intensify.

I'm not suggesting that talking-based assessments have not been done before—yes we already have oral assessments, and role-play assessments. I think what I am bringing to the party here is the know-how to set these assessment tasks and mark them without adding more work to the lecturers' inbox.

Assessed conversations were brilliant in theory but exhausting in practice. Every conversation required my physical presence, playing the role of a client, asking probing questions, assessing responses in real time. Consistency was difficult to maintain. Scaling to larger cohorts seemed impossible. I needed extra staff, extra hours, extra everything.

In early 2025, I started experimenting. I tried various conversational AI platforms, anything I could get access to. They were very clunky, and couldn't maintain a realistic conversational scenario. In April 2025, working with an independent programmer, I put together a rudimentary voice-based agent. And it survived some testing, you could call it proof of concept. The second half of 2025 brought more advanced programming support and I immersed myself in vibe-coding, and eventually I had something that might actually work: a platform built on industry-leading large language models, capable of conducting realistic practice conversations with students, recording the audio, generating transcripts, and making everything available for assessment.

My big worry then was: would students actually accept it?

The Experiment: 140 Students, High Stakes, Real Results

In December 2025, I deployed the platform with 140 final-year social work students for a module on social work interventions.

This was a bit terrifying. Final-year students are not guinea pigs; they're weeks away from entering practice. They've been through the university system long enough to know when something isn't working. They don't hold back with negative feedback when something goes awry.

But I ploughed on. I linked students to the platform, and they accessed an AI agent that simulated a practice tutor. A practice tutor who wanted to know what they had learned from particular lectures that semester. They could pick one of three conversational scenarios I had built for them. They could do their conversation three times. They were to then pick the best one, the best recording that is, and upload it to the same place they usually uploaded essays.

The transcripts and audio from these conversations were then available for my colleagues and I to review and to mark. When it was all done we surveyed the cohort about their experience.

59 students responded to our survey—a 49% response rate, gathered before results were released so it reflects honest, unfiltered views. The survey asked students to rate and rank all of the assessment methods used that semester, giving us direct comparison data.

Students rated their preference on a 0–100 scale (0 = totally prefer AI conversations, 100 = totally prefer the alternative). The results were clear: 68% of students preferred AI conversations to written assignments, and 78% preferred them over recorded presentations.

When asked which methods helped them build skills for practice, AI conversations ranked first—43% placed them as their top choice, with an average rank of 2.09 out of 4. Written assignments ranked last, with only 4% placing them first for practice-skill development.

The picture was slightly different for learning reinforcement, where multiple choice tests came out on top. Students appear to perceive structured testing as best for consolidating knowledge, with AI conversations a clear second. But for the skills that actually matter in professional practice—communicating, reasoning, responding in real time—conversations won outright.

Students' qualitative feedback offered deeper insights.

What Students Actually Said

"I strongly believe assessed conversations are the way forward. It allows students who struggle to get their point across in written assignments show off their knowledge."

That comment crystallised something I'd sensed but couldn't articulate. We have students who understand the material deeply but freeze when they have to write about it. And we have students who write beautifully but can't actually apply what they've learned in a conversation. Which skill matters more for a social worker?

"Far better way of learning as some people are better at writing essays regardless of whether they know the topic or not."

There it is again—the recognition that writing proficiency and professional competence are not the same thing. We've been using one as a proxy for the other, and AI has exposed how flawed that proxy always was.

"I think the conversations are a great way to build AI into the Degree, so students can begin to feel like professionals."

This student got it. We're not just assessing knowledge; we're building professional identity. When you successfully conduct a motivational interviewing conversation—even with a simulated client—you start to feel like someone who could do this for real.

"Love the use of AI—less pressure. I like that it's not endless essays for every subject."

"The platform worked well for me... I really feel that this is an excellent method to measure social work skills and should be considered for years coming after us."

There were critiques too, and they were valuable. Some students wanted more time—they felt the AI's response time shouldn't count against their conversation duration. Others wanted more practice opportunities (they were allowed 2 practice sessions) before the assessment. One suggested case studies from particular focus areas. These are all solvable problems.

But what struck me most was an unexpected finding: several students reported that it was actually easier to talk to an AI simulation than to a person.

I hadn't anticipated that. Role plays with lecturers or peers carry social stakes—you're worried about looking foolish, about forgetting what to say, about the other person's judgment. The AI removed that anxiety. Students could focus entirely on the conversation itself, on demonstrating their knowledge and skills, without the social overhead. I should point out that I allowed students to do their conversation 3 times and then pick the best recording to submit, I think this really helped reduce assessment anxiety.

Students described the experience in terms I hadn't expected:

"The positive encouragement throughout the conversation helped to keep conversation going without creating anxiety or stress. Having 3 attempts allowed for a first attempt to familiarise with the way the questions would be asked."

"What worked well was that it was open book, as a social worker you have notes with you when presenting a case or at a meeting. So it felt like a professional conversation and you felt more supported throughout."

"It felt really natural—it didn't feel awkward."

One colleague who helped mark the conversations put it simply: "I enjoyed marking the conversations and I can't say I have enjoyed marking more than a couple of essays in recent years due to students' use of AI."

What I've Learned

This experiment confirmed something I'd suspected but hadn't been willing to say out loud: I'm not sure I could continue as a lecturer if the only option was receiving AI-generated essays.

The relentless parade of polished prose hiding empty understanding. The detective work required to spot cheaters. The erosion of trust. The nagging doubt that accompanies every submission: did they write this? Did they learn anything?

Conversational assessment offers a way forward. It's not AI-proof in some absolute sense—nothing is. But it's AI-appropriate. It uses the same technology that created the problem to address it. Students can't outsource a real-time conversation to ChatGPT. They have to show up, think on their feet, and demonstrate what they actually know.

More importantly, it assesses what actually matters. Social work isn't about writing essays; it's about talking to people in difficulty. Medical training is about communicating with patients under pressure as much as it is about anything. Language learning isn't about grammar worksheets; it's about holding a conversation in the target language. It's hard to think of a professional environment where verbal communication isn't essential—conversational assessment makes sense regardless of the AI question.

The platform I've built—now available at hied.ai—emerged from this journey. I didn't set out to become an EdTech entrepreneur. I set out to solve a problem in my own classroom. But the more I developed it, the more I realised that thousands of other educators were facing exactly the same crisis.

Universities are in an existential struggle. The essay, that cornerstone of higher education assessment for decades, may not survive the AI era. We need alternatives. We need to rethink what we're actually trying to assess and whether our methods match our goals.

For me, the answer turned out to be remarkably simple: have the students talk.

Tony McGinn is a Senior Lecturer in Social Work at Ulster University and founder of HiEd.ai. His research interests include evidence-based practice, domestic violence interventions, and the leverage of technology for more effective student learning environments. He can be reached at tony@hied.ai.

References

McGinn, T., Pascoe, K.M., & Burns, P. (2024). Teaching social work students about evidence-based practice using peer-led learning and assessed conversations. Social Work Education, 44(6), 1519-1534.

The Problem: An Existential Crisis in Higher Education?

A Journey: From Frustration to Vibe-Coding

The Experiment: 140 Students, High Stakes, Real Results

What Students Actually Said

What I've Learned

About the Author