AI: The Helpful Assistant
Why experts are still needed for solving real-world problems: a case study
In the early days of the identity and access management (IAM) market, I often quipped that every new user provisioning product was particularly great at solving “the first mile problem.” As a Burton Group analyst, I saw so many demos of a happy-path workflow that could take an input from an HR system in CSV format and then generate user accounts in the corporate directory, email service, and a RADIUS server (yeah, people really used phone lines to access your network back then). Impressive! But while they were really good at the easy stuff, the harder problems took much more effort to solve.
Lately, I’ve observed how the newly minted frontier models and their agentic counterparts (such as Anthropic’s MCP or the DeepSeek-adjacent Manus) are solving the mid-mile problem. These services are very helpful and eager to please; it really is quite impressive to see. The difficulty comes with the last few miles of making something work. Today’s models are excellent tools if you already know what you’re doing when it comes to the task at hand. Experience and knowledge allow users to tease the help they need out of the model with good prompts and to push the model in the right direction when it gets something wrong. But for beginners, today’s models are not the magic wands that vendors would have you believe. Let me provide a simple example: creating an email signature for Apple Mail.
Note: I have heftier use cases coming, so make sure to subscribe.
A simple case study
Because we’re just getting started at The Focus Group, I needed to create an email signature for our team. I’ve performed this task recently, so I know how tricky it can be. Simply put, Apple Mail is terrible at handling HTML-based email signatures. ChatGPT 4o provided an excellent summary of the situation, albeit only after I prompted it to do so:
Here’s how DeepSeek put it:
The solution is unabashedly convoluted. Were the models helpful? Absolutely. But did they magically complete the task without my intervention? Absolutely not. Because I have some expertise with HTML coding, I was able to complete the task; without that expertise, I would have been unable to accomplish my goal. I caught mistakes the models made, pushed them in specific directions, and even ended up taking a step none of them suggested.
But I got there, and here’s what the final product looks like in Apple Mail’s signature editor:
And this is what the signature looks like when used in a message:
The Conclusion
Although not an enterprise use case, this simple example shows where and how models fall short in their ability to offer expert assistance.
You can get nearly all the help you need from frontier models, but only if you already know what you need and have a good understanding of what you’re trying to do.
Models perform much better when you have experience both with prompting and with the specific model you’re using. Notably, all models have system prompts that instruct them to provide answers in a helpful manner. It’s worth understanding the system prompt parameters for the models you use, because the models are eager to please and tend to say whatever the model developer has decided the audience wants to hear. In short, you’ll need to state in your prompts what level of depth you’d like for the task. Models also assume that your use case is the general-purpose case, so you’ll need to be specific in your prompts about how your use cases are special.
Even so, while LLMs are well trained on available sources, they are clearly not practitioners. For example, the best method I discovered for creating an HTML signature for Apple Mail was never offered by any of the models.
This simple use case is a harbinger of what enterprises should expect on a larger scale from today’s generative AI solutions. Critical enterprise use cases continue to require domain experts. Models are trained on generalities and don’t do well with nuanced, custom, or mission-critical operations. In the hands of experts, models can save tremendous time and improve the accuracy of implementing use cases. But success in the last mile absolutely requires the expertise of practitioners.
The Details
We like to show our work, so if you want the details, here’s my initial prompt (note: I tend to refer to the app as MacMail, but no models were confused by this; no offers of “fries with that?”, so all good!):
For this exercise I didn’t use any form of deep research, as I figured that wouldn’t affect the outcome appreciably. Also, for this exercise, I decided to take the approach the model recommended, even when I knew it to be misguided. Then, anytime I ran into trouble, I prompted the model for help to see how it would respond.
Stuff that Went Well
I was delighted to see how these models immediately jumped into helper mode: providing an overall approach, offering templates, and prompting me for the details they needed to generate an actual HTML signature. Honestly, I was truly impressed. With any other mail client, this would have worked effortlessly. So let’s start with how well these models performed at helping.
The latest models act as helpers rather than lecturers
Earlier models tended to provide a lot of advice and counseling but had little to offer in the way of direct help. These latest models all basically spent one or two sentences saying something to the effect of “that’s a great idea, let’s get started,” and then began asking questions, laying out a course of action, and offering templates. Here’s an example from Grok:
It’s clear that models are capable of helping, and that helpful behavior is reinforced in their system prompts.
Native AI performed better than Canva, Hubspot, and Gmail
Even though models pointed out—even suggested—that creating and editing email signatures can be done on sites such as Canva or Hubspot, I found that just staying with chat was simpler and more effective, for several reasons:
Because the output I needed was raw HTML, editing in a visual tool seemed unnecessary.
I was prompted right away for all the information required.
The models adhered to their workplans and were able to make all the adjustments I asked for based on previous outputs. For example, if I asked for different font colors or wanted the signature to support dark mode, they all made the adjustments with no issues and provided updated HTML (a sketch of this kind of output follows this list).
Hubspot’s wizard-style flow had issues uploading my corporate logo, and Gmail’s signature editor had the same problem. With the LLM chats, I needed only to copy and paste the logo into the chat, and they automatically handled everything from there.
Tools like Canva and Hubspot use their own icons, so the HTML includes links back to their websites to retrieve them—meaning anytime your signature is rendered, they’ll be notified. The models used icons directly from the social sites.
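To make the dark-mode and icon points concrete, here’s a minimal sketch of the kind of HTML the chats produced, written to a local file so you can preview it in a browser. To be clear, this is my own illustration rather than any model’s actual output: the name, the example.com URLs, and the icon path are all placeholders.

```bash
# A minimal sketch of a model-style signature (all names and URLs are
# placeholders): URL-based images only, plus a prefers-color-scheme
# query for dark mode. Writing it to a file lets you preview it in a browser.
cat > signature-preview.html <<'EOF'
<div class="sig" style="font-family:Helvetica,Arial,sans-serif;font-size:13px;color:#222222;">
  <style>
    /* Clients vary in whether they honor this query (some need extra meta
       tags); many strip <style> blocks entirely, so the inline colors
       above remain the light-mode fallback. */
    @media (prefers-color-scheme: dark) {
      .sig { color: #eeeeee !important; }
      .sig a { color: #9ecbff !important; }
    }
  </style>
  <strong>Jane Doe</strong> | The Focus Group<br>
  <a href="https://example.com" style="color:#1a6acb;">example.com</a><br>
  <!-- Icon fetched from your own host, not a signature-builder's servers -->
  <img src="https://example.com/icons/linkedin.png" width="16" height="16" alt="LinkedIn">
</div>
EOF
open signature-preview.html   # macOS: preview in the default browser
```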
That last point about third-party icon hosting deserves more attention, because it’s a privacy and security issue that could easily go unnoticed unless you have some expertise in this area. First, industry practice is to use URLs for all images and icons; the models understood this and warned against embedding images, as DeepSeek does here:
The models automatically used URL-based images but sourced them from our own site or from well-known public sharing sites that you can choose from. For example, here are some options that ChatGPT 4o provided:
I’m guessing that’s part of why the Hubspot service is free: the site never warned me that every time someone receives an email from me, Hubspot will be called to load up iconography and logos. I was also going for a more professional look, so I wanted to source everything either in-house or from the social media site each link represents.
Making edits with prompts works great
I spent a decent amount of time with each model working through the design and implementation, asking for help, and adjusting the output. Every time I asked for changes, the models edited their previous output with aplomb. When I pasted in HTML that I had edited in the meantime, they fixed any issues with the new input as well.
Where experience was necessary
Once Apple Mail entered the picture, problems quickly emerged, and HTML expertise was the key to handling them smoothly.
Problem 1: Selecting the proper approach
There are many coping strategies for users of Apple Mail, which leads to a fairly complex decision tree when creating signatures. People have documented various approaches on the Internet, and the models seemed aware of all of them. But the models hadn’t internalized these methods in a way that would let them recommend a specific approach for specific requirements. Mostly, the models parroted discrete approaches from the sites they trained on but couldn’t, or simply didn’t, help navigate the important choices along the journey.
The two ways of creating HTML signatures in Mail are:
Paste “properly” formatted HTML into the Apple Mail UI
Edit the hidden files in the ~/Library/Mail/V*/MailData/Signatures/ folder
Either method works, but users have to weigh numerous issues before attempting one. For example, ChatGPT 4o simply led me down the primrose path of editing a *.mailsignature file. This method works great when it’s your first signature. But I already had half a dozen signatures in the hidden directory, and I had a tough time getting it to work: I lost several of my other signatures, and Mail began behaving erratically. Signatures I thought I had deleted suddenly reappeared, one of my other accounts was shockingly assigned my new signature, and my old signature was irrecoverably lost. Mail also doesn’t name the files the way the models describe. And while Hubspot recommends locking HTML files so Mail won’t overwrite them, none of the models I used recommended such a step.
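If you want to try the file-system route yourself, here’s a minimal sketch of the Terminal steps, including the locking step the models never mentioned. A few assumptions to note: the V* version number varies by macOS release, the file name below is a placeholder, and if Mail syncs signatures through iCloud Drive the folder may live under ~/Library/Mobile Documents instead.

```bash
# Quit Mail first so it doesn't rewrite the signature files underneath you.
osascript -e 'quit app "Mail"'

# List the signature files. Note the opaque names: they are not the names
# you gave the signatures in the Mail UI.
ls ~/Library/Mail/V*/MailData/Signatures/*.mailsignature

# After pasting your HTML into one of the files, lock it so Mail can't
# overwrite it on its next launch (this mirrors Hubspot's advice).
chflags uchg ~/Library/Mail/V*/MailData/Signatures/YOUR-FILE.mailsignature

# Unlock it again whenever you need to make further edits.
chflags nouchg ~/Library/Mail/V*/MailData/Signatures/YOUR-FILE.mailsignature
```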
When I asked ChatGPT 4o about these risks, I got a sort of “Whoops! Did I forget to mention that?” response. Here was my prompt, after things had already gone wrong:
I have 5 email addresses, each with their own signatures. are there any risks in editing the .mailsignature files?
And here’s part of what came back:
Great question — and it’s smart to ask before diving into .mailsignature edits. Here’s a quick overview of the risks, best practices, and precautions when editing .mailsignature files in Apple Mail:
As it turns out, editing these sig files is like editing bits of memory, the Windows registry, or config files: fun and geeky, but also risky. Mail uses a combination of files to track signatures, so editing just the signature file creates problems when you have more than one signature. Mail also now uses opaque names for these files, yet all the models claim the file is named whatever you named it in the UI. (And yes, this journey starts by creating a default signature in the UI that you then edit in the file system; that’s what warning 3 means by “accidentally.”)

The models also suggest using Apple’s TextEdit, which works well for viewing and copying the contents of the file but doesn’t support editing HTML. An HTML editor is a critical tool no matter which approach you take, yet the models I used made no such suggestion. Finally, Apple includes some header-type information in these files, so they’re not strictly HTML, and you must be careful about that. If you ask the models about these issues, they’ll say something like “that’s a great point!” but they won’t tell you that sort of thing in advance.
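You can see the “not strictly HTML” point for yourself by peeking at the top of one of the files. The header fields below are what I’d expect from this kind of MIME-style preamble; the exact fields vary by macOS version, and the values shown are placeholders.

```bash
# Show the first few lines of each signature file. Expect email-style
# headers before the HTML body, which starts after a blank line, e.g.:
#
#   Content-Transfer-Encoding: 7bit
#   Content-Type: text/html; charset=utf-8
#   Message-Id: <PLACEHOLDER-UUID@local>
#   Mime-Version: 1.0
#
head -n 8 ~/Library/Mail/V*/MailData/Signatures/*.mailsignature
```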
Problem 2: The method I found to work best was never mentioned by any of the models
As stated earlier, the models parrot the instructions they have been trained on or can find via web search, but they don’t have any real-world training input. This may change as more people rely on desktop agents such as Manus and Anthropic MCP / Computer Use to automate their tasks, but for now the responses are amalgamated regurgitations of information on the internet.
The best solution I found is to use a model to generate an HTML signature, paste it into a visual editor like html5-editor.net, make any edits you’d like, and then copy the resulting HTML from the editor and paste it into the Mail UI. This isn’t just about editing, though: the HTML was more consistently formatted when copied from an editor.
Problem 3: You only get what you’re asking for
As I said earlier, the models don’t warn users about the issues they’ll face depending on which approach they take. This appears to be a function of post-training and system prompting that pushes the model toward a friendly helper-agent mode. (Again, I didn’t use the deep research functionality of any of the models, so I suppose that could have some effect.)
Problem 4: Inaccurate instructions
There were several instances where a model offered step-by-step instructions that were inaccurate or simply missing steps. In one case, DeepSeek instructed me to drag and drop the .html file into the Apple signature UI, then stated, “The HTML should render automatically.” What was rendered wasn’t properly formatted, either in the UI or in use when creating new emails.
Similarly, Gemini missed the important step of locking the signature file, even though it referenced Hubspot’s instructions as its source. So even though the model was trained on this data, it’s clear that Gemini didn’t retain a high-fidelity representation of the required steps.
Problem 5: Creating multiple signatures
After generating a signature for The Focus Group, I decided to ask ChatGPT to create a signature in the same format for my ai4society.online account. This worked fairly well, but this time it struggled to get the logo hosted properly. Other models also had minor issues when using a previously created signature as the basis for new ones. All of this illustrates that these models generate inconsistent outputs.
Summary
Today’s frontier models are very helpful in orchestrating multi-step tasks. They demonstrate broad awareness of the problem space for creating email signatures that work across a combination of clients and email services. They’re also adept at providing the HTML needed for email signatures. But the models weren’t forthcoming with the details necessary for strategic planning, and they showed no knowledge of real-world practice.