More Learnings from Building Agentic Workflows
Part II: Straddling unstructured and structured data flows
In a previous post, I discussed essential work patterns for creating agentic workflows, based on my experimentation. In this post, I’ll delve into some of the more challenging issues related to data formatting in these flows. Before doing so, I’ll reiterate an axiom from the previous post:
3. Try, try, try …
The only way to know how a model will behave is to try it … A LOT. This means you must use a significant amount of data, tokens, and output files, but it's a necessary up-front cost to investigate how the model will react to your prompts and inputs.
Workflows are complex enough when you use structured data throughout. But models are very unpredictable when consuming unstructured data as input and producing data of any type as output. So, again, the key to creating effective workflows is to run tests—even the same tests—multiple times.
Below are some important considerations for handling data in agentic workflows.
The Nitty Gritty Details of Agentic Workflows
The wide variety of models and tools involved in agentic workflows can make them unwieldy and difficult to maintain. Here are some pointers to improve your productivity when creating such workflows.
1. LLMs aren't consistent about formatting
Every model has its own quirks with standard data formats, and you must test to find out what it's likely to do. In my testing, ChatGPT 4o kept adding leading and trailing commas to rows of CSV output, which is non-standard. ChatGPT 3.5 Turbo had issues formatting JSON. You can prompt your way around these things if you're aware of them.
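As an illustration, here's a minimal post-processing sketch for the stray-comma problem. It assumes the model's reply is already in a string called `raw_csv` and that trailing empty cells are never meaningful in your data:

```python
import csv
import io

def clean_csv(raw_csv: str) -> list[list[str]]:
    # Strip stray leading/trailing commas that the model sometimes adds to each row.
    # Caution: this assumes trailing empty cells are never meaningful in your data.
    lines = [line.strip().strip(",") for line in raw_csv.splitlines() if line.strip()]
    return list(csv.reader(io.StringIO("\n".join(lines))))

rows = clean_csv(',vendor,score,\n,"Acme Corp",8,\n,"Globex",6,')
# rows -> [['vendor', 'score'], ['Acme Corp', '8'], ['Globex', '6']]
```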
2. GPT 4o can't format JSON properly
This is frustrating, but true. Take it from Gemini:
When using the OpenAI API with ChatGPT and requesting JSON output, the response sometimes starts with `` ```json ``, which is an artifact of the JSON mode training and can be stripped using a regex.
Seriously? Despite all the hype about how great LLMs are at coding, artifacts from the models' training still force you to fall back on regular expressions to fix the output. So watch out for unwanted artifacts.
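Here's a minimal sketch of that regex fallback, assuming the model's reply is a string that may or may not be wrapped in a Markdown fence:

```python
import json
import re

def extract_json(reply: str) -> dict:
    # Remove a leading ```json (or bare ```) fence and a trailing ``` fence, if present.
    cleaned = re.sub(r"^\s*```(?:json)?\s*|\s*```\s*$", "", reply)
    return json.loads(cleaned)

print(extract_json('```json\n{"vendor": "Acme Corp", "score": 8}\n```'))
# {'vendor': 'Acme Corp', 'score': 8}
```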
3. The CSV format is more dense
In my test scenario, I needed to use JSON because the Google Sheets API requires it. But even small datasets caused problems that would've required batching or some other mechanism (in Make, at least) to get all the data into a single API call. In contrast, the same amount of data fit easily into a single message in CSV format.
For large implementations, this will be less of an issue, as there are numerous ways to utilize databases and Python code to circumvent data limitations. But for prototyping and simple no-code implementations, it’s certainly worth noting.
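A quick way to see the difference is to serialize the same records both ways and compare sizes. This is just an illustrative sketch with made-up vendor records:

```python
import json

records = [{"vendor": f"Vendor {i}", "score": i % 10, "notes": "shortlisted"} for i in range(25)]

as_json = json.dumps(records)

# Equivalent CSV: one header row, then one line per record (no commas inside the fields here).
header = "vendor,score,notes"
rows = [f"{r['vendor']},{r['score']},{r['notes']}" for r in records]
as_csv = "\n".join([header, *rows])

# JSON repeats every field name in every record; CSV names each field once in the header.
print(len(as_json), len(as_csv))  # JSON comes out roughly twice the size for this shape of data
```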
4. Tell the LLM to cut the chit-chat
The web-based chatbot experience usually includes friendly banter, particularly if you're using Grok. But in a workflow, you want none of that. Instruct the LLM in the System Prompt to send only the data in the requested format. Then reiterate with an example:
Don't include any leading sentence, such as "Here is a list of ..."
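For example, here's a minimal sketch using the OpenAI Python client; the model name and prompt wording are placeholders, not the exact prompts from my workflow:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whatever model your workflow targets
    messages=[
        {
            "role": "system",
            "content": (
                "Return ONLY the data in CSV format with a header row. "
                "Don't include any leading sentence, such as 'Here is a list of ...', "
                "and don't wrap the output in Markdown code fences."
            ),
        },
        {"role": "user", "content": "List the vendors under evaluation with their scores."},
    ],
)

print(response.choices[0].message.content)
```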
5. Mind the context window size
Each model has an upper limit on the amount of data it can handle in a single exchange. In my test flow, when I increased the number of items evaluated from 15 to 25, the next step in the workflow generated significantly more research, increasing the total amount of data passing through the workflow. Often, the model or the workflow service stops generating or forwarding information and hands a malformed, unfinished document to the next step.
When dealing with large amounts of data moving through the workflow, it’s better to consider queuing and stacking data. For my test use case, I utilized Make’s Array Aggregator module to gather all information for vendors under evaluation before proceeding to the next step in the workflow.
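Outside Make, the same idea is easy to sketch in Python: gather all the per-vendor research into one structure, then make a single downstream call. The `research_vendor` and `write_to_sheet` functions here are hypothetical stand-ins for workflow steps:

```python
def research_vendor(vendor: str) -> dict:
    # Hypothetical stand-in for an LLM research step.
    return {"vendor": vendor, "summary": f"Research notes for {vendor}"}

def write_to_sheet(rows: list[dict]) -> None:
    # Hypothetical stand-in for the Google Sheets step.
    print(f"Writing {len(rows)} rows in a single call")

vendors = ["Acme Corp", "Globex", "Initech"]

# Aggregate everything first (the equivalent of Make's Array Aggregator),
# then hand the whole batch to the next step in one call.
aggregated = [research_vendor(v) for v in vendors]
write_to_sheet(aggregated)
```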
6. Tool limitations
Make.com limits how many characters you can put into a single "Text Content" box, so you'll need to use the System Prompt plus one or more Text Content boxes to get all the prompt text in.
7. Downstream dependencies
Consider all of the output types you’ll need throughout the workflow before defining the input formats and initial output. Ideally, the workflow’s primary paths preserve the original formatting (Markdown, JSON, CSV) and convert to another format only in an output fork, because doing so will improve the manageability of the system.
For tabular data, it’s essential to determine which data elements are necessary as column headers. This is not only to ensure completeness, but also to reduce the need for pivoting tabular formats throughout the workflow.
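As a rough sketch of what "convert only in an output fork" can look like: keep one canonical representation (JSON records here) through the main path and derive CSV only where an output step needs it. The record fields and column list are invented for illustration:

```python
import csv
import io
import json

# Canonical format carried through the primary path of the workflow.
canonical = [
    {"vendor": "Acme Corp", "score": 8, "notes": "shortlisted"},
    {"vendor": "Globex", "score": 6, "notes": "follow up"},
]

COLUMNS = ["vendor", "score", "notes"]  # decided up front, so no pivoting later

def to_csv(records: list[dict]) -> str:
    # Output fork: convert to CSV only at the point where a CSV consumer needs it.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

main_path_payload = json.dumps(canonical)   # stays JSON throughout the primary path
csv_export = to_csv(canonical)              # produced only in the output fork
```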
8. Filters, loops, and error handling
When converting formats, include a filter that checks the output for malformed JSON, missing columns or cells, and superfluous text or data that may cause issues in subsequent steps. If the filter detects errors, a simple retry can often resolve the problem, as models usually produce better results on a second or third attempt. It's essential to set a breakpoint (a maximum number of retries) to avoid infinite loops.
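Here's a minimal sketch of that validate-and-retry pattern, with a hard cap on attempts; the `generate_rows` callable is a hypothetical stand-in for the LLM step, and the required columns are invented:

```python
import json

REQUIRED_COLUMNS = {"vendor", "score", "notes"}  # invented for illustration
MAX_ATTEMPTS = 3  # the breakpoint that prevents an infinite retry loop

def validate(raw: str) -> list[dict]:
    # Raise if the output is malformed JSON or any row is missing required columns.
    rows = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    for row in rows:
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            raise ValueError(f"missing columns: {missing}")
    return rows

def generate_with_retry(generate_rows) -> list[dict]:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return validate(generate_rows())
        except (json.JSONDecodeError, ValueError) as err:
            print(f"attempt {attempt} failed: {err}")
    raise RuntimeError("output still malformed after retries; stopping this branch")
```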
Errors can be either ignored or handled, but you’ll need to run a series of validation tests to figure out how to approach specific errors. In my use case, I opted to ignore certain types of errors because this enabled other essential branches of the workflow to continue processing.
9. Model & Version
Each version of each model produces different outputs. Consequently, it's better to maintain a separate copy of the workflow for each specific model and version. If you want to try a different model or version, clone the workflow and start fresh from there.
10. Cloning vs. version control
If you're making significant changes, it's better to clone a workflow (what Make calls a "Scenario") than to rely on version control. For example, when switching from a CSV-centric approach to a JSON-based workflow, you're better off cloning the Scenario and making the JSON-specific adjustments in the new copy. This also lets you revert to the CSV-based flow should problems arise in the new JSON flow.
11. No-code tools?
No-code environments like Make are ideal for prototyping, refining prompts, and enhancing personal productivity. However, you’ll likely need to use robust Python code for production deployments to ensure that rules, filters, error handling, and formatting are properly applied.
12. Human in the loop
Many workflows shouldn’t be fully automated and therefore require a manual step before proceeding. In Make, this can be achieved by creating multiple scenarios, each of which is started manually after human inspection.
Conclusion
Agentic workflows are much more useful than zero-, one-, or multi-shot prompts in browser-based chatbot interactions. However, they also require a lot more discipline and testing. My final thought is to keep track of test runs and the variations you notice over time, with different models and inputs. You should also write up your learnings to share with others.