Really quick blog post today – a continuation of a shower thought (try not to picture that, btw). Building conversational experiences is, in the words of the uber-talented Alexa expert Andy May (@andyjohnmay), “Easy, Hard”. I really like how profound this phrase is, and how many different meanings it has for CUIs.
I always try to explain to non-believers that voice and chatbots look super-easy to use (you talk, it responds, you talk some more, something gets done. Easy peasy.) but, like all good product design, that simplicity belies a metric shit-ton of work that’s gone on beforehand to hone, refine and hand-whittle that beast into something so singular.
But, like all the best experts’ wisdom, I keep seeing Andy’s words applying to so many more situations. Like, for example, CUI development. I’ve built several Alexa skills, Google Assistant Actions and Facebook chatbots myself now – I’m not a dev, but as the Director of a conversational app studio, I want to make sure I have a firm understanding of every aspect of the business, including having a working understanding of the guts of what we’re selling.
My relative naivety has led me to realise that Andy’s “Easy, Hard.” idea applies to building conversational apps too.
It seems fairly straightforward to knock up a quick skill or chatbot. Dialogflow is lovely – get your head around the concepts of intents, slot values, sample utterances and contexts and you’ve pretty much nailed it. Press the button and ship to Assistant. Chatfuel is beautiful – one of the best WYSIWYG interfaces I’ve ever used, super simple to create a really engaging Facebook chatbot and, again, click the button to deploy to your Facebook Page.
But, and here’s where the ‘Hard’ bit comes in, as soon as you want to do anything dynamic (responding to the weather with different logic, allowing a user to log in and access their account history, storing data for next time, etc…), suddenly these tools become too limited.
You can’t specify application logic (beyond contexts) in Dialogflow, which means that you can’t dynamically vary the UX or adapt the copy spoken based on some state or calculation. Those wonderful WYSIWYG tools for designing the Google Assistant widgets have to be entirely replaced, as you plug your intent into a fulfillment (some cloud-based code). It’s all or nothing: either you use the tooling (and build your widgets visually), or you use a code fulfillment (and build your responses manually with code).
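To make that concrete, here’s a rough sketch of what “plugging your intent into a fulfillment” can look like once the copy depends on some state. It’s written in Python purely for illustration and assumes a Dialogflow ES-style webhook payload (`queryResult`, `fulfillmentText`). The `get_weather` intent, the `city` parameter and the `FAKE_FORECASTS` lookup are all made up for this example, not anything from a real project.

```python
from typing import Any, Dict

# Hypothetical stand-in for a real weather lookup; a real fulfillment
# would call a weather API or a database here.
FAKE_FORECASTS: Dict[str, str] = {"London": "rain", "Lisbon": "sun"}


def build_reply(city: str, forecast: str) -> str:
    """Choose the spoken copy based on state (the forecast)."""
    if forecast == "rain":
        return f"Take an umbrella, it's raining in {city}."
    if forecast == "sun":
        return f"Sunglasses on, it's sunny in {city}."
    return f"Sorry, I don't have a forecast for {city} yet."


def handle_webhook(request_json: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a webhook request body into a webhook response body.

    Assumes the Dialogflow ES webhook shape: the matched intent and its
    parameters arrive under "queryResult", and the reply text goes back
    as "fulfillmentText".
    """
    query = request_json.get("queryResult", {})
    intent = query.get("intent", {}).get("displayName", "")

    # "get_weather" is an assumed intent name for this sketch.
    if intent != "get_weather":
        return {"fulfillmentText": "Sorry, I can't help with that yet."}

    city = query.get("parameters", {}).get("city", "")
    forecast = FAKE_FORECASTS.get(city, "unknown")
    return {"fulfillmentText": build_reply(city, forecast)}
```

Notice that every response string now lives in code rather than in the visual tooling, which is exactly the trade-off described above.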
There seems to be this hard drop-off – suddenly, adding what seem like relatively basic features means you need to almost wholesale up-sticks and start whittling everything by hand. Suddenly you’re in the world of Node, Serverless, DynamoDB and Lambda functions. Proper engineering!
When I was designing conversational experiences, I got used to checking code out of GitHub, editing strings in VSCode and checking back in. I’d design approximations of the Google Assistant widgets in Sketch symbol libraries and design flows the old-fashioned way – as static screen flows. I’d then have to hand these design files over to the engineering team. Sure, we managed the intents and sample utterances in Dialogflow, but the business logic and response widgets would then all be created in code.
This can’t be the best workflow. There must be a better way.
I’d be really interested to hear from the community on this: have you found a better way of bridging this gap? How do your designers and engineers work together on conversational experiences? Please do let me know!