2 Comments

I completely agree with the premise. Translation is an enormous and under-appreciated opportunity.

Re: fundamental model layer, though...

Traditional and fine-tuned LLMs *will* be excellent at this, eventually. But real-time translation during conversation may be a little like AR/VR: we need it to be fast enough to be fluid, and the challenge of making it that fast could be really high.

In that case, general LLMs and fine-tuned ones might not be good enough for a long time. Translation differs because...

1) You might need shallow inference from a wide context window with focused depth in a narrow one. (It's critical to know previously stated names and where you are, but not every detail about what is going on.)

2) The audio in language A is obviously an incredibly important input into any model to predict the word in language B.

In particular, (1) and (2) suggest training a different neural network. Which isn't a huge deal, but... it is incredibly expensive and takes real effort. And once you've designed this, you might have a different amount of data to manage (much less?!), which leads to a slightly different ideal hardware solution. In theory, someone like Nuance *should* be eating this problem up, but I would bet against them.


I'm not from the world of computer science but my interest in AI was sparked by watching 'The A.I. Dilemma' presentation from Tristan Harris and Aza Raskin. In describing the advent of transformers, the 2017 gamechanger for AI, they remarked "The sort of insight was that you can start to treat absolutely everything as language, but it turns out you don't just have to do that with text. This works for almost anything."

This is what made me realise the potential of AI and why it implicates everybody. Lawyers communicating and collaborating with musicians and mathematicians. It's something people should be exploring and reckoning with as soon as possible.
