Microsoft says GPT-4 AI is coming next week with video features

The latest model of ChatGPT will be able to turn text to video (Alamy / PA)

OpenAI’s ChatGPT chatbot has taken the world by storm, but a more powerful successor could be around the corner.

GPT-4, the next large language model in the company’s GPT series after GPT-3.5 (which underpins ChatGPT), is apparently coming next week. That’s according to Microsoft Germany’s CTO Andreas Braun, who made the announcement at an event on Thursday (March 9). Previous reports have claimed that GPT-4 could arrive as soon as this spring.

Details of the new AI model have largely been kept under wraps, but Braun claims GPT-4 will boast multimodal capabilities, including the ability to turn text into video. That makes sense, seeing as rivals Meta and Google both revealed text-to-video generators last autumn.

It’s probably no coincidence that Microsoft is holding a special event next week, on Thursday, March 16. The tech giant’s boss, Satya Nadella, will be present at the demonstration, where the company will showcase the “future of AI”, including how it will work in productivity apps like Teams, Word, and Outlook.

So what can GPT-4 actually do? If it’s anything like the tech shown off by Google and Meta, it will be able to create short videos from rough text descriptions of a scene, using additional AI models to upscale the clips and make them look sharper. Still, such systems have their limits: the videos they create tend to look artificial, with blurred subjects and distorted animation, and they lack sound.

Even so, making the jump from current image-generating AI to video is a big advance in its own right.

As Meta CEO Mark Zuckerberg previously explained: “It’s much harder to generate video than photos because, beyond correctly generating each pixel, the system also has to predict how they’ll change over time.”

Notably, OpenAI already boasts a tool called Dall-E that can create images from natural language descriptions.

In a nutshell, language models are algorithms that can recognise, summarise, and generate text based on patterns learned from large datasets, including text scraped from websites such as Wikipedia. The values a model learns during this training are known as parameters, and the parameter count is a rough indicator of how capable the model is at tasks such as generating text.
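The idea of learning statistics from text and then generating new text from them can be illustrated with a toy sketch. This is not how GPT models work internally (they learn billions of parameters with neural networks); it is a minimal bigram model whose "parameters" are simple word-follows-word counts, using a made-up corpus for illustration:

```python
import random
from collections import defaultdict

# Toy corpus standing in for the large datasets real models train on.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# "Training": count which word follows which. These counts play the
# role that learned parameters play in a real language model.
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

def generate(start, n=5, seed=0):
    """Generate up to n words by repeatedly sampling a likely successor."""
    random.seed(seed)
    out = [start]
    for _ in range(n):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # no observed successor; stop generating
        out.append(random.choice(candidates))
    return " ".join(out)

print(generate("the"))
```

A real model replaces the count table with billions of learned weights and conditions on far more than the previous word, but the recognise-patterns-then-generate loop is the same in spirit.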

The largest model in GPT-3.5 has 175 billion parameters. By comparison, the language model Meta recently released to researchers has a maximum of 65 billion parameters.

It’s unclear how many parameters GPT-4 will have. OpenAI CEO Sam Altman previously shot down wild rumours suggesting GPT-4 would boost the parameter count from GPT-3’s 175 billion to 100 trillion, calling them “complete bulls**t”.

In January, Altman reportedly demoed GPT-4 to members of Congress in order to ease their concerns over the dangers of AI, including the tech’s ability to mimic human writers and create convincing images of fake events, known as deepfakes. Altman reportedly showed how the new system would have greater security controls than previous models.

OpenAI’s GPT-3.5 language model is already powering a range of chatbots, including its proprietary ChatGPT tool, along with bots from the likes of Microsoft (which is an OpenAI investor), Snapchat, Discord, and Slack.

At the Microsoft Germany event, the company also rehashed some of the features it had already revealed, including the tech’s ability to answer and summarise calls for sales reps.