Google has started making its Veo and Imagen 3 generative AI models available to private users. Customers of the company’s Vertex AI platform on Google Cloud can start using Veo today to create videos from text prompts and images. Next week, the same customers will be able to access Imagen 3, Google’s newest text-to-image foundation model.
Google says that Veo’s launch makes it the first hyperscale cloud provider to offer an image-to-video model. For now, only a few artists, scholars, and researchers can access OpenAI’s Sora model, but that could soon change, as the company is teasing a 12-day run of product demos beginning on December 5.
Google claims that Veo can produce 1080p video “that’s consistent and coherent” and that runs “beyond a minute.” The tool also accepts both images and text prompts as input. When working from an image, a video can begin with either an AI-generated or a human-made image.
It’s clear from the sample video Google posted that Veo, like any AI model, has trouble understanding cause and effect. For instance, when marshmallows are held over a campfire flame in the video, they do not turn yellow or char. Artifacting is another problem, as a close look at the hands in the concert video reveals.