Google Research’s Lumiere – Space-Time Text-to-Video Diffusion Model

Introduction

  • Google Research introduces Lumiere, a text-to-video diffusion model designed for synthesizing videos with realistic, diverse, and coherent motion.
  • Lumiere addresses a central challenge in video synthesis, achieving globally coherent motion, by producing the entire temporal duration of the video at once, in a single pass through the model, rather than generating distant keyframes and filling the gaps with temporal super-resolution.

Technical Details

  • Space-Time U-Net Architecture: Lumiere uses a Space-Time U-Net (STUNet) that generates the full-frame-rate, low-resolution video by processing it at multiple space-time scales, deploying both spatial and, crucially, temporal down- and up-sampling (see the sketch after this list).
  • Text-to-Video Generation: The model can convert text prompts into videos, demonstrating a wide range of content creation tasks and video editing applications.
  • Image-to-Video Conversion: Lumiere transforms input images into videos based on provided prompts.
  • Stylized Generation: The model can generate videos in a target style using a single reference image and fine-tuned text-to-image model weights.
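
To make the temporal down- and up-sampling idea concrete, here is a minimal PyTorch sketch. The module names, kernel sizes, and channel counts are illustrative assumptions; Lumiere's actual STUNet is described in the paper and has not been released as code.

```python
# Minimal sketch of space-time down- and up-sampling for video tensors.
# Shapes and layer choices are assumptions for illustration only.
import torch
import torch.nn as nn

class SpaceTimeDownsample(nn.Module):
    """Halves the temporal and spatial resolution of a video tensor."""
    def __init__(self, channels: int):
        super().__init__()
        # 3D convolution with stride 2 along time (T), height (H), and width (W)
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, T, H, W)
        return self.conv(x)

class SpaceTimeUpsample(nn.Module):
    """Restores the temporal and spatial resolution on the decoder path."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.ConvTranspose3d(channels, channels, kernel_size=4, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)

# A 16-frame, 64x64 clip is compressed to 8 frames at 32x32 and expanded back.
video = torch.randn(1, 64, 16, 64, 64)        # (B, C, T, H, W)
down = SpaceTimeDownsample(64)(video)         # -> (1, 64, 8, 32, 32)
up = SpaceTimeUpsample(64)(down)              # -> (1, 64, 16, 64, 64)
print(down.shape, up.shape)
```

Processing the clip at coarser temporal resolutions is what lets the network reason over the full video at once instead of frame by frame.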

Applications

  • Video Stylization: Lumiere allows for consistent video editing using text-based image editing methods.
  • Cinemagraphs: The model can animate the content of an image within a specific user-provided region.
  • Video Inpainting: Lumiere supports video inpainting, letting users regenerate a masked region of a video while the surrounding content is preserved (a generic sketch of the idea follows this list).
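
For intuition, here is a generic sketch of how diffusion-based inpainting keeps user-preserved pixels fixed while the model regenerates the masked region. It illustrates the general technique only; the function names and arguments are assumptions, not Lumiere's specific conditioning scheme.

```python
# Generic diffusion inpainting step for video tensors (illustrative only).
import torch

def inpaint_step(x_t: torch.Tensor, known_video: torch.Tensor, mask: torch.Tensor,
                 denoiser, add_noise, t: int) -> torch.Tensor:
    """
    One reverse-diffusion step that only regenerates the masked region.
    x_t:         current noisy video, shape (B, C, T, H, W)
    known_video: original video to preserve outside the edit region
    mask:        1 where the model may generate, 0 where pixels are kept
    denoiser:    model predicting the less-noisy video at step t
    add_noise:   forward process that noises a clean video to a given step
    """
    x_prev = denoiser(x_t, t)                     # denoise the whole clip
    known_noised = add_noise(known_video, t - 1)  # match the noise level of x_prev
    # Keep known pixels from the original, generated pixels inside the mask.
    return mask * x_prev + (1 - mask) * known_noised
```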

Societal Impact

  • Creative Flexibility for Novice Users: Lumiere is designed to enable novice users to generate visual content creatively and flexibly.
  • Risk of Misuse: The technology carries a risk of misuse for creating fake or harmful content, highlighting the need for tools to detect biases and malicious use cases.

Conclusion

  • Lumiere represents a significant advancement in video generation technology, offering a novel approach to creating high-quality, coherent videos from text and images.
  • Its innovative architecture and diverse applications highlight the potential for creative and flexible content generation, while also acknowledging the need for responsible use and bias detection.

Other AI News

  • Why LLMs Are Vulnerable to the ‘Butterfly Effect’

Researchers from the University of Southern California Information Sciences Institute have discovered that large language models (LLMs) like ChatGPT are susceptible to the ‘butterfly effect,’ where minor changes in prompts can significantly alter the model’s output. This phenomenon was observed in a study where even small tweaks, such as adding a space or changing the format of a prompt, led to different responses from the AI. The study, sponsored by DARPA, applied various prompting methods to ChatGPT, including minor variations, jailbreak techniques, and ‘tipping’ the model. The results showed that formatting changes could cause a minimum 10% prediction change, and certain jailbreaks resulted in a significant performance drop, with some yielding invalid responses in about 90% of predictions.

The research highlights the inherent instability in LLMs and the need for more robust models that can provide consistent answers despite minor prompt changes. This sensitivity to prompt variations raises questions about the reliability of LLMs in practical applications and underscores the importance of understanding and anticipating how these models respond to different inputs. As LLMs become more integrated into systems at scale, addressing these vulnerabilities becomes crucial for ensuring their effective and reliable use.
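
As a rough illustration of the kind of perturbation testing described above, the sketch below sends small formatting variants of the same prompt to a chat model and counts how many flip the answer. It uses the OpenAI Python SDK; the model name, prompts, and scoring are illustrative assumptions, not the USC ISI study's actual harness.

```python
# Illustrative prompt-perturbation check (not the study's methodology).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

base_prompt = ("Classify the sentiment of this review as Positive or Negative: "
               "'The movie was fine.'")
variants = [
    base_prompt,
    base_prompt + " ",                        # trailing space
    base_prompt.replace(": ", ":\n"),         # formatting tweak
    "Please answer in JSON. " + base_prompt,  # output-format request
]

answers = []
for prompt in variants:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answers.append(resp.choices[0].message.content.strip())

# Count how many perturbations change the prediction relative to the base prompt.
flips = sum(a != answers[0] for a in answers[1:])
print(f"{flips}/{len(variants) - 1} perturbations changed the answer")
```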

  • Typeface Launches Multimodal AI Content Hub for Enterprises, Accelerating the Generative AI Content Race

Typeface, a San Francisco-based startup, has publicly launched its multimodal AI content hub for enterprises, marking a significant step in the generative AI content creation race. This hub offers a range of text, image, audio, and video capabilities, allowing organizations to leverage AI models from OpenAI, Microsoft, Google Cloud, and other open-source platforms. Typeface, which emerged from stealth a year ago and has raised $165 million in funding, recently announced integrations with Microsoft Dynamics 365 Customer Insights and Salesforce’s marketing cloud.

Abhay Parasnis, founder and CEO of Typeface, highlights the company’s unique approach, focusing on a multimodal hub that integrates various content modalities into one solution. Unlike its competitors that concentrate on a single modality, Typeface offers a comprehensive solution for marketers and salespeople. Additionally, Typeface differentiates itself by training on each customer’s data, content, and customer behavior, ensuring safety, security, and governance. As enterprises shift from experimenting with generative AI to seeking ROI, Typeface’s multimodal hub stands out for its efficiency, top-line revenue growth potential, and commitment to customer-specific training and data security.

  • Hugging Face and Google Partner to Boost Open AI Development with Cloud Integration

Hugging Face, a prominent AI platform, has formed a strategic collaboration with Google to accelerate the development of open generative AI applications. This partnership enables developers using open-source models from Hugging Face to train and serve them with Google Cloud, leveraging Google’s AI-focused infrastructure and tools, including Vertex AI, TPUs, GPUs, and CPUs. Hugging Face, often referred to as the GitHub for AI, hosts over 500,000 AI models and 250,000 datasets, serving more than 50,000 organizations. With this collaboration, Hugging Face users active on Google Cloud can now train, tune, and serve their models with Vertex AI, an end-to-end MLOps platform, directly from the Hugging Face platform.

The partnership also includes options for training and deploying models within the Google Kubernetes Engine (GKE), offering developers a “do it yourself” infrastructure to scale models using Hugging Face-specific deep learning containers on GKE. This collaboration will make it easier for developers to deploy models for production on Google Cloud with inference endpoints and accelerate applications with TPU on Hugging Face spaces. The new experiences, including Vertex AI and GKE deployment options, are expected to be available to Hugging Face Hub users in the first half of 2024.
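
For a sense of what the Vertex AI path looks like in practice, here is a minimal sketch of registering and deploying a model with the google-cloud-aiplatform SDK. The project, region, container image, and artifact paths are placeholders, and this is an assumed illustration rather than the integration's documented workflow.

```python
# Illustrative Vertex AI model registration and deployment sketch.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # placeholders

# Register the model with Vertex AI, pointing at a serving container
# (for example, a Hugging Face deep learning container image).
model = aiplatform.Model.upload(
    display_name="distilbert-sentiment",
    serving_container_image_uri="us-docker.pkg.dev/example/serving-image:latest",  # placeholder
    artifact_uri="gs://my-bucket/distilbert-sentiment/",  # placeholder model artifacts
)

# Deploy to a managed endpoint; machine type and accelerator are illustrative.
endpoint = model.deploy(
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
print(endpoint.resource_name)
```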

  • FTC Launches Probe into Amazon, Alphabet, Microsoft, OpenAI, and Anthropic Over Generative AI Deals

The Federal Trade Commission (FTC) has initiated inquiries into five leading creators of generative AI technology, including Amazon, Microsoft, OpenAI, Anthropic, and Alphabet. These investigations, part of the FTC’s first significant competitive practice actions around generative AI, involve issuing 6(b) orders to these companies to provide information about their recent multi-billion-dollar investments and partnerships. The FTC aims to scrutinize these relationships and actions to better understand their impact on the competitive landscape. The companies have 45 days to respond to the orders.

This move by the FTC follows concerns raised last June about competition in the generative AI sector. FTC chair Lina M. Khan emphasized the importance of guarding against tactics that could undermine healthy competition and innovation in AI. The inquiry will focus on several key aspects, including the strategic rationale for the investments/partnerships, their implications, competitive impact analysis, and information on government entities’ involvement. This study is expected to shed light on whether investments and partnerships by dominant companies risk distorting innovation and undermining fair competition.

  • Intel Reports Q4 Revenue Increase to $15.4B but Forecasts Weaker Q1 Expectations

Intel reported a 10% increase in its fourth-quarter revenue, reaching $15.4 billion, but projected weaker expectations for the first quarter of 2024. The overall revenue for 2023 was $54.2 billion, a 14% decrease from the previous year. The fourth-quarter earnings per share (EPS) were 63 cents, with non-GAAP EPS at 54 cents, while the full-year EPS was 40 cents, with non-GAAP EPS attributable to Intel at $1.05. For the first quarter of 2024, Intel forecasts revenue between $12.2 billion and $13.2 billion, with a GAAP loss attributable to Intel of 25 cents per share and non-GAAP EPS of 13 cents, below analysts’ expectations of 34 cents a share. This forecast reflects specific challenges in divisions like Mobileye, as noted by CEO Pat Gelsinger, who views these as temporary issues.

Gelsinger stated that 2023 was a year of significant progress and transformation for Intel, with the company meeting its $3 billion cost savings target and exiting five businesses. Intel’s stock value decreased in after-hours trading following the announcement. The company’s focus remains on achieving process and product leadership, expanding its external foundry business, and executing its mission to bring AI everywhere. CFO David Zinsner emphasized the company’s operational efficiencies and commitment to unlocking further efficiencies in 2024 and beyond.

  • Legal AI Firm Spellbook Raises $20M in Series A Funding Led by Inovia Capital

Spellbook, a legal software company specializing in contract management, has successfully secured $20 million in a Series A funding round, with Inovia Capital, a Montreal-based venture capital firm, leading the investment effort. Other notable participants in this funding round include The Legaltech Fund, Bling Capital, and Thomson Reuters Ventures, operated by Reuters’ parent company, Thomson Reuters. Spellbook’s primary focus is to provide an AI-driven contract drafting and review tool, leveraging large language models like OpenAI’s GPT-4. Designed for corporate and commercial lawyers in both law firms and companies, the tool offers valuable contract language suggestions and negotiation points.

The legal sector is witnessing a growing number of startups developing AI-driven tools, attracting significant capital investments as it explores ways to integrate and benefit from generative AI. The market for legal AI is evolving rapidly, drawing the attention of new investors and creating excitement around the potential applications of this technology. Spellbook’s CEO and co-founder, Scott Stevenson, highlighted the company’s customer base, which includes small and midsize law firms, solo lawyers, and larger firms, along with in-house legal teams. Stevenson noted a shift in investor interest in legal technology, particularly since the emergence of generative AI in late 2022. Other companies in this space, like Norm AI and Robin AI, have also recently attracted substantial investments, underscoring the industry’s growing potential and interest from investors.

  • Anthropic Confirms Data Leak Involving Non-Sensitive Customer Information

Anthropic, the AI startup behind the Claude family of large language models, confirmed it experienced a data leak due to a third-party contractor’s error. On January 22nd, the contractor inadvertently sent a file containing non-sensitive customer information, including names and open credit balances as of the end of 2023, to an external party. Anthropic emphasized that this incident, caused by human error, did not involve a breach of its systems or sensitive personal data like banking or payment information. The company has notified affected customers and provided guidance on the matter.

This leak comes amid heightened scrutiny of Anthropic’s strategic partnerships with Amazon and Google by the Federal Trade Commission (FTC). The FTC is investigating these relationships, along with those of rival OpenAI with Microsoft, to understand their impact on market competition. Anthropic’s spokesperson clarified that the data leak is unrelated to the FTC probe. The incident highlights the risks associated with third-party collaborations in handling sensitive data, especially in the context of the increasing use of large language models by enterprises.

  • Explicit AI Deepfakes of Taylor Swift Spark Outrage and Calls for Regulatory Action

AI-generated deepfake images and videos of Taylor Swift, depicting her in explicit sexual activities, have recently circulated on social media, sparking outrage among her fans and prompting calls for regulation from U.S. lawmakers. These deepfakes, created using generative AI tools, have led to the trending hashtag #ProtectTaylorSwift and widespread condemnation. The images were reportedly made with Microsoft Designer, which is powered by OpenAI’s DALL-E 3 image model and typically prohibits the creation of sexually explicit content. However, the open-source Stable Diffusion image-generation model from Stability AI can also be used to create such imagery, highlighting the challenges in regulating AI-generated content.

This incident has intensified concerns over the use of generative AI tools and their potential to create nonconsensual explicit imagery of real people. U.S. Congressman Tom Kean Jr. has urged Congress to pass legislation to regulate AI, proposing bills like the AI Labeling Act and the Preventing Deepfakes of Intimate Images Act. These bills aim to add clear notices to AI-generated content and allow victims of nonconsensual deepfakes to sue for damages. The spread of these deepfakes underscores the need for effective regulation and ethical guidelines in the rapidly evolving field of AI.

  • OpenAI Releases New Embedding Models and Updates to GPT-4 Turbo, While Reducing GPT-3.5 Turbo Pricing

OpenAI has introduced a new generation of embedding models, text-embedding-3-small and text-embedding-3-large, offering enhanced performance and reduced pricing compared to their predecessor, text-embedding-ada-002. These models, capable of creating embeddings with up to 3072 dimensions, significantly improve the accuracy of machine learning tasks by capturing more semantic information. Alongside these, OpenAI has updated its GPT-4 Turbo and GPT-3.5 Turbo models, featuring improved instruction following, JSON mode, more reproducible outputs, and parallel function calling. A new 16k context version of GPT-3.5 Turbo has also been launched for processing longer inputs and outputs.
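
As a quick illustration of the new embeddings endpoint, the sketch below requests a shortened embedding from text-embedding-3-large via the OpenAI Python SDK; the input text and the reduced dimension count are illustrative choices.

```python
# Illustrative call to the new embedding models.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Lumiere generates the entire video in a single pass.",
    dimensions=1024,  # optionally shorten the native 3072-dimensional embedding
)

vector = response.data[0].embedding
print(len(vector))  # 1024
```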

In addition to model enhancements, OpenAI has updated its text moderation model to better handle a wider range of languages and domains, and provide explanations for predictions. New API management tools have been introduced, allowing developers to create multiple API keys with different permissions and scopes, and monitor usage and billing on the OpenAI Dashboard. Notably, OpenAI has reduced the price of the GPT-3.5 Turbo model by 25%, making it more accessible for developers to build applications. These updates reflect OpenAI’s commitment to continually improving its models and services, making them more useful and affordable for a broader range of developers and customers.

  • Arcee Launches Secure, Enterprise-Focused Platform for Building Generative AI

Arcee, a new startup co-founded by former Hugging Face engineers Mark McQuade and Brian Benedict, is creating a secure, enterprise-focused platform for building generative AI (GenAI) models. This platform, designed to address the trust deficit in existing generative AI systems, particularly regarding performance and security, allows organizations to build and train GenAI models within a secure compute environment. Launched in February and based in Miami, Arcee has already attracted $5.5 million in venture funding from investors including Long Journey Ventures, Flybridge, Centre Street Partners, Wndrco, 35V, AIN Ventures, and Hugging Face CEO Clément Delangue.

Arcee’s platform is end-to-end, employing an adaptive system for training, deploying, and monitoring GenAI models. It operates in a virtual private cloud, offering superior fine-tuning and security to mitigate privacy risks. This approach allows organizations to ensure data privacy and grants them full ownership of their AI models and technology stack. Arcee’s platform is particularly revolutionary for highly regulated industries such as legal, healthcare, insurance, and financial services. Despite the crowded market for GenAI development platforms, Arcee’s unique approach and focus on security and privacy set it apart, positioning it as a potential leader in AI innovation for enterprises.

  • Ola Founder’s AI Startup Krutrim Becomes a Unicorn After $50M Funding Round

Krutrim, an AI startup founded by Ola founder Bhavish Aggarwal, has become a unicorn with its latest funding round, raising $50 million and valuing the company at $1 billion. This achievement makes Krutrim the fastest startup in India to reach unicorn status and the first Indian AI startup to do so. The funding round was led by Matrix Partners India, which has also backed Aggarwal’s other ventures, Ola and Ola Electric.

Krutrim is developing a large language model trained on local Indian languages in addition to English. The startup plans to launch a voice-enabled conversational AI assistant that understands and speaks multiple Indian languages. A beta version of its chatbot is expected to be available to consumers next month, followed by the rollout of APIs for developers and enterprises. Krutrim also aims to develop in-house capability for manufacturing AI-optimized chips. Aggarwal’s vision for Krutrim is to build India’s first complete AI computing stack, reflecting his commitment to driving innovation in AI from India for the global market.

  • Elon Musk Says AI Firm xAI Is Not in Funding Talks, Denies Report of a $6B Raise

Elon Musk, the CEO of Tesla and founder of xAI, addressed recent reports about his artificial intelligence firm’s fundraising efforts. Musk stated that xAI is currently not engaged in discussions with investors to secure funding and emphasized that he has not had any conversations related to capital-raising endeavors. This clarification followed media reports earlier in the day suggesting that xAI was in the process of raising up to $6 billion, potentially valuing the startup at $20 billion. Musk’s statement was made through a post on X, reaffirming that xAI is not actively seeking capital.

The report also indicated that xAI has been exploring fundraising options by engaging with family offices in Hong Kong and targeting sovereign wealth funds in the Middle East. This development reflects the ongoing surge in interest and investment in AI startups, which have been capturing the attention of investors and entrepreneurs. Elon Musk, who co-founded OpenAI in 2015 but later stepped down from its board in 2018, remains committed to advancing AI technologies and has actively contributed to the field. The AI industry continues to thrive, attracting attention and resources despite a subdued startup funding environment in other sectors, largely due to the growing popularity of AI models like ChatGPT.

About The Author

Bogdan Iancu

Bogdan Iancu is a seasoned entrepreneur and strategic leader with over 25 years of experience in diverse industrial and commercial fields. His passion for AI, Machine Learning, and Generative AI is underpinned by a deep understanding of advanced calculus, enabling him to leverage these technologies to drive innovation and growth. As a Non-Executive Director, Bogdan brings a wealth of experience and a unique perspective to the boardroom, contributing to robust strategic decisions. With a proven track record of assisting clients worldwide, Bogdan is committed to harnessing the power of AI to transform businesses and create sustainable growth in the digital age.