Stability AI: Multi-Modal Transformers Design using DALL-E3

Stability AI Introduces SDXL Turbo Text-to-Image Model with Adversarial Diffusion Distillation

Introduction:

Stability AI has unveiled SDXL Turbo, a groundbreaking text-to-image model employing Adversarial Diffusion Distillation (ADD). This innovative model synthesizes image outputs in a single step, facilitating real-time text-to-image generation while maintaining high sampling fidelity. SDXL Turbo is a significant step forward in generative AI, though it is not yet intended for commercial use.
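
The announcement does not spell out the training objective, but the ADD approach behind SDXL Turbo trains the one-step student with two complementary losses: an adversarial loss from a discriminator that judges the realism of the student's outputs, and a distillation loss that pulls those outputs toward a frozen teacher diffusion model's denoised predictions. A simplified, hedged sketch of the combined objective (notation is ours, not taken from the announcement):

```latex
\mathcal{L}_{\mathrm{ADD}} \;=\; \mathcal{L}_{\mathrm{adv}}\big(\hat{x}_\theta(x_s, s)\big) \;+\; \lambda\,\mathcal{L}_{\mathrm{distill}}\big(\hat{x}_\theta(x_s, s),\, \hat{x}_\psi\big)
```

Here the first term scores the realism of the student's single-step reconstruction of a noised input, the second term measures its discrepancy from the frozen teacher's prediction, and λ weights the distillation term.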

Features:

  • Adversarial Diffusion Distillation (ADD): Employs a novel distillation technique for text-to-image models, enabling single-step image outputs.
  • Performance Comparison: Surpasses models like StyleGAN-T++, OpenMUSE, IF-XL, SDXL, and LCM-XL in blind tests with human evaluators.
  • Inference Speed: Capable of generating a 512×512 image in 207 milliseconds on an A100, inclusive of prompt encoding, a single denoising step, and decoding.
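
For readers who want to try the single-step generation described above, the sketch below uses the Hugging Face diffusers library with the publicly released stabilityai/sdxl-turbo weights (see Availability below); the prompt and output filename are placeholders.

```python
# Minimal single-step text-to-image sketch with SDXL Turbo via Hugging Face
# diffusers. Assumes a CUDA GPU and the publicly released weights at
# "stabilityai/sdxl-turbo"; the prompt and filename are placeholders.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

prompt = "a cinematic photo of a red fox in a snowy forest"

# SDXL Turbo is distilled for single-step sampling, and guidance is disabled
# because the model is trained to work without classifier-free guidance.
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
image.save("sdxl_turbo_sample.png")
```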

Benefits:

  • Superior Image Quality: Avoids common artifacts or blurriness, ensuring high-quality image generation.
  • Reduced Computational Requirements: Operates with significantly lower computational resources compared to multi-step models.
  • Real-Time Generation Capability: Enables efficient, real-time generation of images for various applications.

Other Technical Details:

  • Clipdrop Beta Demonstration: SDXL Turbo’s capabilities can be explored through a beta demonstration on Stability AI’s image editing platform, Clipdrop, which is compatible with most browsers.
  • Availability on Hugging Face: The model and weights of SDXL Turbo are accessible on Hugging Face, allowing broader experimentation and application by AI researchers and enthusiasts.

Conclusion:

SDXL Turbo by Stability AI marks a notable advancement in the realm of AI-driven image synthesis. With its innovative ADD technique and impressive performance in speed and image quality, SDXL Turbo is poised to influence the future of text-to-image generation technology. Its availability for testing on Clipdrop and Hugging Face opens up new avenues for exploration and application in the field of generative AI.

Perplexity AI: Multi-Modal Transformers Design using DALL-E3

Perplexity.ai Introduces PPLX Online Family of LLMs

Introduction:

Perplexity AI has introduced two new Large Language Models (LLMs), pplx-7b-online and pplx-70b-online, marking a significant advancement in AI-driven search and information retrieval. These models are designed to deliver helpful, factual, and up-to-date responses, addressing the limitations of freshness and accuracy in existing LLMs.

Features:

  • Model Variants: pplx-7b-online and pplx-70b-online, built on open-source models mistral-7b and llama2-70b.
  • In-House Search Technology: Utilizes a sophisticated search, indexing, and crawling infrastructure for augmenting LLMs with relevant, current information.
  • Fine-Tuning: Regular fine-tuning with diverse, high-quality training sets for improved performance in helpfulness, factuality, and freshness.

Benefits:

  • Up-to-Date Information: Ability to answer time-sensitive queries with the latest information.
  • High Accuracy: Reduced instances of hallucinations and increased factuality in responses.
  • Versatility: Applicable for a wide range of queries, from general knowledge to specific, niche topics.

Other Technical Details:

  • Human Evaluation: Models evaluated on helpfulness, factuality, and freshness by human contractors, showing superior performance compared to competitors like OpenAI’s GPT-3.5 and Meta’s Llama 2.
  • Elo Score Analysis: Utilized Bootstrap Elo methodology for evaluating model performance, with pplx models outperforming competitors in various criteria.
  • Public Availability: Models are now accessible via pplx-api, transitioning from beta to general public release with a usage-based pricing structure.
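
As a concrete illustration of the pplx-api access mentioned above, the sketch below assumes the OpenAI-compatible chat-completions endpoint that Perplexity documents for the API; the endpoint path, model name, and environment-variable name should be checked against the current docs before use.

```python
# Hedged sketch of querying pplx-7b-online through pplx-api. Assumes the
# OpenAI-compatible chat-completions endpoint described in Perplexity's docs;
# verify the path, model names, and pricing before use. The env var name is ours.
import os
import requests

API_KEY = os.environ["PPLX_API_KEY"]  # hypothetical environment variable name

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "pplx-7b-online",
        "messages": [
            {"role": "system", "content": "Be precise and concise."},
            {"role": "user", "content": "What notable AI models were released this week?"},
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```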

Conclusion:

The introduction of pplx-7b-online and pplx-70b-online by Perplexity AI represents a notable leap in the field of AI-driven search and information retrieval. These models, with their unique ability to provide accurate and current information, are poised to redefine how users interact with and utilize AI for obtaining information. The move to public availability and a new pricing structure further underscores the potential impact of these models in various applications and industries.

Other AI News

  • Pika Labs Raises $55M to Launch New AI-Powered Text-to-Video Platform to Compete with Runway

Pika Labs, a six-month-old AI video startup, has announced a successful $35 million Series A funding round led by Lightspeed Venture Partners. This latest funding round, combined with previous investments, brings the total capital raised by the company to $55 million. Concurrently, Pika Labs unveiled Pika 1.0, an advanced web platform that expands upon its beta version. This platform allows users to generate and edit videos in a variety of styles, such as 3D animation, anime, and cinematic, simply by using text prompts.

The release of Pika 1.0 signifies a notable advancement in the AI-driven video generation field, placing Pika Labs in direct competition with established entities like Runway and Stability AI. Adobe, a major player in the digital creative space, is also exploring similar technologies. Pika’s platform has already garnered significant user engagement, with over half a million users. The platform is currently open for sign-ups, and access will be progressively extended to users, aiming to make video creation and editing more accessible to a broader audience.

Pika 1.0 distinguishes itself by offering a user-friendly web interface, moving away from its initial reliance on the Discord app. This interface facilitates the creation of higher-quality videos through intuitive text prompts. The platform’s capabilities are not limited to text-to-video conversions; it also supports image-to-video and video-to-video transformations, enabling users to animate static images, alter cinematic videos into different styles, and introduce new elements into existing videos. Pika Labs plans to make this new version available both on Discord and through a dedicated website, compatible with mobile and desktop devices. However, the full suite of video AI features will be introduced in stages to ensure system stability.

  • Micro1, a Los Angeles-based startup, is revolutionizing the process of building engineering teams in the AI era

Micro1 recently raised $3.3 million in an oversubscribed pre-seed funding round, bringing its post-money valuation to $30 million. The funding is earmarked for expanding Micro1's AI-powered recruiting platform, which aims to transform how technical talent is hired. Ali Ansari, the founder and CEO of Micro1, envisions an AI system that enables companies to prototype software within five minutes, redefining software development.

Central to Micro1’s offerings is the GPT Vetting tool, an AI-powered system designed for efficient and accurate screening of technical talent at scale. This system combines the capabilities of GPT-4 and Whisper technologies to generate tailored questions for candidates, assessing their skills through a process that includes a live coding exercise. The platform boasts a pool of about 500 pre-vetted candidates, promising to match clients with suitable candidates within 48 hours and facilitate hiring in as little as two weeks. Micro1’s approach, which involves multiple rounds of manual interviews in addition to AI vetting, claims to identify the top 1% of talent, offering a level of speed and accuracy that traditional recruitment processes cannot match.
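
Micro1 has not published the internals of GPT Vetting, so the sketch below is purely illustrative of the combination the paragraph describes: generating a tailored question with GPT-4 and transcribing a spoken answer with Whisper via OpenAI's API. Model names, file paths, and the candidate profile are assumptions, not Micro1's actual pipeline.

```python
# Purely illustrative sketch, not Micro1's published implementation: generate a
# tailored screening question with GPT-4, then transcribe a recorded spoken
# answer with Whisper for reviewers. Requires OPENAI_API_KEY; file paths and
# the candidate profile are placeholders.
from openai import OpenAI

client = OpenAI()

skills = "Python, distributed systems, PostgreSQL"  # hypothetical candidate profile

# 1) Ask GPT-4 for one question targeted at the candidate's stated skills.
question = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Write one concise technical interview question."},
        {"role": "user", "content": f"Candidate skills: {skills}"},
    ],
).choices[0].message.content

# 2) Transcribe the candidate's spoken answer with Whisper so it can be scored.
with open("candidate_answer.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio).text

print("Question:", question)
print("Answer transcript:", transcript)
```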

Micro1’s innovative approach to technical recruiting could be a game-changer in an industry where speed, efficiency, and access to top talent are crucial. The company’s vision of rapid software prototyping could significantly reduce time-to-market, offering a substantial competitive advantage. Furthermore, the application of AI in talent screening promises a more efficient and objective recruitment process, addressing the hiring challenges faced by the tech industry. Micro1’s strategy not only automates tasks but also enhances human decision-making capabilities, positioning it to potentially lead a transformation in the digital and software development landscape.

  • At AWS re:Invent 2023, Amazon announces several significant advancements in AI and cloud computing

Key highlights include the integration of generative AI into Amazon Transcribe and the deployment of Nvidia’s powerful GH200 AI chips. AWS CEO Adam Selipsky revealed the second generation of the company’s Trainium chip, Trainium 2, which can train models four times faster than its predecessor and has triple the memory.

AWS also introduced Amazon Q, a chat tool for businesses to query specific company data, acting as an AI assistant. This tool allows employees to easily access information without sifting through numerous documents. Additionally, AWS and Nvidia have expanded their partnership, with AWS set to be the first cloud provider to deploy Nvidia’s GH200 chips. Nvidia’s AI “factory” DGX Cloud will also be available on AWS.

Other notable announcements include AWS experimenting with a new chip to address quantum computing errors and the launch of a new thin-client device for virtual desktops, akin to a Fire TV Cube for businesses. This device, designed for high-turnover environments like call centers, offloads computing power to the cloud and provides remote access to virtual desktops. AWS is offering these thin clients at a competitive price, leveraging the hardware used in the Amazon Fire TV Cube.

  • Amazon Web Services (AWS) upgrades its Amazon Transcribe product by incorporating generative AI

Amazon Web Services (AWS) has significantly upgraded its Amazon Transcribe product by incorporating generative AI, enabling the transcription of speech in 100 languages. This enhancement, announced at the AWS re:Invent event, represents a substantial leap from the previous support of 39 languages. Amazon Transcribe, utilized by AWS customers for adding speech-to-text capabilities to applications, has been trained on millions of hours of diverse, unlabeled audio data. The platform uses self-supervised algorithms to accurately recognize and understand human speech across various languages and accents, ensuring even lesser-used languages are transcribed with high accuracy.

In addition to language expansion, Amazon Transcribe has improved its accuracy by 20 to 50 percent across many languages. The platform now includes features like automatic punctuation, custom vocabulary, and automatic language identification, capable of handling speech in audio and video formats, even in noisy environments. AWS has also applied these generative AI advancements to its Amazon Transcribe Call Analytics platform, which now more efficiently summarizes interactions in contact centers. Furthermore, AWS announced enhancements to its Amazon Personalize product, introducing a Content Generation feature that writes titles or email subject lines, aiding in creating more cohesive recommendation lists for customers.
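
The announced improvements land in the Transcribe service itself, so the client call is the standard batch API; as a small illustration of that workflow with automatic language identification enabled, here is a boto3 sketch in which the bucket, object key, and job name are placeholders.

```python
# Minimal sketch of a batch Amazon Transcribe job with automatic language
# identification via boto3; bucket, key, and job names are placeholders.
import time
import boto3

transcribe = boto3.client("transcribe")
JOB_NAME = "demo-multilingual-transcription"

transcribe.start_transcription_job(
    TranscriptionJobName=JOB_NAME,
    Media={"MediaFileUri": "s3://my-example-bucket/call-recording.mp3"},
    IdentifyLanguage=True,  # let the service detect the spoken language
    OutputBucketName="my-example-bucket",
)

# Poll until the job finishes (fine for a demo; use EventBridge in production).
while True:
    job = transcribe.get_transcription_job(TranscriptionJobName=JOB_NAME)
    status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if status in ("COMPLETED", "FAILED"):
        break
    time.sleep(10)

print("Job finished with status:", status)
```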

  • Anthropic launches Claude 2.1 with longer context and reduces prices to better compete with OpenAI’s GPT

Anthropic, a leading AI model lab, has strategically reduced the pricing of its conversational AI model, Claude 2.1, in response to increasing competition from large AI firms and the growing presence of open-source alternatives in the enterprise AI market. This move is seen as a necessary step to retain and attract enterprise customers who are increasingly looking for cost-effective and efficient AI solutions. The decision to lower Claude’s per-token rates reflects Anthropic’s commitment to maintaining its position in a rapidly evolving market, where affordability and value are becoming key differentiators.

The rise of open-source AI models presents a significant challenge to closed-source large language model firms like OpenAI and Anthropic. Open-source models offer greater customization and potentially lower costs, as they allow companies to optimize their AI infrastructure for specific needs. This shift towards open-source solutions is compelling larger companies to consider these alternatives seriously, as they can significantly reduce costs while maintaining control over their AI stack. As a result, closed-source vendors like Anthropic are compelled to adapt their pricing strategies to remain competitive in a market that is increasingly sensitive to cost and technological advancements.

  • Symphony is teaming up with Google to upgrade its voice analytics for banks and investment firms

Symphony, a markets infrastructure and technology firm, is collaborating with Google to enhance its voice analytics services for banks and investment firms. This partnership comes as financial regulators, particularly the U.S. Securities and Exchange Commission, intensify their scrutiny of communications compliance, having already imposed over $2 billion in fines for lapses in this area. These fines are largely associated with the failure to properly track or record business-related text messages sent over unauthorized platforms during the COVID-19 lockdowns.

To address the growing regulatory focus on voice and video call compliance, Symphony will leverage Google Cloud’s generative artificial intelligence platform, Vertex AI, to improve its Cloud9 voice product. This enhancement will introduce advanced speech-to-text and natural language processing capabilities, enabling more accurate transcription of communications for retention and review. The Cloud9 product, used by trading teams across various asset classes, will now be able to flag suspicious discussions and transcribe conversations more effectively. Beyond compliance, the data mined from these communications can provide additional insights for sales or trading strategies and customer experience monitoring. Symphony CEO Brad Levy aims to launch the enhanced product by Q2, while Phil Moyer, VP of Google Cloud’s Global AI Business, highlights the role of AI in managing the increasing volume of data and maintaining compliance in an increasingly complex regulatory environment.
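
Symphony's Vertex AI integration is not public, so the sketch below is only a generic illustration of compliance-oriented transcription, using Google Cloud's Speech-to-Text client rather than Symphony's actual stack, with a simple keyword flag of the kind a compliance team might review; the bucket, audio settings, and watchlist terms are placeholders.

```python
# Generic illustration only: Symphony's Vertex AI-based integration is not
# public. This uses Google Cloud's Speech-to-Text client to transcribe a short
# call recording and flag watchlist terms; bucket, terms, and config are
# placeholders (longer audio would need long_running_recognize).
from google.cloud import speech

client = speech.SpeechClient()

audio = speech.RecognitionAudio(uri="gs://my-example-bucket/trader-call.wav")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)
transcript = " ".join(result.alternatives[0].transcript for result in response.results)

WATCHLIST = {"guarantee", "off the record", "personal phone"}  # illustrative terms
flags = [term for term in WATCHLIST if term in transcript.lower()]

print(transcript)
print("Flagged terms:", flags)
```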

  • AI-Powered Breakthrough: GNoME Discovers Over 2.2 Million New Crystals, Revolutionizing Materials Science

Researchers at Google DeepMind and Lawrence Berkeley National Laboratory have achieved a significant scientific breakthrough with the development of a new AI system named GNoME. This system has discovered over 2.2 million new materials, potentially revolutionizing the development of technologies such as batteries, solar panels, and computer chips. The research, published in two papers in the scientific journal Nature, details how DeepMind researchers scaled up deep learning techniques to enable GNoME to efficiently explore possible material structures. GNoME identified 2.2 million new inorganic crystal structures, around 380,000 of which are predicted to be stable, expanding the set of known stable inorganic crystals nearly tenfold. Over 700 of these have already been experimentally validated.

GNoME employs two methods for discovering stable materials: one that creates similar crystal structures and another that uses a more random approach. The outcomes of both methods are tested, with the results enhancing the GNoME database for future learning. The second paper explains how GNoME’s predictions were tested using autonomous robotic systems at Berkeley Lab. In a span of 17 days of continuous automated experiments, the system successfully synthesized 41 out of 58 of the predicted compounds, achieving an exceptionally high 71% success rate.

The newly discovered materials data has been made publicly available via the Materials Project database, allowing researchers to screen through the structures to identify materials with desired properties for real-world applications. For instance, the researchers identified 52,000 potential new 2D layered materials similar to graphene, 25 times more potential solid lithium-ion conductors than previous studies, and 15 more lithium-manganese oxide compounds that could replace lithium-cobalt oxide in batteries. This breakthrough signifies a new era in materials science, where AI-driven approaches could expedite the creation of new materials for specific applications, potentially leading to faster innovation and cost reduction in product development.
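
To give a flavor of the screening workflow described above, here is a hedged sketch using the Materials Project's mp_api Python client to look for near-hull Li-Mn-O entries of the kind the article mentions as battery candidates; the client namespace and filter names vary between versions, so treat them as assumptions to verify against the MP documentation.

```python
# Hedged sketch of screening Materials Project entries with the mp_api client;
# the search namespace and filter names vary between client versions, so check
# the MP docs. The API key and filter values are placeholders.
from mp_api.client import MPRester

with MPRester("YOUR_MP_API_KEY") as mpr:
    # Look for low-energy (near-hull) Li-Mn-O entries, the kind of chemistry
    # the article mentions as candidate battery cathode materials.
    docs = mpr.materials.summary.search(
        elements=["Li", "Mn", "O"],
        energy_above_hull=(0, 0.01),  # eV/atom above the convex hull
        fields=["material_id", "formula_pretty", "energy_above_hull"],
    )

for doc in docs[:10]:
    print(doc.material_id, doc.formula_pretty, doc.energy_above_hull)
```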

  • Amazon’s AWS introduces a chat tool, Amazon Q, enabling businesses to inquire about company-specific matters

Amazon Web Services (AWS) has launched a new chat tool named Amazon Q, designed to assist businesses in querying specific company-related data. Announced by AWS CEO Adam Selipsky at the AWS re:Invent event, Amazon Q functions as an AI assistant, enabling users to ask questions about their business using their data. This tool allows employees to efficiently access information, such as the company’s latest guidelines for logo usage or understanding a colleague’s code for app maintenance, without the need to sift through numerous documents. Amazon Q is accessible through the AWS Management Console, individual companies’ documentation pages, developer environments like Slack, and other third-party apps. Selipsky emphasized that queries on Amazon Q will not be used to train any foundational models.

Amazon Q operates in conjunction with various models found on Amazon Bedrock, AWS’s repository of AI models, including Meta’s Llama 2 and Anthropic’s Claude 2. Customers can select the model that best suits their needs, connect to the Bedrock API, and use it to learn their data, policies, and workflows before deploying Amazon Q. The tool has been trained on 17 years of AWS knowledge and is tailored to answer AWS-specific queries, suggesting optimal AWS services for projects. Initially, Amazon Q is available exclusively for Amazon Connect users, AWS’s service for contact centers, but plans are in place to extend its availability to other services like Amazon Supply Chain and Amazon QuickSight, with a preview already available for business intelligence.
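
Amazon Q itself is a managed assistant rather than an API you call directly, but the Bedrock connection described above can be illustrated with a short boto3 sketch invoking one of the models the paragraph names (Anthropic's Claude 2); the region, prompt, and model ID follow Bedrock's documented conventions but should be verified for your account.

```python
# Illustrative sketch of invoking a Bedrock-hosted model (Anthropic's Claude 2)
# with boto3. Amazon Q itself is a managed assistant and is not called this
# way; the prompt format follows Bedrock's documented Claude text-completions
# convention, and the region and prompt text are placeholders.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "prompt": "\n\nHuman: Summarize our logo-usage guidelines in two sentences.\n\nAssistant:",
    "max_tokens_to_sample": 300,
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    body=body,
    contentType="application/json",
    accept="application/json",
)

print(json.loads(response["body"].read())["completion"])
```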

Each implementation of Amazon Q across AWS services will have a unique appearance. For example, in Amazon Connect, Q is deployed in real-time, listening in on customer calls to provide contact center agents with relevant information without manual searching. Dilip Kumar, Vice President for AWS Applications, highlighted the strategic pairing of the technology with services where AI integration is most beneficial, such as contact centers, supply chain, and business intelligence. Pricing for Amazon Q in Connect starts at $40 per agent per month, with a no-charge trial available until March 1, 2024. Additionally, AWS is introducing guardrails in Bedrock to ensure AI-powered applications adhere to data privacy and responsible AI standards, addressing the needs of industries with stringent regulations.

About The Author

Bogdan Iancu

Bogdan Iancu is a seasoned entrepreneur and strategic leader with over 25 years of experience in diverse industrial and commercial fields. His passion for AI, Machine Learning, and Generative AI is underpinned by a deep understanding of advanced calculus, enabling him to leverage these technologies to drive innovation and growth. As a Non-Executive Director, Bogdan brings a wealth of experience and a unique perspective to the boardroom, contributing to robust strategic decisions. With a proven track record of assisting clients worldwide, Bogdan is committed to harnessing the power of AI to transform businesses and create sustainable growth in the digital age.