Mixtral 8x7B

Mistral introduces Mixtral 8x7B, a language model that reportedly outperforms OpenAI’s GPT-3.5

Introduction

  • Mistral, a European startup, has released Mixtral 8x7B, an open-source large language model (LLM).
  • The model has garnered attention for its performance, reportedly surpassing OpenAI’s GPT-3.5 and Meta’s Llama 2 family in various AI benchmarks.

Features

  • Mixture of Experts Technique: Mixtral 8x7B employs a “mixture of experts” architecture, in which a router network selects a small subset of specialized expert sub-networks to process each input, so only a fraction of the model’s parameters is active per token.
  • Open Source and Accessibility: The model is open-source, available under an Apache 2.0 license, and can be run locally on machines without dedicated GPUs, including Apple Macs with the M2 Ultra chip.
  • Lack of Safety Guardrails: Notably, Mixtral 8x7B seemingly operates without safety guardrails, which could raise concerns for policymakers and regulators.
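The routing idea behind a mixture-of-experts layer can be sketched in a few lines. The toy NumPy code below is an illustration only, not Mixtral’s implementation: the real experts are full feed-forward blocks inside each Transformer layer, and the dimensions here are made up. Mixtral uses eight experts with top-2 routing per token, which this sketch mirrors.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, TOP_K, D = 8, 2, 16          # Mixtral: 8 experts, top-2 routing per token

# Each "expert" here is just a linear map; real experts are MLP blocks.
experts = [rng.standard_normal((D, D)) * 0.1 for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) * 0.1   # gating weights

def moe_layer(x):
    """Route one token vector x through its top-k experts, weighted by gate scores."""
    logits = x @ router                        # one score per expert
    top = np.argsort(logits)[-TOP_K:]          # indices of the best-scoring experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over winners
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(D)
out = moe_layer(token)
```

Because only two of the eight experts run per token, inference cost scales with the active parameters rather than the full parameter count, which is how the model punches above its weight on commodity hardware.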

Performance Comparison

  • Benchmark Performance: Mixtral 8x7B has been shown to match or outperform GPT-3.5 and Llama 2 models in various AI benchmarking tests.
  • Local Machine Compatibility: Its small footprint allows it to run on non-specialized hardware, broadening its accessibility.

Commercial Usage and Development

  • Licensing: Mixtral 8x7B is available for commercial use under the Apache 2.0 license.
  • Future Developments: Mistral hints at more powerful models in development, including an alpha version of Mistral-medium.

Funding and Valuation

  • Mistral recently closed a $415 million Series A funding round, valuing the company at $2 billion.

Conclusion

  • Mixtral 8x7B represents a significant development in the AI community, particularly in the open-source domain, by offering a high-performance model that rivals leading proprietary models like GPT-3.5.
  • Its lack of safety features and open-source nature may lead to broader discussions and considerations regarding the responsible use and regulation of AI technologies.

Microsoft Research Phi-2

Microsoft Research introduces Phi-2, a 2.7B-parameter “small” LLM

Introduction

  • Microsoft Research has released Phi-2, a 2.7 billion-parameter language model, as part of its Phi series of small language models (SLMs).
  • Phi-2 demonstrates exceptional reasoning and language understanding capabilities, showcasing state-of-the-art performance among base language models with less than 13 billion parameters.

Features

  • Model Size: Phi-2 is a 2.7 billion-parameter model.
  • Training Data Quality: Emphasis on “textbook-quality” data, including synthetic datasets for common sense reasoning and general knowledge, augmented with high-quality web data.
  • Knowledge Transfer: Utilizes scaled knowledge transfer from Phi-1.5 (1.3 billion parameters) to Phi-2, enhancing performance and accelerating training convergence.
  • Transformer-based Architecture: Phi-2 employs a next-word prediction objective, common in Transformer models.
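Phi-2’s training objective is the standard causal language-modeling loss: predict each next token given the preceding context. The toy sketch below illustrates that objective on a tiny corpus; the embedding lookup stands in for the whole Transformer body, and all sizes are illustrative.

```python
import numpy as np

# Toy corpus and vocabulary: the model must predict token t+1 from tokens <= t.
corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))
ids = [vocab.index(w) for w in corpus]

V, D = len(vocab), 8
rng = np.random.default_rng(0)
embed = rng.standard_normal((V, D)) * 0.1   # stand-in for the Transformer body
head = rng.standard_normal((D, V)) * 0.1    # output projection to vocabulary logits

def next_token_loss(ids):
    """Average cross-entropy of predicting each next token (causal LM objective)."""
    losses = []
    for t in range(len(ids) - 1):
        h = embed[ids[t]]                    # context representation (toy: last token only)
        logits = h @ head
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        losses.append(-np.log(probs[ids[t + 1]]))
    return float(np.mean(losses))

loss = next_token_loss(ids)
```

Training minimizes this loss over the full 1.4-trillion-token mix; Phi-2’s distinguishing choice is not the objective but the curated, “textbook-quality” data it is applied to.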

Performance Comparison

  • Benchmark Achievements:
    • Outperforms models up to 25x larger, including the Llama-2-70B model, in multi-step reasoning tasks like coding and math.
    • Surpasses the performance of Mistral and Llama-2 models at 7B and 13B parameters on various benchmarks.
    • Matches or outperforms Google Gemini Nano 2, despite being smaller in size.
  • Safety Scores: Demonstrates better behavior regarding toxicity and bias compared to existing models, as evidenced in the ToxiGen benchmark.

Training Details

  • Training Duration: Trained for 14 days on 96 A100 GPUs.
  • Data Source: Trained on 1.4 trillion tokens from a mix of Synthetic and Web datasets for NLP and coding.
  • Model Alignment: Phi-2 is a base model without alignment through reinforcement learning from human feedback (RLHF) or instruct fine-tuning.

Evaluation

  • Benchmarks: Evaluated on Big Bench Hard (BBH), commonsense reasoning, language understanding, math, and coding benchmarks.
  • Internal Testing: Also tested on Microsoft’s internal proprietary datasets and tasks, showing consistent outperformance over Mistral-7B and Llama-2 models.

Conclusion

  • Phi-2 represents a significant advancement in the field of small language models, achieving remarkable performance with a relatively smaller parameter count.
  • Its strategic training approach and focus on high-quality data demonstrate the potential of smaller models in achieving capabilities similar to larger models.
  • Phi-2’s availability on Azure AI Studio model catalog encourages further research and development in language models, particularly in areas of mechanistic interpretability and safety improvements.

Other AI News

  • Microsoft has expanded its Azure AI Studio to include open-source LLMs such as Llama 2

Microsoft has expanded its Azure AI Studio significantly by including Meta Platforms’ open-sourced AI model Llama 2 and OpenAI’s GPT-4 Turbo with Vision as part of its “model-as-a-service” (MaaS) offerings. This move allows customers to use these AI models on-demand over the web with minimal setup, without the need for installation on their own cloud server space. John Montgomery, Microsoft Corporate Vice President of Program Management for its AI Platform, explained that this service operates models as API endpoints, simplifying the process for customers who prefer not to manage infrastructure. The inclusion of Llama 2 in Azure AI Studio, available in public preview, offers various model options like Llama-2-7b and Llama-2-70b, both for text generation and chat completion.
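Operating a model as an API endpoint means a client only needs to POST a JSON payload to a deployment URL. The sketch below shows the general shape of such a call; the endpoint URL, key, and payload schema are placeholders, since the real values come from the Azure AI Studio deployment page for a given model.

```python
import json
import urllib.request

# Placeholder endpoint and key: substitute the values from your own deployment.
ENDPOINT = "https://example.inference.ai.azure.com/v1/chat/completions"
API_KEY = "YOUR_KEY"

def build_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completion call against a model served as an API endpoint."""
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
    )

req = build_request("Summarize the benefits of model-as-a-service.")
# Sending is then a single call: urllib.request.urlopen(req)
```

This is the appeal of the MaaS model: no GPU provisioning, no model weights to host, just an HTTP request per inference.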

Microsoft’s decision to offer a range of Llama models alongside OpenAI’s models is a strategic move to provide more choices to Azure cloud storage and service customers. This approach gives customers a cost-effective alternative to Microsoft’s partner OpenAI’s GPT-3.5 and 4 models. Llama 2 has gained popularity as a preferred open-source option for many users and enterprises, making it a significant addition to Azure AI Studio. This expansion aligns with Microsoft CEO Satya Nadella’s strategy to diversify Microsoft’s AI investments, following recent developments with OpenAI’s leadership and board decisions.

In addition to Llama 2, Azure AI Studio now includes OpenAI’s GPT-4 Turbo with Vision, enhancing the AI’s capabilities to analyze and describe photos and visual material. This model is already being used by customers like Instacart and WPP. Azure AI Studio also provides tools for fine-tuning all the models offered, further enhancing its appeal to customers. As the AI cloud wars intensify, Microsoft’s Azure AI Studio is poised to add more models, potentially including other open-source models like Mistral or Deci. The recent inclusion of these models in Azure AI Studio reflects Microsoft’s commitment to offering a diverse range of AI tools and services, catering to the evolving needs of businesses and developers in the AI space.

  • Meta introduces advanced AI features for its Ray-Ban smart glasses

Meta is introducing advanced AI features for its Ray-Ban smart glasses, starting with an early access test. These multimodal AI features enable the glasses to provide information about objects and environments using their camera and microphones. Mark Zuckerberg demonstrated this capability in an Instagram reel, where he asked the glasses to suggest pants matching a shirt, and the AI assistant responded with appropriate suggestions. The glasses can also translate text and assist with image captions.

The AI assistant’s capabilities were further showcased by Meta’s CTO Andrew Bosworth, who demonstrated its ability to describe objects and provide translations and summarizations. This test phase is currently limited to a small number of users in the US who opt into the program. These features represent a significant step in wearable AI technology, offering practical, everyday applications for users. The Ray-Ban smart glasses’ AI capabilities align with similar features seen in products from Microsoft and Google, indicating a growing trend in integrating AI into wearable technology.

  • Lightning AI launches Lightning AI Studios, a cloud-based platform designed for deploying AI products at scale

Lightning AI, known for its open-source framework PyTorch Lightning, has launched Lightning AI Studios, an all-in-one cloud-based platform for building and deploying AI products at scale. This platform is designed to fundamentally change how developers work by abstracting non-core activities and providing a single interface for all AI development needs. It offers pre-built templates, scalable CPU to GPU capabilities, and integrated tools, allowing deployment in various environments, including the user’s cloud, Lightning AI’s cloud, or a local GPU cluster.

Founder and CEO William Falcon describes Lightning AI Studios as an operating system for AI developers, akin to a collaborative cloud-based desktop with all necessary apps in one place. This approach is likened to the transition from using a flip phone to an iPhone, signifying a significant evolution in tooling for machine learning. The platform, which requires no setup, enables AI developers to start building AI products with one click on one screen. It includes a marketplace of apps for building, training, fine-tuning, and deploying models. Lightning AI Studios aims to compete with established cloud products like Google Colab and AWS Sagemaker, offering a more advanced, cloud-centric experience. The platform has four pricing tiers, catering to individual students, researchers, hobbyists, professional engineers, startups, teams, and larger organizations requiring enterprise-grade AI. Early access to Lightning AI Studios is available through their website.

  • Google launches MusicFX, an AI-powered web tool allowing users to create instrumental songs from text prompts

Google has launched MusicFX, an experimental AI-powered music composition web tool that enables users to create original instrumental songs from text prompts. Developed using Google’s MusicLM and DeepMind’s watermarking technology SynthID, MusicFX marks a significant step in AI-driven music creation. However, it comes with limitations to protect original artists’ voices and styles, such as not generating music for queries mentioning specific artists or including vocals.

MusicFX is part of Google’s AI Test Kitchen, a platform designed for public interaction with Google’s latest AI technologies. This approach allows Google to develop AI responsibly and inclusively by gathering early feedback. Currently available in the United States, Kenya, New Zealand, and Australia, MusicFX, alongside TextFX, offers users the chance to turn ideas into music or text. Google emphasizes responsible AI development, ensuring data privacy and implementing multiple layers of protection to address potential risks like inaccurate or inappropriate responses. The release of MusicFX not only provides a new tool for music generation but also represents a broader trend in AI, where public involvement is crucial in shaping and refining technology. This initiative could democratize music creation, lowering barriers for those without formal training or access to sophisticated tools, while also navigating the challenges of AI-generated content in terms of copyright and originality.

  • Tenyx launches its Fine-tuning Service, addressing a critical challenge in AI known as catastrophic forgetting

Tenyx has announced the launch of its Fine-tuning Service, addressing a critical challenge in AI known as catastrophic forgetting. This service is designed to help businesses customize and fine-tune Large Language Models (LLMs) for their specific needs without losing foundational knowledge or compromising safety measures. Traditional fine-tuning techniques often lead to the unintentional loss of previously learned capabilities and degradation of safety features established by reinforcement learning from human feedback (RLHF). However, Tenyx’s approach, based on a novel mathematical interpretation of geometric representations formed during initial LLM training, overcomes these drawbacks.

The Tenyx Fine-tuning Service enhances the retention of prior knowledge and reasoning abilities while maintaining RLHF protection. It outperforms popular enterprise and open-source fine-tuning algorithms in safety, proficiency, and knowledge retention. For instance, after fine-tuning, Tenyx showed only an 11% reduction in safety features compared to more significant reductions by other models, and it was 65% more proficient than OpenAI’s GPT-3.5 Turbo. Additionally, Tenyx mitigated catastrophic forgetting with only a 3% loss of knowledge. This service aligns with recent regulatory changes, such as the White House executive order on Safe, Secure, and Trustworthy AI, and is now available for businesses to sign up for a trial, with pricing to be announced in the future.
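Catastrophic forgetting itself is easy to demonstrate on a toy problem. The sketch below (illustrative only, unrelated to Tenyx’s proprietary method) fits a one-parameter model on task A, then naively keeps training it on task B, and measures how task-A error climbs back up:

```python
import numpy as np

# Task A wants the weight w = 1; task B wants w = 3.
xA, yA = np.array([1., 2., 3.]), np.array([1., 2., 3.])
xB, yB = np.array([1., 2., 3.]), np.array([3., 6., 9.])

def mse(w, x, y):
    return float(np.mean((w * x - y) ** 2))

def train(w, x, y, lr=0.02, steps=200):
    for _ in range(steps):
        w -= lr * np.mean(2 * (w * x - y) * x)   # gradient of the squared error
    return w

w = train(0.0, xA, yA)           # learn task A
err_A_before = mse(w, xA, yA)    # essentially zero
w = train(w, xB, yB)             # naive fine-tuning on task B...
err_A_after = mse(w, xA, yA)     # ...and task A is forgotten
```

Unconstrained gradient descent on the new objective simply overwrites the parameters the old task relied on; fine-tuning methods like Tenyx’s aim to update models while preserving those prior capabilities.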

  • Meta Platforms releases a new AI program called Audiobox for voice cloning and generating ambient sounds

Meta Platforms has released Audiobox, a new AI program for voice cloning and generating ambient sounds. Audiobox, described as a foundation research model for audio generation, builds upon Meta’s previous work in this area with Voicebox. It allows users to generate custom audio for various use cases by combining voice inputs and natural language text prompts. Users can input a sentence for a cloned voice to say, or describe a sound to generate, and Audiobox will execute the task. Additionally, users can record their voice for cloning by Audiobox.

Audiobox represents a significant advancement in audio-generating AI, with Meta creating a family of models for speech mimicry and generating ambient sounds like dogs barking or sirens. These models are built upon the shared self-supervised model Audiobox SSL, a technique where AI algorithms generate their own labels for unlabeled data. The FAIR researchers at Meta trained Audiobox using a vast dataset, including speech, music, and sound samples, to ensure a diverse representation of voices. While Audiobox is not open source, Meta plans to invite researchers and academic institutions to conduct safety and responsibility research with it. Currently, Audiobox is restricted for commercial use and is not available to residents of Illinois or Texas due to state laws. However, with rapid advancements in AI, commercial versions of such technology are expected soon.

  • Essential AI raises $56.5 million in Series A funding backed by Google, NVIDIA and AMD

Essential AI, a San Francisco-based startup, has emerged from stealth with a significant $56.5 million in Series A funding, backed by tech giants Google, Nvidia, and AMD, along with other investors. Co-founded by Ashish Vaswani and Niki Parmar, former collaborators at Google known for co-authoring a research paper on the Transformer architecture, Essential AI aims to revolutionize how people work with AI. The company plans to launch AI products that enhance human-computer collaboration, potentially offering ready-to-use AI solutions for enterprises to boost productivity.

While specific details of Essential AI’s upcoming products are not yet disclosed, the company’s focus is on developing large language model (LLM)-driven, full-stack AI products. These products are expected to automate time-consuming workflows and adapt quickly to increase productivity. The company’s vision includes making data analysts ten times faster and enabling business users to become independent, data-driven decision-makers. With the current trend of companies seeking AI-driven efficiencies, Essential AI’s upcoming offerings, backed by industry leaders, could significantly impact enterprise workflows. The company is actively hiring across various roles to innovate and deliver the best user experience with its enterprise-centric LLM products.

  • The European Union’s AI Act gives open-source LLM developers like Mistral room to thrive in a world dominated by OpenAI

The European Union’s AI Act, a significant piece of legislation aimed at regulating AI technologies, was recently passed following extensive negotiations. However, the Act, which requires high-impact general purpose AI models to adhere to transparency standards and imposes additional requirements on high-risk systems, quickly lost media attention to the achievements of Mistral, a Paris-based AI startup. Mistral, known for its open-source model approach, recently released a new Large Language Model (LLM) with just a torrent link and announced a substantial $415 million funding round, valuing the company at about $2 billion.

Mistral’s activities, including its successful lobbying against certain regulatory measures in the EU AI Act, highlight the ongoing debate over AI sovereignty in the European Union. The Act’s provisions, which offer broad exemptions to open source models except those posing systemic risks, reflect the complex balance between fostering AI innovation and imposing necessary regulations. As AI continues to advance rapidly, regulators worldwide are challenged to keep pace with the technology without stifling innovation, a task underscored by the vibrant AI research community and companies like Mistral that continue to push the boundaries of AI development.

  • Cohere AI launches “build-your-own connectors,” a feature allowing businesses to integrate their own data from third-party applications like Slack and Google Drive with Cohere

Cohere, an AI company competing with OpenAI, has launched “build-your-own connectors,” a new feature that allows businesses to integrate their company data from third-party applications like Slack and Google Drive into Cohere’s Command Large Language Model (LLM). This integration, which Cohere claims to be the first of its kind, follows their achievement of being the first AI company to offer fine-tuning across all four major cloud providers. The connectors enable businesses to build AI assistants on Cohere’s platform that can leverage information from a wide range of tools used in daily work, providing contextually relevant and accurate responses.

The introduction of connectors represents a significant advancement in the application of AI in business, allowing enterprise models to securely access company data across numerous third-party applications. This capability is a game-changer for AI’s delivery to businesses, enhancing the accuracy of responses and minimizing errors. Cohere has also released about 100 “quick start connectors” on GitHub for popular applications, and offers full support for other third-party data stores. This development caps a significant year for Cohere, which has seen increased business inquiries and expansion following a fresh round of funding and the opening of a second headquarters in San Francisco.
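A connector is essentially a small search service: given a user query, it returns matching documents from an internal source for the model to ground its answer on. The sketch below is schematic; the field names and in-memory “drive” are illustrative, not Cohere’s actual wire format or API.

```python
# Illustrative stand-in for a third-party data source such as Google Drive.
FAKE_DRIVE = [
    {"title": "Q3 report", "text": "Revenue grew 12% quarter over quarter."},
    {"title": "Onboarding", "text": "New hires get laptop access on day one."},
]

def connector_search(query: str) -> dict:
    """Return documents whose text mentions any query term (toy keyword match)."""
    terms = query.lower().split()
    hits = [d for d in FAKE_DRIVE if any(t in d["text"].lower() for t in terms)]
    return {"results": hits}

response = connector_search("revenue growth")
```

In production, the LLM calls a connector like this at query time and cites the returned documents in its answer, which is what keeps enterprise responses grounded in company data rather than model memory.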

  • Snapchat Plus subscribers can now generate AI-generated snaps

Snapchat Plus subscribers now have access to new AI tools that allow them to share AI-generated snaps. This feature enhancement enables users to creatively extend a photo by using AI to fill in the surrounding environment. This development is part of Snapchat’s ongoing integration of AI technology into its platform, offering subscribers more innovative ways to interact and create content within the app. The introduction of these AI capabilities reflects the growing trend of social media platforms incorporating advanced technology to enhance user experience and engagement. As AI continues to evolve, it’s likely that Snapchat and similar platforms will continue to explore and integrate these technologies, further transforming the landscape of social media interaction and content creation.

  • News-publishing giant Axel Springer has agreed on a multiyear licensing deal with OpenAI

OpenAI, the creator of ChatGPT, has entered into a multiyear licensing agreement with Axel Springer, a major news-publishing company. This deal marks a significant development in the relationship between media companies and AI developers. OpenAI will pay for using content from Axel Springer’s publications, including Politico and Business Insider in the U.S., and Bild and Welt in Europe, to enhance ChatGPT’s responses and train its AI tools. The financial terms of the agreement have not been disclosed, but it is expected to generate substantial revenue for Axel Springer.

Under this agreement, ChatGPT will include links to the original sources from Axel Springer’s publications when using their information to answer user queries. This new format, which will provide summarized answers, aims to ensure proper credit, compensation, and web traffic for Axel Springer’s websites. Additionally, Axel Springer will have access to OpenAI’s technology to improve its own products. The deal, which is not exclusive, allows Axel Springer to form similar agreements with other generative AI companies. This partnership reflects a growing trend of collaborations between news organizations and AI companies, addressing concerns in the publishing industry about AI technologies using their content without compensation. OpenAI’s approach to compensating publishers, such as using article word count, offers a model for future business interactions in this space.

  • Google Ventures (GV) appoints new general partner to back AI, open source startups

GV, the venture capital firm backed by Google’s parent company Alphabet Inc, has recently appointed Michael McBride, former Chief Revenue Officer at GitLab, as its new general partner. McBride’s role will primarily involve focusing on early-stage startups in the open-source and AI sectors. His experience at GitLab, an open-source developer tools maker and part of GV’s portfolio that went public in 2021, has equipped him with insights into how open-source can significantly benefit startups, particularly in AI.

Despite a slowdown in venture capital funding due to economic factors like interest rate hikes, GV has maintained active investments, averaging $1 billion annually since 2020 and completing 125 investments this year. The firm, known for early-stage investments in companies like Uber and Slack, has also made unique moves in the public market, buying shares of portfolio companies post-IPO. GV, which started as Google Ventures in 2009 and now manages $8 billion in assets solely from Alphabet, has a 35-person investment team focusing on life science and digital sectors, including enterprise, consumer, and frontier tech across North America and Europe. This approach reflects GV’s long-term investment strategy and its flexibility to support companies at various stages of their development.

  • Alphabet announces reduced costs for its AI model Gemini

Alphabet, the parent company of Google, has announced significant cost reductions for Gemini, its most advanced AI model, aiming to attract more developers to its platform. Introduced last Wednesday, Gemini is designed to process various forms of information, including video, audio, and text, and offers more sophisticated reasoning and nuanced understanding than Google’s previous technologies. The cost of using Gemini has been reduced to a quarter or half of its June prices.

To further support developers, Alphabet is releasing a suite of tools for customizing Gemini, along with two new Gemini-powered products. One product focuses on computer programming assistance, and the other on enhancing a company’s security operations. Additionally, a second version of its image-generation model will soon be available to developers. Gemini comes in three versions, each tailored to different processing power needs, with the most powerful version intended for data centers and the smallest optimized for mobile devices. This move by Alphabet is part of its efforts to compete with AI software developed by Microsoft-backed OpenAI, particularly following the launch of ChatGPT about a year ago.

  • Google Cloud partners with Mistral AI on generative language models

Google Cloud has formed a partnership with Paris-based generative AI startup Mistral AI, enabling the distribution of Mistral’s language models on Google’s infrastructure. This collaboration, announced on Wednesday, will see Mistral AI utilizing Google Cloud’s AI-optimized infrastructure, including TPU accelerators, to enhance the development and scaling of its large language models (LLMs). These models, known for generating text and other content, will benefit from the advanced capabilities and robust security and privacy standards of Google Cloud. Mistral AI, established by former researchers from Meta and Google, recently secured 385 million euros in funding, with contributions from notable investors like Andreessen Horowitz and Lightspeed Venture Partners. This partnership and funding round underscore Mistral AI’s growing prominence in the field of AI and its commitment to leveraging Google Cloud’s technology for the advancement of its LLMs.

  • Introducing Ashley, the world’s first AI-powered political campaign caller

Shamaine Daniels, a Democrat running for Congress, is leveraging a novel AI tool named Ashley, developed by Civox, for her campaign against Republican Representative Scott Perry. Ashley, an AI campaign volunteer, is distinct from typical robocallers, offering personalized, generative AI-powered conversations. This technology, capable of engaging thousands of voters simultaneously, represents a new era in political campaigning, enabling high-quality, large-scale interactions. However, it also raises concerns about the potential for spreading disinformation in an already polarized political landscape.

Ashley’s deployment in Daniels’ campaign demonstrates the tool’s ability to analyze voter profiles, tailor conversations, and fluently communicate in over 20 languages, significantly enhancing campaign outreach. Despite its potential, the technology’s use in political campaigns enters a legal gray area with few regulations, particularly concerning AI’s role in elections and the spread of misinformation. Civox’s CEO, Ilya Mouzykantskii, acknowledges these challenges and emphasizes the need for regulatory attention, even as Ashley’s realistic interactions captivate voters like David Fish, who appreciated the AI’s transparency in identifying itself.

  • CEO of DataStax declares Cassandra cloud database the best for Gen AI

Chet Kapoor, CEO of DataStax, confidently declared at a Linux Foundation event that Cassandra, the cloud database based on open-source Apache Cassandra, is the “best f*cking database for gen AI.” This bold statement comes amid intense competition in the generative AI (gen AI) sector, where large language model (LLM) providers and database companies vie for leadership. DataStax’s Cassandra database, widely used by enterprises, has shown early success in deploying generative AI at scale. Kapoor’s assertion is backed by Cassandra’s reliability as an operational database and its technological edge in areas crucial to generative AI, positioning it favorably against competitors like MongoDB and Pinecone.

Kapoor’s comments also highlight the evolving needs of enterprise CIOs, who increasingly prefer integrated data solutions for gen AI applications. While large cloud companies like Microsoft and Amazon have offered a variety of databases for different use cases, generative AI’s emergence has shifted the focus towards single, operational databases like Cassandra. This shift is driven by the need for efficient and easy querying of data by gen AI apps. Cassandra’s popularity as an operational database, especially among Fortune 500 companies, gives it an advantage over other databases primarily used for analytical workloads. Furthermore, Cassandra’s flexibility allows customers to avoid lock-in with specific cloud vendors, as evidenced by Amazon’s offering of Cassandra within its cloud services.

DataStax’s Astra DB, a cloud database-as-a-service based on Cassandra, has already seen deployment of generative AI by several companies. These deployments demonstrate the practical application of gen AI in various sectors, from online education to healthcare. Kapoor cited examples like Physics Wallah, an Indian online education platform, and Skypoint, a healthcare provider, showcasing Astra DB’s role in enhancing productivity and personalization through gen AI. DataStax’s technological prowess, particularly in vector search—a key requirement for gen AI databases—further strengthens its position. Kapoor anticipates rapid adoption of gen AI, predicting transformative and revenue-oriented use cases emerging next year. This trend suggests that gen AI will significantly drive growth for both private and public database companies in the coming years.

About The Author

Bogdan Iancu

Bogdan Iancu is a seasoned entrepreneur and strategic leader with over 25 years of experience in diverse industrial and commercial fields. His passion for AI, Machine Learning, and Generative AI is underpinned by a deep understanding of advanced calculus, enabling him to leverage these technologies to drive innovation and growth. As a Non-Executive Director, Bogdan brings a wealth of experience and a unique perspective to the boardroom, contributing to robust strategic decisions. With a proven track record of assisting clients worldwide, Bogdan is committed to harnessing the power of AI to transform businesses and create sustainable growth in the digital age.