June 5-11th: Qwen2, Aurora, A Right to Warn, Seed-TTS, Dragonfly, MAP-Neo, Jina CLIP, Mamba-2, Hidden Layers Conference, Recall, Astra, xLSTM, Thousand Brains Project, Chang’e-6, Pi AI Kit, Ultravox (2024)

Table of Contents
⛲Foundational Revelations: Qwen2, Aurora (Superfast Microsoft AI is first to predict air pollution for the whole world), Seed-TTS: A Family of High-Quality Versatile Speech Generation Models (ByteDance), Dragonfly: A large vision-language model with multi-resolution zoom (Together), MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series, Extracting Concepts from GPT-4 (OpenAI)

🔎 Research: Jina CLIP: Your CLIP Model Is Also Your Text Retriever, Mamba-2 (Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality), LLMs achieve adult human performance on higher-order theory of mind tasks, To Believe or Not to Believe Your LLM, Mobile-Agent-v2 (Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration), Parrot (Multilingual Visual Instruction Tuning), MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark, EGAN: Evolutional GAN for Ransomware Evasion, No Language Left Behind (NLLB): Scaling neural machine translation to 200 languages (Nature), Scientists create world's strongest iron-based superconducting magnet using AI (Phys.org), Gene therapy restores hearing to children with inherited deafness

🖲️AI Art-Research: ✨Murmuring Minds (DRIFT), Voicemod

📚Retroactive Readings: An Introduction to Narrative Generators: How Computers Create Works of Fiction (Perez | Oxford), An Artificial History of Natural Intelligence: Thinking with Machines from Descartes to the Digital Age (Bates)

OpenAI insiders are demanding a “right to warn” the public (Vox) ““I’m scared. I’d be crazy not to be,” one former employee tells Vox. ~ It may be tempting to view the new proposal as just another open letter put out solely by “doomers” who want to press pause on AI because they worry it will go rogue and wipe out all of humanity. That’s not all that this is. The signatories share the concerns of both the “AI ethics” camp, which worries more about present AI harms like racial bias and misinformation, and the “AI safety” camp, which worries more about AI as a future existential risk. ~ These camps are sometimes pitted against each other. The goal of the new proposal is to change the incentives of leading AI companies by making their activities more transparent to outsiders — and that would benefit everyone. ~ The signatories are calling on AI companies to let them voice their concerns about the technology — to the companies’ boards, to regulators, to independent expert organizations, and, if necessary, directly to the public — without retaliation. Six of the signatories are anonymous, including four current and two former OpenAI employees, precisely because they fear being retaliated against. The proposal is endorsed by some of the biggest names in the field: Geoffrey Hinton (often called “the godfather of AI”), Yoshua Bengio, and Stuart Russell.”

A Right to Warn about Advanced Artificial Intelligence “We therefore call upon advanced AI companies to commit to these principles:

1. That the company will not enter into or enforce any agreement that prohibits “disparagement” or criticism of the company for risk-related concerns, nor retaliate for risk-related criticism by hindering any vested economic benefit;

2. That the company will facilitate a verifiably anonymous process for current and former employees to raise risk-related concerns to the company’s board, to regulators, and to an appropriate independent organization with relevant expertise;

3. That the company will support a culture of open criticism and allow its current and former employees to raise risk-related concerns about its technologies to the public, to the company’s board, to regulators, or to an appropriate independent organization with relevant expertise, so long as trade secrets and other intellectual property interests are appropriately protected;

4. That the company will not retaliate against current and former employees who publicly share risk-related confidential information after other processes have failed.”

Top news app in US has Chinese origins and ‘writes fiction’ with the help of AI “LONDON (Reuters) -Last Christmas Eve, NewsBreak, a free app with roots in China that is the most downloaded news app in the United States, published an alarming piece about a small town shooting. NewsBreak, which is headquartered in Mountain View, California and has offices in Beijing and Shanghai, told Reuters it removed the article on December 28, four days after publication. The company said "the inaccurate information originated from the content source," and provided a link to the website, adding: "When NewsBreak identifies any inaccurate content or any violation of our community standards, we take prompt action to remove that content."”

Yes, artificial intelligence is running for mayor of Cheyenne; city, county clerks comment on candidate VIC “VIC, or Virtual Integrated Citizen, promises to be "attuned to the needs and desires of Cheyenne’s residents." But is AI qualified to appear on an election ballot?”

Photoshop Terms of Service grants Adobe access to user projects for ‘content moderation’ – Niche Gamer [Why? Probably to moderate your AI production.] → ✭ How Online Privacy Is Like Fishing “In the wake of a Microsoft spying controversy, it’s time for an ecosystem perspective.”

Introduction - SITUATIONAL AWARENESS: The Decade Ahead “You can see the future first in San Francisco. ~ Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum. ~ The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace many college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war. ~ Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the willful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.”

⛲Foundational Revelations: Qwen2, Aurora (Superfast Microsoft AI is first to predict air pollution for the whole world) , Seed-TTS: A Family of High-Quality Versatile Speech Generation Models (ByteDance), Dragonfly: A large vision-language model with multi-resolution zoom (Together), MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series, Extracting Concepts from GPT-4 (OpenAI)

Hello Qwen2 “Pretrained and instruction-tuned models of 5 sizes, including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B; Having been trained on data in 27 additional languages besides English and Chinese; State-of-the-art performance in a large number of benchmark evaluations; Significantly improved performance in coding and mathematics; Extended context length support up to 128K tokens with Qwen2-7B-Instruct and Qwen2-72B-Instruct. ~ We have opensourced the models in Hugging Face and ModelScope.”
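
[Editorial sketch, not from the Qwen2 announcement: since the checkpoints are on Hugging Face, a minimal way to query the instruction-tuned 7B model with the transformers library might look like the following, assuming the Qwen/Qwen2-7B-Instruct repository name and enough GPU memory; smaller or quantized variants follow the same pattern.]

```python
# Hedged sketch: load Qwen2-7B-Instruct from the Hugging Face Hub and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize state space models in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```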

Aurora (Superfast Microsoft AI is first to predict air pollution for the whole world) “The model, called Aurora, also forecasts global weather for ten days — all in less than a minute.”

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models (ByteDance) “We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named Seed-TTSDiT, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, Seed-TTSDiT does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant in both objective and subjective evaluations and showcase its effectiveness in speech editing.”

Dragonfly: A large vision-language model with multi-resolution zoom (Together) “Dragonfly architecture, which uses multi-resolution zoom-and-select to enhance multi-modal reasoning while being context-efficient. We are also launching two new open-source models: Llama-3-8b-Dragonfly-v1, a general-domain model trained on 5.5 million image-instruction pairs, and Llama-3-8b-Dragonfly-Med-v1, fine-tuned on an additional 1.4 million biomedical image-instruction pairs. Dragonfly demonstrates promising performance on vision-language benchmarks like commonsense visual QA and image captioning. Dragonfly-Med outperforms prior models, including Med-Gemini, on multiple medical imaging tasks, showcasing its capabilities for high-resolution medical data.”

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series “Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparable to existing closed-source LLMs. However, only the model's weights are provided with most details (e.g., intermediate checkpoints, pre-training corpus, and training code, etc.) being undisclosed. To improve the transparency of LLMs, the research community has formed to open-source truly open LLMs (e.g., Pythia, Amber, OLMo), where more details (e.g., pre-training corpus and training code) are being provided. These models have greatly advanced the scientific study of these large models including their strengths, weaknesses, biases and risks. However, we observe that the existing truly open LLMs on reasoning, knowledge, and coding tasks are still inferior to existing state-of-the-art LLMs with similar model sizes. To this end, we open-source MAP-Neo, a highly capable and transparent bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens. Our MAP-Neo is the first fully open-sourced bilingual LLM with comparable performance compared to existing state-of-the-art LLMs. Moreover, we open-source all details to reproduce our MAP-Neo, where the cleaned pre-training corpus, data cleaning pipeline, checkpoints, and well-optimized training/evaluation framework are provided. Finally, we hope our MAP-Neo will enhance and strengthen the open research community and inspire more innovations and creativities to facilitate the further improvements of LLMs.”

Extracting Concepts from GPT-4 (OpenAI) “Using new techniques for scaling sparse autoencoders, we automatically identified 16 million patterns in GPT-4's computations. ~ We currently don't understand how to make sense of the neural activity within language models. Today, we are sharing improved methods for finding a large number of "features"—patterns of activity that we hope are human interpretable. Our methods scale better than existing work, and we use them to find 16 million features in GPT-4. We are sharing a paper, code, and feature visualizations with the research community to foster further exploration.”
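
[Editorial sketch, not OpenAI's code: the core idea of a sparse autoencoder is to learn an overcomplete dictionary of directions over a model's cached activations, with a sparsity penalty so that each activation is explained by a few, hopefully interpretable, features. A toy version of that training loop:]

```python
# Toy sparse autoencoder over cached activations. Illustrates the general technique only;
# the OpenAI work uses different activation functions, much larger widths (16M latents),
# and activations harvested from GPT-4 rather than random data.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model, bias=False)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # feature activations (pushed toward sparsity)
        x_hat = self.decoder(f)           # reconstruction of the original activation
        return x_hat, f

d_model, n_features = 768, 8192           # hypothetical sizes for illustration
sae = SparseAutoencoder(d_model, n_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

acts = torch.randn(4096, d_model)         # stand-in for cached language-model activations
for batch in acts.split(256):
    x_hat, f = sae(batch)
    # Reconstruction loss plus an L1 penalty that encourages few active features per input.
    loss = ((x_hat - batch) ** 2).mean() + 1e-3 * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```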

AI Kit - Raspberry Pi Documentation

Chang’e-6: Moon samples collected and launched into lunar orbit “Material from the far side of the moon has begun its journey back to Earth after the Chinese spacecraft collected samples and launched them into lunar orbit.”

Thousand Brains Project | Numenta “The Thousand Brains Project is an open-source initiative dedicated to creating a new type of artificial intelligence based on the Thousand Brains Theory.”

GitHub - NX-AI/xlstm: Official repository of the xLSTM. “xLSTM is a new Recurrent Neural Network architecture based on ideas of the original LSTM. Through Exponential Gating with appropriate normalization and stabilization techniques and a new Matrix Memory it overcomes the limitations of the original LSTM and shows promising performance on Language Modeling when compared to Transformers or State Space Models.”
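
[Editorial sketch, not the NX-AI implementation: a stripped-down, single-cell illustration of the exponential gating with a normalizer state and log-space stabilizer described in the xLSTM paper. The recurrent gate connections, the matrix-memory (mLSTM) variant, and the residual block structure are all omitted.]

```python
# Simplified sLSTM-style cell: exponential input/forget gates, a normalizer state n,
# and a running stabilizer m to keep the exponentials numerically safe.
# A sketch of the idea only, not the official xLSTM code.
import torch
import torch.nn as nn

class TinySLSTMCell(nn.Module):
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        # Pre-activations for cell input z and the i, f, o gates (recurrent terms omitted).
        self.W = nn.Linear(d_in, 4 * d_hidden)

    def forward(self, x_seq):                       # x_seq: (batch, time, d_in)
        B, T, _ = x_seq.shape
        H = self.W.out_features // 4
        c = torch.zeros(B, H)                        # cell state
        n = torch.zeros(B, H)                        # normalizer state
        m = torch.zeros(B, H)                        # log-space stabilizer
        outputs = []
        for t in range(T):
            z_pre, i_pre, f_pre, o_pre = self.W(x_seq[:, t]).chunk(4, dim=-1)
            z, o = torch.tanh(z_pre), torch.sigmoid(o_pre)
            m_new = torch.maximum(f_pre + m, i_pre)  # stabilize before exponentiating
            i_gate = torch.exp(i_pre - m_new)        # exponential input gate
            f_gate = torch.exp(f_pre + m - m_new)    # exponential forget gate
            c = f_gate * c + i_gate * z
            n = f_gate * n + i_gate
            m = m_new
            outputs.append(o * (c / (n + 1e-8)))     # normalized hidden state
        return torch.stack(outputs, dim=1)

cell = TinySLSTMCell(d_in=16, d_hidden=32)
print(cell(torch.randn(2, 10, 16)).shape)            # torch.Size([2, 10, 32])
```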

From Scratch - Generative Adversarial Networks

Project Astra - Google DeepMind “A universal AI agent that is helpful in everyday life.”

Hidden Layers AI & Design Conference (June 12 – June 15, 2024)

This Hacker Tool Extracts All the Data Collected by Windows’ New Recall AI (WIRED) “Windows Recall takes a screenshot every five seconds. Cybersecurity researchers say the system is simple to abuse, and one ethical hacker has already built a tool to show how easy it really is.”

AI in software engineering at Google: Progress and the path ahead “Progress of AI-based assistance for software engineering in Google’s internal tooling and our projections for the future.”

DeepFake-o-meter “An Open Platform Integrating State-Of-The-Art Algorithms for DeepFake Image, Video, and Audio Detection”

Selling Data for AI May Be Publishers’ Salvation — The Information

BOXVIA (Bayesian Optimization Executable and Visualizable Application) “Bayesian Optimization Executable and Visualizable Application (BOXVIA) is a GUI-based application for Bayesian optimization. By using BOXVIA, users can perform Bayesian optimization and visualize functions obtained from the optimization process (i.e. mean function, its standard deviation, and acquisition function) without construction of a computing environment and programming skills. BOXVIA offers significant help for incorporating Bayesian optimization into your optimization problem.”
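
[Editorial sketch of the loop such tools automate, not BOXVIA's own code: a bare-bones Bayesian optimization run with a Gaussian-process surrogate and an expected-improvement acquisition function on a toy 1-D objective.]

```python
# Minimal Bayesian-optimization loop: fit a GP to the observations, pick the next point
# by expected improvement, evaluate, repeat. Toy objective stands in for a real experiment.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    return -np.sin(3 * x) - x**2 + 0.7 * x

X = np.array([[-0.9], [1.1]])                     # initial observations
y = objective(X).ravel()
candidates = np.linspace(-2, 2, 400).reshape(-1, 1)

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.max()
    z = (mu - best) / (sigma + 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    x_next = candidates[np.argmax(ei)].reshape(1, -1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best x:", X[np.argmax(y)], "best y:", y.max())
```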

Microsoft Will Switch Off Recall by Default After Security Backlash “After weeks of withering criticism and exposed security flaws, Microsoft has vastly scaled back its ambitions for Recall, its AI-enabled silent recording feature, and added new privacy features.”

AMD's Newest Open-Source Surprise: "Peano" - An LLVM Compiler For Ryzen AI NPUs - Phoronix “an open-source LLVM compiler back-end for AMD/Xilinx AI engine processors with a particular focus on the Ryzen AI SOCs with existing Phoenix and Hawk Point hardware as well as the upcoming XDNA2 found with the forthcoming Ryzen AI 300 series. AMD's Ryzen AI NPU on Linux is finally getting interesting!”

Promptframes: Evolving the Wireframe for the Age of AI “Promptframes enhance wireframes with prompt writing and generative AI, boosting content fidelity and speeding up user testing. No more lorem ipsum.”

GitHub - ggozad/oterm: a text-based terminal client for Ollama

CodeAid: A classroom deployment of an LLM-based coding assistant “We designed an AI tool to help students but without telling them the solution.”

GitHub - fixie-ai/ultravox “Ultravox is a new kind of multimodal LLM that can understand text as well as human speech, without the need for a separate Audio Speech Recognition (ASR) stage. Building on research like AudioLM, SeamlessM4T, Gazelle, SpeechGPT, and others, we've extended Meta's Llama 3 model with a multimodal projector that converts audio directly into the high-dimensional space used by Llama 3. This direct coupling allows Ultravox to respond much more quickly than systems that combine separate ASR and LLM components. In the future this will also allow Ultravox to natively understand the paralinguistic cues of timing and emotion that are omnipresent in human speech. ~ The current version of Ultravox (v0.1), when invoked with audio content, has a time-to-first-token (TTFT) of approximately 200ms, and a tokens-per-second rate of ~100, all using a Llama 3 8B backbone. While quite fast, we believe there is considerable room for improvement in these numbers. ~ Ultravox currently takes in audio and emits streaming text. As we evolve the model, we'll train it to be able to emit a stream of speech tokens that can then be converted directly into raw audio by an appropriate unit vocoder. We're interested in working with interested parties to build this functionality!”
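
[Conceptual sketch, not the fixie-ai/ultravox code: the key piece is a small projector that maps speech-encoder frames into the LLM's embedding space, so audio becomes "pseudo-tokens" the LLM consumes directly and no separate ASR stage is needed. Dimensions and layer choices below are illustrative assumptions.]

```python
# Hypothetical multimodal audio projector: stack speech-encoder frames to downsample in
# time, then project into the LLM's token-embedding dimension. Shapes are illustrative.
import torch
import torch.nn as nn

class AudioProjector(nn.Module):
    def __init__(self, d_audio: int = 1024, d_llm: int = 4096, stack: int = 8):
        super().__init__()
        self.stack = stack                                   # temporal downsampling factor
        self.proj = nn.Sequential(
            nn.Linear(d_audio * stack, d_llm),
            nn.SiLU(),
            nn.Linear(d_llm, d_llm),
        )

    def forward(self, audio_feats):                          # (batch, frames, d_audio)
        b, t, d = audio_feats.shape
        t = t - t % self.stack                               # drop trailing frames that don't fill a stack
        stacked = audio_feats[:, :t].reshape(b, t // self.stack, d * self.stack)
        return self.proj(stacked)                            # (batch, t/stack, d_llm) pseudo-token embeddings

projector = AudioProjector()
audio_feats = torch.randn(1, 160, 1024)                      # stand-in for speech-encoder output
pseudo_tokens = projector(audio_feats)                       # would be spliced into the text embedding sequence
print(pseudo_tokens.shape)                                   # torch.Size([1, 20, 4096])
```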

GitHub - SilasMarvin/lsp-ai: LSP-AI is an open-source language server that serves as a backend for AI-powered functionality, designed to assist and empower software engineers, not replace them.

Apple debuts new ‘Apple Intelligence’ AI features at WWDC 2024 | Apple | The Guardian “Tim Cook, the Apple CEO, announced a series of generative artificial intelligence products and services on Monday during his keynote speech at the company’s annual developer conference, WWDC, including a deal with ChatGPT-maker OpenAI. The new tools mark a major shift toward AI for Apple, which has seen slowing global sales over the past year and integrated fewer AI features into its consumer-facing products than competitors. “It has to understand you and be grounded in your personal context like your routine, your relationships, your communications and more. It’s beyond artificial intelligence. It’s personal intelligence,” said Cook. “Introducing Apple Intelligence.” ~ Apple’s new artificial intelligence system involves a range of generative AI tools aimed at creating an automated, personalized experience on its devices. The demonstration showed Apple’s AI would be integrated throughout the operating systems on its laptops, iPads and iPhones, as well as be able to pull information from and take action within apps. ~ The company also confirmed its much-anticipated partnership with OpenAI during the keynote, announcing that Apple would integrate ChatGPT technology into responses from Siri, its AI assistant.”

Claude’s ‘Character Training’ (Anthropic) “Companies developing AI models generally train them to avoid saying harmful things and to avoid assisting with harmful tasks. The goal of this is to train models to behave in ways that are "harmless". But when we think of the character of those we find genuinely admirable, we don’t just think of harm avoidance. We think about those who are curious about the world, who strive to tell the truth without being unkind, and who are able to see many sides of an issue without becoming overconfident or overly cautious in their views. We think of those who are patient listeners, careful thinkers, witty conversationalists, and many other traits we associate with being a wise and well-rounded person. ~ AI models are not, of course, people. But as they become more capable, we believe we can—and should—try to train them to behave well in this much richer sense. Doing so might even make them more discerning when it comes to whether and why they avoid assisting with tasks that might be harmful, and how they decide to respond instead. ~ Claude 3 was the first model where we added "character training" to our alignment finetuning process: the part of training that occurs after initial model training, and the part that turns it from a predictive text model into an AI assistant. The goal of character training is to make Claude begin to have more nuanced, richer traits like curiosity, open-mindedness, and thoughtfulness.”

“NPUs & TPUs! Isn’t this naming confusing? Yes! TPU has become Google’s name for all its cloud AI accelerator chips including its later designs that perform training as well as inference. Google has also used TPU as the name for the NPU in its Pixel smartphone. Elsewhere on the cloud, other firms use a variety of names for their specialized AI accelerators, adding to the confusion. ~ Thankfully, the term NPU is being more consistently applied - by Intel, Apple, Qualcomm, AMD, and others - as the term for specialized inference hardware. ~ Where would I expect to find an NPU? We’ve already seen that we can find an NPU in an AI PC. It’s also found in the other locations that are sometimes known as ‘the edge’ as opposed to ‘the cloud’. Or more prosaically, on your personal computer, smartphone, or ‘Internet of Things’ device. ~ So the NPU is a Chip? Not usually. Unlike, Google’s TPUs, NPUs aren’t normally separate chips. Instead, they are a distinct block of circuits on a System-on-Chip, the single silicon die that contains all of the processing capacity needed to power a modern personal computer or smartphone.”

🔎 Research: Jina CLIP: Your CLIP Model Is Also Your Text Retriever, Mamba-2 (Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality), LLMs achieve adult human performance on higher-order theory of mind tasks, To Believe or Not to Believe Your LLM, Mobile-Agent-v2 (Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration), Parrot (Multilingual Visual Instruction Tuning), MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark, EGAN: Evolutional GAN for Ransomware Evasion, No Language Left Behind (NLLB): Scaling neural machine translation to 200 languages (Nature), Scientists create world's strongest iron-based superconducting magnet using AI (Phys.org), Gene therapy restores hearing to children with inherited deafness

[2405.20204] Jina CLIP: Your CLIP Model Is Also Your Text Retriever “Contrastive Language-Image Pretraining (CLIP) is widely used to train models to align images and texts in a common embedding space by mapping them to fixed-sized vectors. These models are key to multimodal information retrieval and related tasks. However, CLIP models generally underperform in text-only tasks compared to specialized text models. This creates inefficiencies for information retrieval systems that keep separate embeddings and models for text-only and multimodal tasks. We propose a novel, multi-task contrastive training method to address this issue, which we use to train the jina-clip-v1 model to achieve the state-of-the-art performance on both text-image and text-text retrieval tasks.”
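
[Editorial usage sketch assuming the jinaai/jina-clip-v1 checkpoint and the encode_text/encode_image helpers described on its model card; verify the current interface before relying on it.]

```python
# One model, two retrieval modes: text-text and text-image in the same embedding space.
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

queries = ["a photo of a snowy mountain"]
passages = ["Alpine peaks covered in fresh snow", "A recipe for tomato soup"]
images = ["mountain.jpg"]                      # hypothetical local image file

q = model.encode_text(queries)
t = model.encode_text(passages)
v = model.encode_image(images)

def cos(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return a @ b.T / (np.linalg.norm(a, axis=-1, keepdims=True) * np.linalg.norm(b, axis=-1))

print("text-text:", cos(q, t))                 # works as a plain text retriever
print("text-image:", cos(q, v))                # and as a CLIP-style image retriever
```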

[2405.21060] Mamba-2 (Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality) “While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.”
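
[Editorial illustration of the recurrence involved: a selective SSM computes h_t = A_t ⊙ h_{t-1} + B_t x_t and y_t = ⟨C_t, h_t⟩ with input-dependent A, B, C. The naive linear-time scan below is the reference computation that the SSD framework recasts as block matrix multiplications for hardware efficiency; shapes are illustrative, not the paper's exact parameterization.]

```python
# Naive reference scan for a (diagonal) selective state space model.
import torch

def selective_ssm_scan(x, A, B, C):
    # x: (T, d) inputs; A, B, C: (T, d, n) input-dependent parameters per channel/state dim.
    T, d = x.shape
    n = A.shape[-1]
    h = torch.zeros(d, n)
    ys = []
    for t in range(T):
        h = A[t] * h + B[t] * x[t].unsqueeze(-1)   # elementwise (diagonal) state update
        ys.append((C[t] * h).sum(-1))              # project the state back to d channels
    return torch.stack(ys)

T, d, n = 32, 4, 16
x = torch.randn(T, d)
A = torch.sigmoid(torch.randn(T, d, n))            # input-dependent decay in (0, 1)
B = torch.randn(T, d, n)
C = torch.randn(T, d, n)
print(selective_ssm_scan(x, A, B, C).shape)        # torch.Size([32, 4])
```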

[2405.18870] LLMs achieve adult human performance on higher-order theory of mind tasks “This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM); the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite -- Multi-Order Theory of Mind Q&A -- and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.”

To Believe or Not to Believe Your LLM “We explore uncertainty quantification in large language models (LLMs), with the goal to identify when uncertainty in responses given a query is large. We simultaneously consider both epistemic and aleatoric uncertainties, where the former comes from the lack of knowledge about the ground truth (such as about facts or the language), and the latter comes from irreducible randomness (such as multiple possible answers). In particular, we derive an information-theoretic metric that allows to reliably detect when only epistemic uncertainty is large, in which case the output of the model is unreliable. This condition can be computed based solely on the output of the model obtained simply by some special iterative prompting based on the previous responses. Such quantification, for instance, allows to detect hallucinations (cases when epistemic uncertainty is high) in both single- and multi-answer responses. This is in contrast to many standard uncertainty quantification strategies (such as thresholding the log-likelihood of a response) where hallucinations in the multi-answer case cannot be detected. We conduct a series of experiments which demonstrate the advantage of our formulation. Further, our investigations shed some light on how the probabilities assigned to a given output by an LLM can be amplified by iterative prompting, which might be of independent interest.”
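
[Very rough editorial illustration, not the paper's information-theoretic estimator: one way to read the iterative-prompting idea is to re-ask a question with a previously sampled answer inserted into the prompt and measure how strongly the answer distribution is pulled toward it. Large drift suggests epistemic uncertainty (the model has no firm knowledge and is easily swayed), while a stable distribution suggests genuine multi-answer ambiguity. sample_answer is a hypothetical wrapper around whatever model you query.]

```python
# Crude probe: does injecting a previously sampled answer into the prompt drag the
# model's answers toward it? A stand-in for the epistemic/aleatoric split studied
# in the paper, not its actual metric.
from collections import Counter

def epistemic_drift(sample_answer, question, k=8):
    base = [sample_answer(question) for _ in range(k)]           # unconditioned samples
    hint = Counter(base).most_common(1)[0][0]
    prompt = f"{question}\nA previously given answer was: {hint}\n{question}"
    conditioned = [sample_answer(prompt) for _ in range(k)]      # samples after injection
    p_base = Counter(base)[hint] / k
    p_cond = Counter(conditioned)[hint] / k
    return p_cond - p_base                                       # large positive drift is suspicious

# Usage with any LLM wrapper:
#   drift = epistemic_drift(lambda q: my_llm(q, temperature=1.0), "Who wrote 'The Quiet Don'?")
#   drift near 0  -> spread likely reflects real ambiguity (aleatoric)
#   drift large   -> model is easily swayed; answer likely unreliable (epistemic)
```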

[2406.01014] Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration “Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as operation assistants. Instead, MLLM-based agents, which enhance capabilities through tool invocation, are gradually being applied to this scenario. However, the two major navigation challenges in mobile device operation tasks, task progress navigation and focus content navigation, are significantly complicated under the single-agent architecture of existing work. This is due to the overly long token sequences and the interleaved text-image data format, which limit performance. To address these navigation challenges effectively, we propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance. The architecture comprises three agents: planning agent, decision agent, and reflection agent. The planning agent generates task progress, making the navigation of history operations more efficient. To retain focus content, we design a memory unit that updates with task progress. Additionally, to correct erroneous operations, the reflection agent observes the outcomes of each operation and handles any mistakes accordingly. Experimental results indicate that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture of Mobile-Agent. The code is open-sourced at https://github.com/X-PLUG/MobileAgent”
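
[Editorial schematic, not the X-PLUG/MobileAgent code: the three roles and the memory unit described in the abstract map onto a simple control loop. call_llm, get_screenshot, and execute are hypothetical placeholders for the MLLM calls and device tooling.]

```python
# Hypothetical control loop mirroring the planning / decision / reflection roles plus a
# memory unit for focus content. call_llm is assumed to return a string for the planner
# and reflector and a {"action": ..., "memory": ...} dict for the decider.
def run_task(instruction, call_llm, get_screenshot, execute, max_steps=20):
    progress, memory = "", ""                                   # task-progress summary and focus content
    for _ in range(max_steps):
        screen = get_screenshot()
        progress = call_llm(role="planning", goal=instruction, history=progress)
        decision = call_llm(role="decision", goal=instruction, progress=progress,
                            screen=screen, memory=memory)
        outcome = execute(decision["action"])                   # tap / type / scroll on the device
        memory = decision.get("memory", memory)                 # keep focus content up to date
        verdict = call_llm(role="reflection", goal=instruction, screen=screen,
                           action=decision["action"], outcome=outcome)
        if verdict == "done":
            return True
        if verdict == "error":
            memory += f"\n[avoid] {decision['action']}"          # route the next decision around the mistake
    return False
```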

[2406.02539] Parrot: Multilingual Visual Instruction Tuning “The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has marked a significant step towards artificial general intelligence. Existing methods mainly focus on aligning vision encoders with LLMs through supervised fine-tuning (SFT) to endow LLMs with multimodal abilities, making MLLMs' inherent ability to react to multiple languages progressively deteriorate as the training process evolves. We empirically find that the imbalanced SFT datasets, primarily composed of English-centric image-text pairs, lead to significantly reduced performance in non-English languages. This is due to the failure of aligning the vision encoder and LLM with multilingual tokens during the SFT process. In this paper, we introduce Parrot, a novel method that utilizes textual guidance to drive visual token alignment at the language level. Parrot makes the visual tokens condition on diverse language inputs and uses Mixture-of-Experts (MoE) to promote the alignment of multilingual tokens. Specifically, to enhance non-English visual tokens alignment, we compute the cross-attention using the initial visual features and textual embeddings, the result of which is then fed into the MoE router to select the most relevant experts. The selected experts subsequently convert the initial visual tokens into language-specific visual tokens. Moreover, considering the current lack of benchmarks for evaluating multilingual capabilities within the field, we collect and make available a Massive Multilingual Multimodal Benchmark which includes 6 languages, 15 categories, and 12,000 questions, named as MMMB. Our method not only demonstrates state-of-the-art performance on multilingual MMBench and MMMB, but also excels across a broad range of multimodal tasks. Both the source code and the training dataset of Parrot will be made publicly available.”
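
[Editorial sketch of the mechanism the abstract describes, not the Parrot code: visual tokens are conditioned on the language of the prompt via cross-attention, and an MoE router mixes language-specific experts to produce language-aligned visual tokens. Dimensions are illustrative, and the soft mixture below simplifies the paper's expert selection.]

```python
# Rough Parrot-style aligner: cross-attend visual tokens to the text embeddings, route
# through a small mixture of experts, and return language-specific visual tokens.
import torch
import torch.nn as nn

class ParrotStyleAligner(nn.Module):
    def __init__(self, d=1024, n_experts=6):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d, d) for _ in range(n_experts)])

    def forward(self, visual_tokens, text_embeddings):
        # Condition visual tokens on the language of the prompt.
        attended, _ = self.cross_attn(visual_tokens, text_embeddings, text_embeddings)
        weights = torch.softmax(self.router(attended.mean(dim=1)), dim=-1)          # (batch, n_experts)
        expert_outs = torch.stack([e(visual_tokens) for e in self.experts], dim=1)  # (batch, E, T, d)
        return (weights[:, :, None, None] * expert_outs).sum(dim=1)                # language-specific visual tokens

aligner = ParrotStyleAligner()
v = torch.randn(2, 49, 1024)        # visual tokens from a vision encoder
t = torch.randn(2, 16, 1024)        # text embeddings of a multilingual prompt
print(aligner(v, t).shape)          # torch.Size([2, 49, 1024])
```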

[2406.01574] MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark “In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains. However, as models continue to improve, their performance on these benchmarks has begun to plateau, making it increasingly difficult to discern differences in model capabilities. This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. Additionally, MMLU-Pro eliminates the trivial and noisy questions in MMLU. Our experimental results show that MMLU-Pro not only raises the challenge, causing a significant drop in accuracy by 16% to 33% compared to MMLU but also demonstrates greater stability under varying prompts. With 24 different prompt styles tested, the sensitivity of model scores to prompt variations decreased from 4-5% in MMLU to just 2% in MMLU-Pro. Additionally, we found that models utilizing Chain of Thought (CoT) reasoning achieved better performance on MMLU-Pro compared to direct answering, which is in stark contrast to the findings on the original MMLU, indicating that MMLU-Pro includes more complex reasoning questions. Our assessments confirm that MMLU-Pro is a more discriminative benchmark to better track progress in the field.”
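
[Editorial evaluation sketch assuming the TIGER-Lab/MMLU-Pro dataset on the Hugging Face Hub and its field names at the time of writing; ask_model is a hypothetical function that returns a single letter.]

```python
# Hedged sketch of scoring a model on MMLU-Pro's ten-option questions.
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/MMLU-Pro", split="test")   # assumed dataset id
LETTERS = "ABCDEFGHIJ"                                   # ten options instead of MMLU's four

def accuracy(ask_model, n=200):
    correct = 0
    for row in ds.select(range(n)):
        options = "\n".join(f"{LETTERS[i]}. {opt}" for i, opt in enumerate(row["options"]))
        prompt = (f"{row['question']}\n{options}\n"
                  "Think step by step, then give only the letter of your answer.")
        prediction = ask_model(prompt).strip().upper()   # e.g. "C"
        correct += prediction.startswith(LETTERS[row["answer_index"]])
    return correct / n
```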

New ransomware attack based on an evolutional generative adversarial network can evade security measures “a new approach to produce adversarial ransomware samples, which they term evolution generative adversarial network (EGAN). This method was found to generate ransomware that could successfully evade numerous commercial AI-powered anti-virus solutions and malware detection methods.” ~ ✭EGAN: Evolutional GAN for Ransomware Evasion | IEEE Conference Publication | IEEE Xplore “Adversarial Training is a proven defense strategy against adversarial malware. However, generating adversarial malware samples for this type of training presents a challenge because the resulting adversarial malware needs to remain evasive and functional. This work proposes an attack framework, EGAN, to address this limitation. EGAN leverages an Evolution Strategy and Generative Adversarial Network to select a sequence of attack actions that can mutate a ransomware file while preserving its original functionality. We tested this framework on popular AI-powered commercial antivirus systems listed on VirusTotal and demonstrated that our framework is capable of bypassing the majority of these systems. Moreover, we evaluated whether the EGAN attack framework can evade other commercial non-AI antivirus solutions. Our results indicate that the adversarial ransomware generated can increase the probability of evading some of them.”

Meta's AI can translate dozens of under-resourced languages “Marta Costa-jussà and the No Language Left Behind (NLLB) team have developed a cross-language approach, which allows neural machine translation models to learn how to translate low-resource languages using their pre-existing ability to translate high-resource languages. ~ As a result, the researchers have developed an online multilingual translation tool, called NLLB-200, that includes 200 languages, contains three times as many low-resource languages as high-resource languages, and performs 44% better than pre-existing systems. ~ Given that the researchers only had access to 1,000–2,000 samples of many low-resource languages, to increase the volume of training data for NLLB-200 they utilized a language identification system to identify more instances of those given dialects. The team also mined bilingual textual data from Internet archives, which helped improve the quality of translations NLLB-200 provided.” ~ ✭No Language Left Behind (NLLB): Scaling neural machine translation to 200 languages (Nature) “The development of neural techniques has opened up new avenues for research in machine translation. Today, neural machine translation (NMT) systems can leverage highly multilingual capacities and even perform zero-shot translation, delivering promising results in terms of language coverage and quality. However, scaling quality NMT requires large volumes of parallel bilingual data, which are not equally available for the 7,000+ languages in the world. Focusing on improving the translation qualities of a relatively small group of high-resource languages comes at the expense of directing research attention to low-resource languages, exacerbating digital inequities in the long run. To break this pattern, here we introduce No Language Left Behind—a single massively multilingual model that leverages transfer learning across languages. We developed a conditional computational model based on the Sparsely Gated Mixture of Experts architecture which we trained on data obtained with new mining techniques tailored for low-resource languages. Furthermore, we devised multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. We evaluated the performance of our model over 40,000 translation directions using tools created specifically for this purpose—an automatic benchmark (FLORES-200), a human evaluation metric (XSTS) and a toxicity detector that covers every language in our model. Compared with the previous state-of-the-art models, our model achieves an average of 44% improvement in translation quality as measured by BLEU. By demonstrating how to scale NMT to 200 languages and making all contributions in this effort freely available for non-commercial use, our work lays important groundwork for the development of a universal translation system.”
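
[Editorial usage sketch with the openly released checkpoints: translating English into Quechua via transformers, assuming the facebook/nllb-200-distilled-600M model id and the FLORES-200 language codes ("eng_Latn", "quy_Latn"); check the model card for the current list of codes.]

```python
# Translate English to Ayacucho Quechua with a distilled NLLB-200 checkpoint.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "facebook/nllb-200-distilled-600M"
tok = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tok("No language left behind.", return_tensors="pt")
out = model.generate(
    **inputs,
    forced_bos_token_id=tok.convert_tokens_to_ids("quy_Latn"),  # force the target language
    max_new_tokens=64,
)
print(tok.batch_decode(out, skip_special_tokens=True)[0])
```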

Scientists create world's strongest iron-based superconducting magnet using AI (Phys.org) “Scientists have developed the world's strongest iron-based superconducting magnet using AI, in what could be a breakthrough for affordable MRI machines and the future of electrified transport. ... Using a new machine learning system called BOXVIA, the scientists developed a framework that could optimize superconductor creation in the lab faster than ever before.” ~ ✭ Superstrength permanent magnets with iron-based superconductors by data- and researcher-driven process design (NPG Asia Materials) “Iron-based high-temperature (high-Tc) superconductors have good potential to serve as materials in next-generation superstrength quasi-permanent magnets owing to their distinctive topological and superconducting properties. However, their unconventional high-Tc superconductivity paradoxically associates with anisotropic pairing and short coherence lengths, causing challenges by inhibiting supercurrent transport at grain boundaries in polycrystalline materials. In this study, we employ machine learning to manipulate intricate polycrystalline microstructures through a process design that integrates researcher- and data-driven approaches via tailored software. Our approach results in a bulk Ba0.6K0.4Fe2As2 permanent magnet with a magnetic field that is 2.7 times stronger than that previously reported. Additionally, we demonstrate magnetic field stability exceeding 0.1 ppm/h for a practical 1.5 T permanent magnet, which is a vital aspect of medical magnetic resonance imaging. Nanostructural analysis reveals contrasting outcomes from data- and researcher-driven processes, showing that high-density defects and bipolarized grain boundary spacing distributions are primary contributors to the magnet’s exceptional strength and stability.”

Leveraging computer vision for predicting collision risks: a cross-sectional analysis of 2019–2021 fatal collisions in the USA | Injury Prevention “This study demonstrates the utility of using data algorithms that can automatically analyse street segments to create indicators of the built environment to enhance understanding of large-scale patterns and inform interventions to decrease road traffic injuries and fatalities.”

Bilateral gene therapy in children with autosomal recessive deafness 9: single-arm trial results - Nature Medicine “An interim analysis of a single-arm trial in 5 children with hereditary deafness shows that binaural AAV gene therapy is safe and leads to hearing improvement up to 13–26 weeks of follow-up.” ✭ Gene therapy restores hearing to children with inherited deafness “The first clinical trial to administer gene therapy to both ears in one person has restored hearing function to 5 children born with a form of inherited deafness.”

‘Everything is Going to Be Robotic’ Nvidia Promises, as AI Gets More Real

A conversation with NVIDIA’s Jensen Huang - YouTube "We can't design a chip anymore without AI."

Why apathy and fear are the two most useless positions on AI (Ethan Mollick on Big Think) “AI is reshaping our understanding of humanity and intelligence, evolving from simple prediction tools to sophisticated large language models, but how do we keep it from dooming us all? Should we be more afraid of it, or are we actually in control? Mollick proposes four most likely predictions of our future with AI – As Good As It Gets, Slow Growth, Exponential Growth, and The Machine God – and explains the likelihood and potential results of each one. ~ Mollick stresses the importance of using AI as a supplemental tool to enhance your performance, not as something that will replace you entirely. According to Mollick, AI is here to stay, and it’s up to us to decide how it is used now, and in generations to come. Our choices today will shape the trajectory of AI and determine whether it becomes a force for good or a source of existential risk.”

🖲️AI Art-Research: ✨Murmuring Minds (DRIFT), Voicemod

✨Murmuring Minds (DRIFT) at LUMA Arles | Instagram “✨“Murmuring Minds” is a new interactive performance installation from DRIFT (@studio.drift) on view now at LUMA Arles (@luma_arles) as part of the studio’s current exhibition there, “Living Landscape.” The installation “Murmuring Minds” has been in development for the last three years, and explores the intricate patterns governing movement and processes in nature. Within the space, sixty autonomously-moving rectangular blocks act as a swarm, executing specific behaviors. As the audience engages with the installation, “Murmuring Minds” transforms into a dynamic performance, highlighting the unique interplay between human participants and the blocks. Each movement and decision visibly impacts the composition and reactions of the blocks, demonstrating the complexity of decision-making processes and blurring the lines between the natural and artificial.”

Free Real Time Voice Changer for PC & Mac - Voicemod “Express yourself with our real-time AI Voice Changer and soundboard to be who you want, when you want in the metaverse. Build your sonic identity for platforms like Roblox, OBS, VRChat, Discord, and more. ~ Voicemod adds real-time voice changing and custom sound effects to every game and communication desktop app including Discord, ZOOM, Google Meet, Minecraft, Lethal Company, Overwatch, Rust, Fortnite, Valorant, League of Legends, Among Us, Roll20, Escape from Tarkov, WhatsApp Desktop, Gorilla Tag, and more!”

Coral “Coral is a complete toolkit to build products with local AI. Our on-device inferencing capabilities allow you to build products that are efficient, private, fast and offline.”

Hailo: The World’s Top Performing Edge AI Processor For Edge Devices “Our processors are geared towards the new era of generative AI on the edge, in parallel to enabling perception and video enhancement through our wide range of AI accelerators and vision processors.”

I Built a CoPilot+ AI PC (without Windows) - YouTube “The new AI Kit is $70 and you can find more on Raspberry Pi's website.”

NVIDIA Jetson AGX Orin “Next-level AI performance for next-gen robotics.”

📚Retroactive Readings: An Introduction to Narrative Generators: How Computers Create Works of Fiction (Perez | Oxford), An Artificial History of Natural Intelligence: Thinking with Machines from Descartes to the Digital Age (Bates)

An Introduction to Narrative Generators: How Computers Create Works of Fiction | Oxford Academic “This book describes how computer programs can generate narratives and how studies of computational narrative can illuminate how humans tell stories. It is designed for readers with little or no background in computer science but who are interested in understanding the core processes underlying AI systems. We refer to this phenomenon as the AI knowledge gap. This book contributes to filling the AI knowledge gap in the field of automatic narrative generation and to enhancing the dissemination of information about automatic storytelling. The book introduces the most relevant techniques employed over the past 60 years for the development of computer models for narrative generation, avoiding, as much as possible, the use of technical language. The techniques studied are narrative templates, problem-solving, planning, author engagement and reflection, and statistical methods such as deep neural networks. Throughout the book, we offer introductions to relevant concepts related to automatic storytelling, followed by descriptions of well-known computer programs that illustrate how such concepts are employed. The book compares ways that researchers have characterized the automatic generation of narratives and covers the core properties that distinguish this area of knowledge. In the final chapter, we reflect on some of the implications for society from the development of automatic narrative generator systems.”

An Artificial History of Natural Intelligence: Thinking with Machines from Descartes to the Digital Age (Bates) “A new history of human intelligence that argues that humans know themselves by knowing their machines. ~ We imagine that we are both in control of and controlled by our bodies—autonomous and yet automatic. This entanglement, according to David W. Bates, emerged in the seventeenth century when humans first built and compared themselves with machines. Reading varied thinkers from Descartes to Kant to Turing, Bates reveals how time and time again technological developments offered new ways to imagine how the body’s automaticity worked alongside the mind’s autonomy. Tracing these evolving lines of thought, An Artificial History of Natural Intelligence offers a new theorization of the human as a being that is dependent on technology and produces itself as an artificial automaton without a natural, outside origin.”

Cyborg computer with living brain organoid aces machine learning tests (New Atlas) “Scientists have grown a tiny brain-like organoid out of human stem cells, hooked it up to a computer, and demonstrated its potential as a kind of organic machine learning chip, showing it can quickly pick up speech recognition and math predictions.”

Living brain-cell biocomputers are now training on dopamine (New Atlas) “Current AI training methods burn colossal amounts of energy to learn, but the human brain sips just 20 W. Swiss startup FinalSpark is now selling access to cyborg biocomputers, running up to four living human brain organoids wired into silicon chips. ~ For FinalSpark's Neuroplatform, brain organoids comprising about 10,000 living neurons are grown from stem cells. These little balls, about 0.5 mm (0.02 in) in diameter, are kept in incubators at around body temperature, supplied with water and nutrients and protected from bacterial or viral contamination, and they're wired into an electrical circuit with a series of tiny electrodes. … The FinalSpark team uses smaller organoids, wired into arrays, and it also adds a new wrinkle, in the ability to flood the organoids with reward hormones like dopamine when they've done a good job. ~ "We encapsulate dopamine in a molecular cage, invisible to the organoid initially," co-founder Dr Fred Jordan told Techopedia last year. "When we want to ‘reward’ the organoid, we expose it to specific light frequencies. This light opens the cage, releasing the dopamine and providing the intended stimulus to the organoid.” ~ It's an absolutely bizarre frontier of research, and it certainly makes some people uncomfortable. But Jordan points out that humans have long harnessed living things to do work, be it the yeast that brews our beer or the horses that pulled ploughs through our fields.”

Muotri Lab

Organoids merge to model the blood–brain barrier “Combining a brain organoid with a blood-vessel organoid yields a system similar to a protective mesh in the brain.”

Brain organoids and organoid intelligence from ethical, legal, and social points of view “Human brain organoids, aka cerebral organoids or earlier “mini-brains”, are 3D cellular models that recapitulate aspects of the developing human brain. They show tremendous promise for advancing our understanding of neurodevelopment and neurological disorders. However, the unprecedented ability to model human brain development and function in vitro also raises complex ethical, legal, and social challenges. Organoid Intelligence (OI) describes the ongoing movement to combine such organoids with Artificial Intelligence to establish basic forms of memory and learning.”
