The Claude 3 Model Family: Opus, Sonnet, Haiku
The latest family of multimodal models by Anthropic is now generally available in 159 countries. All Claude 3 models offer improved capabilities in analysis, forecasting, nuanced content creation, code generation, and conversing in non-English languages such as Spanish, Japanese, and French.
(website)
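For readers who want to try the new models, here is a minimal sketch using the Anthropic Python SDK; the model identifier and prompt are illustrative and may differ from the current API defaults.

```python
# Minimal sketch: calling a Claude 3 model via the Anthropic Python SDK.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment;
# the model identifier below is illustrative and may change over time.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-opus-20240229",  # Sonnet and Haiku variants are also available
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Summarize the key ideas of retrieval-augmented generation."}
    ],
)

print(message.content[0].text)
```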
Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset
Conversational recommender systems are gaining popularity, but the public datasets available for training them lack specific user preferences and explanations for recommendations, leading to low-quality suggestions. To address this, the authors of this paper created PEARL, a novel conversational recommendation dataset with detailed personas and knowledge drawn from real-world reviews. With over 57k dialogues, PEARL captures more specific user preferences and exhibits expertise in the target domain, resulting in more relevant recommendations.
Retrieval-Augmented Generation for AI-Generated Content: A Survey
Recent advancements in model algorithms, architectures, and high-quality datasets have facilitated the creation of AI-generated content. The latest trend in the industry is Retrieval-Augmented Generation (RAG), a process that optimizes the content generated by Large Language Models (LLMs) by referencing external knowledge bases, i.e. datasets that have not been used for training, for instance, proprietary company datasets or documents. This study reviews existing efforts to integrate RAG into content generation scenarios, providing an overview of practical applications, benchmarks, and limitations of current RAG systems.
(GitHub)
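As a rough illustration of the retrieve-then-generate pattern the survey covers, the sketch below scores a small document set against a query with a bag-of-words overlap and stuffs the best match into a prompt; the scoring function and prompt template are simplified stand-ins for the embedding models and LLM calls used in practice.

```python
# Minimal sketch of retrieval-augmented generation (RAG).
# The word-overlap scorer is a stand-in for a real embedding model, and the
# returned prompt would normally be sent to an LLM for the generation step.
from collections import Counter

documents = [
    "GaLore projects gradients into a low-rank subspace to cut optimizer memory.",
    "PEARL is a conversational recommendation dataset built from real-world reviews.",
    "Sora generates realistic or imaginative videos from text instructions.",
]

def score(query: str, doc: str) -> float:
    """Crude relevance score: number of query words that also appear in the document."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum(min(q[w], d[w]) for w in q)

def build_rag_prompt(query: str, docs: list[str], top_k: int = 1) -> str:
    """Retrieve the top_k most relevant documents and splice them into the prompt."""
    retrieved = sorted(docs, key=lambda d: score(query, d), reverse=True)[:top_k]
    context = "\n".join(retrieved)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

print(build_rag_prompt("How does GaLore reduce memory?", documents))
```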
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Sora, the latest AI model released by OpenAI, has recently come to the forefront for its ability to generate realistic or imaginative videos from text instructions. This paper reviews Sora's background, technologies, applications, challenges, and future directions.
(GitHub)
Generative AI for Synthetic Data Generation: Methods, Challenges and the Future
The application of LLMs to specialised domains requires domain-specific data, which is often not readily available or open to the public. To bridge this gap, recent research has focused on generating synthetic data with LLMs. This paper surveys approaches that use LLMs to generate task-specific training data and discusses their methods, evaluation techniques, practical applications, limitations, and potential areas for future research.
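To make the idea concrete, here is a small sketch of prompting an LLM to emit labeled training examples for a classification task; `call_llm` is a hypothetical placeholder for whatever chat-completion API is in use, and the JSON schema is one possible convention rather than the paper's method.

```python
# Minimal sketch of LLM-based synthetic data generation for text classification.
# `call_llm` is a hypothetical placeholder for any chat-completion API; the prompt
# and JSON schema are illustrative conventions, not the paper's exact method.
import json

LABELS = ["positive", "negative"]

PROMPT_TEMPLATE = (
    "Generate {n} short product reviews for training a sentiment classifier. "
    "Return a JSON list of objects with keys 'text' and 'label', "
    "where 'label' is one of {labels}."
)

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned response so the sketch runs."""
    return json.dumps([
        {"text": "Battery lasts all week, great value.", "label": "positive"},
        {"text": "Stopped working after two days.", "label": "negative"},
    ])

def generate_synthetic_dataset(n: int = 2) -> list[dict]:
    raw = call_llm(PROMPT_TEMPLATE.format(n=n, labels=LABELS))
    examples = json.loads(raw)
    # Keep only well-formed examples with a known label.
    return [ex for ex in examples if ex.get("text") and ex.get("label") in LABELS]

print(generate_synthetic_dataset())
```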
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Training Large Language Models (LLMs) is increasingly constrained by the memory required for weights and optimizer states. This paper proposes Gradient Low-Rank Projection (GaLore), a training strategy that reduces optimizer-state memory usage by up to 65.5% without compromising efficiency or performance. The authors test GaLore on pre-training LLaMA 1B and 7B architectures on the C4 dataset with up to 19.7B tokens, and on fine-tuning RoBERTa on GLUE tasks. Their experiments demonstrate that a 7B model can be pre-trained on a consumer GPU with 24GB of memory without model parallelism, checkpointing, or offloading strategies.
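The core idea can be sketched in a few lines of PyTorch: project each weight matrix's gradient onto a low-rank subspace obtained from its SVD, keep optimizer state only in that small space, and project the update back to full size. The rank, the plain SGD-style update rule, and the absence of a projector-refresh schedule below are simplified assumptions; see the paper for the actual algorithm.

```python
# Simplified sketch of gradient low-rank projection in the spirit of GaLore.
# A plain SGD-style update in the low-rank space stands in for the Adam-based
# update and the periodic projector refresh described in the paper.
import torch

def galore_style_step(weight: torch.Tensor, grad: torch.Tensor, rank: int = 4, lr: float = 1e-2):
    """Project grad onto a rank-`rank` subspace, update there, project back."""
    # Left singular vectors of the gradient define the projector P (m x rank).
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]

    low_rank_grad = P.T @ grad      # rank x n: the only state an optimizer must track
    update = -lr * low_rank_grad    # stand-in for an Adam update on the small matrix

    weight += P @ update            # project the low-rank update back to full size
    return weight

W = torch.randn(64, 32)
G = torch.randn(64, 32)             # gradient of the loss w.r.t. W (random for the demo)
W = galore_style_step(W, G)
print(W.shape)
```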