PaLM 2 is a large language model developed by Google and introduced in May 2023 as the successor to the original PaLM model. It is a state-of-the-art language model with improved multilingual, reasoning, and coding capabilities. PaLM 2 is trained more heavily on multilingual text spanning over 100 languages, which significantly improves its ability to understand, generate, and translate nuanced text across a wide variety of languages. It also demonstrates improved capabilities in logic, common-sense reasoning, and mathematics. Because PaLM 2 was pre-trained on a large quantity of publicly available source code, it excels at popular programming languages like Python and JavaScript, but it can also generate specialized code in languages like Prolog, Fortran, and Verilog.
PaLM 2 is available in four sizes, named Gecko, Otter, Bison, and Unicorn, from smallest to largest. The lightest version, Gecko, is small enough to run on mobile phones, processing 20 tokens per second, roughly equivalent to 16 or 17 words. This miniaturization of language models is significant because running models on-device brings benefits like improved privacy and reduced costs.
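The tokens-to-words conversion above can be sketched in a few lines. The ~0.8 words-per-token ratio used here is an assumption inferred from the article's own figure (20 tokens ≈ 16–17 words), not an official constant; real ratios vary by language and tokenizer.

```python
# Rough sketch: estimating words per second from token throughput.
# WORDS_PER_TOKEN is an assumed average for English text, chosen to
# match the article's "20 tokens per second ≈ 16 or 17 words" figure.
WORDS_PER_TOKEN = 0.8

def tokens_to_words(tokens_per_second: float) -> float:
    """Estimate a words-per-second rate from a tokens-per-second rate."""
    return tokens_per_second * WORDS_PER_TOKEN

print(tokens_to_words(20))  # 16.0 words per second
```

In practice a tokenizer splits text into subword units, so longer or rarer words consume multiple tokens and the effective ratio shifts with the content being generated.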
PaLM 2 is already being used to power 25 features and products within Google’s own domain, including Bard, the company’s experimental chatbot, and Google Workspace apps like Docs, Slides, and Sheets. Google has also introduced Med-PaLM 2, a specialized version of PaLM 2 trained on health data, which can answer questions similar to those found on the US Medical Licensing Examination at an “expert” level. This model will be available via Google Cloud, initially to select customers.
However, there are concerns regarding the legality of the training data used to create language models, as it often includes copyright-protected text and pirated ebooks. Tech companies creating these models, including Google, have generally responded by refusing to answer questions about where they source their training data. Additionally, there are problems inherent to the output of language models, such as “hallucinations,” the tendency of these systems to generate incorrect or misleading information. Google has acknowledged these challenges and is working on improving the accuracy and reliability of its language models.