Tess: Hope for Humanity

Prince Kumar Singh


princesingh@princelab.org

Abstract

Imagine a chatbot that can write stories, poems, songs, and even plays; a
chatbot trained on a massive, curated corpus of assistant interactions,
including word problems, multi-turn dialogue, and code. Meet Tess, the
Apache-2.0-licensed chatbot that builds on the March 2023 Tess release and
surpasses previous models on creative tasks.

Tess is more than just a chatbot. It is a cutting-edge natural language
generation tool that is fine-tuned on academic data and based on a multi-
model architecture. Tess's ability to generate creative content in various
domains is nothing short of impressive, and its potential applications are
vast.

Whether you are an educator looking for a new way to engage students, a
writer seeking inspiration, or simply a curious individual eager to explore the
possibilities of natural language generation, Tess has something to offer.
With its Apache-2.0 license, Tess is open and accessible to anyone interested
in exploring the frontiers of artificial intelligence and natural language
processing.

So come and discover the wonder of Tess, the chatbot that writes stories,
poems, songs, and plays, and opens a world of possibilities for natural
language generation.

1. Data Collection
In this research paper, we present a diverse sample of questions and
prompts gathered by leveraging several publicly available datasets and
curating our own set of prompts. We drew several subsamples from the
LAION OIG dataset, including the unified instruction, unified hc3 human,
unified multi news, and unified abstract infill subsets. We also included
coding questions via a random sub-sample of Stack Overflow questions, and
instruction-tuning data via a sub-sample of BigScience/P3. To further expand
the range of prompts, we generated our own set of custom creative
questions.
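
As a rough illustration, the sketch below shows how such a prompt mix could
be drawn with the Hugging Face datasets library. The subset file name,
config name, sample sizes, and seeds are illustrative assumptions, not the
exact values used to build the Tess dataset.

from datasets import load_dataset

# Draw a random sub-sample from one LAION OIG subset; the sample size
# below is a placeholder, not the one used for Tess.
oig = load_dataset("laion/OIG", data_files="unified_hc3_human.jsonl",
                   split="train")
oig_sample = oig.shuffle(seed=42).select(range(50_000))

# Instruction-tuning data from a BigScience/P3 task (config name assumed).
p3 = load_dataset("bigscience/P3", "squad_v2_Questions_with_Context",
                  split="train")
p3_sample = p3.shuffle(seed=42).select(range(20_000))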

To support our research, we present the 800k-point Tess dataset, a
superset of the original 400k-point Tess dataset. We dedicated substantial
attention to data preparation and curation to ensure the highest possible
quality of the dataset.
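
The sketch below illustrates the kind of preparation pass this involves:
exact-duplicate removal plus a minimal length filter. The field names and
thresholds are assumptions for illustration, not the actual curation rules.

def curate(examples):
    """Drop exact duplicates and near-empty responses (illustrative only)."""
    seen, kept = set(), []
    for ex in examples:
        # "prompt" / "response" field names are assumed, not the real schema.
        key = (ex["prompt"].strip(), ex["response"].strip())
        if key in seen:
            continue  # exact duplicate
        if len(ex["response"].split()) < 3:
            continue  # near-empty response
        seen.add(key)
        kept.append(ex)
    return kept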

By leveraging this diverse set of prompts and datasets, we aim to contribute
to the development of more robust natural language processing models. Our
approach enables researchers and practitioners to train models on a more
comprehensive range of prompts and to explore the potential of natural
language processing in new and exciting ways.

2. Model Training

In our research, we trained several models fine-tuned from both LLaMA 7B
and GPT-J checkpoints. Our initial public release includes a model trained
with LoRA (Hu et al., 2021) on 437,605 post-processed examples for four
epochs, as well as a fine-tuned GPT-J model trained for one epoch. Detailed
information about the model hyper-parameters and training code is provided
in the associated repository and model training log.
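
As a rough sketch of what such a LoRA fine-tune looks like with the peft
library (the actual hyper-parameters are in the repository and training log;
the rank, alpha, and target modules below are assumed placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "EleutherAI/gpt-j-6b"  # or a LLaMA 7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

config = LoraConfig(
    r=8,                                  # adapter rank (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the small LoRA adapters train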

To support the research community and enable further exploration, we are
releasing both GPT-J and GPT-J LoRA checkpoints. We have also updated
the training log to include the additional experiments run for GPT-J.
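
Loading a released LoRA checkpoint on top of the base model then looks
roughly like this (the adapter repository name is a placeholder):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")
model = PeftModel.from_pretrained(base, "tess/gptj-lora")  # placeholder id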

By sharing our models and training data, we hope to foster collaboration and
innovation in the field of natural language processing. We believe that our
models represent a significant step forward in the development of robust and
accurate natural language processing tools, and we are excited to see how
they will be used in future research and applications.

3. Evaluation
In our research, we conducted a preliminary evaluation of our model using
the human evaluation data from the Self-Instruct paper (Wang et al., 2022).
We compared the ground-truth perplexity of our model against that of what
we believe to be the best openly available Alpaca-LoRA model, provided by
user chainyo on Hugging Face.

We found that all models, including our own, had high perplexities on a small
number of tasks. To provide more interpretable results, we clipped the
perplexities to a maximum of 100. However, we observed that the models
fine-tuned on our collected dataset exhibited much lower perplexity in the
Self-Instruct evaluation when compared to Alpaca.
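
A minimal sketch of the metric, assuming a standard transformers causal
language model: score the ground-truth completion conditioned on the
prompt, then clip the resulting perplexity at 100.

import math
import torch

def clipped_perplexity(model, tokenizer, prompt, target, clip=100.0):
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + target, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt positions
    with torch.no_grad():
        loss = model(full_ids, labels=labels).loss  # mean NLL over target
    return min(math.exp(loss.item()), clip)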

It is important to note that this evaluation is not exhaustive, and that further
evaluation work remains to be done. We encourage readers to run the model
locally on CPU and explore the results further. Detailed instructions on how
to do so can be found in our GitHub repository.
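
As a minimal starting point (the repository instructions should be preferred;
the model identifier below is a placeholder), running the model on CPU with
transformers looks roughly like:

from transformers import pipeline

generator = pipeline("text-generation",
                     model="tess/gptj",  # placeholder model id
                     device=-1)          # -1 selects CPU
out = generator("Write a short poem about the sea.", max_new_tokens=64)
print(out[0]["generated_text"])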

4. Deployment
In our research, we developed a powerful natural language processing
model that we believe has the potential to revolutionize the field. However,
in order to fully realize the potential of our model, we needed to deploy it in
a way that would make it accessible and easy to use for researchers,
practitioners, and other stakeholders.

To this end, we utilized Hugging Face Inference to deploy our model on their
CPU cluster. Hugging Face Inference provides an easy-to-use interface for
serving machine learning models, allowing us to quickly and efficiently make
our model available to a wider audience.
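
Once hosted this way, the model can be queried over HTTP. A minimal
sketch, with a placeholder model id and token:

import requests

API_URL = "https://api-inference.huggingface.co/models/tess/gptj"  # placeholder
headers = {"Authorization": "Bearer hf_xxx"}  # placeholder token

resp = requests.post(API_URL, headers=headers,
                     json={"inputs": "Write a limerick about open models."})
print(resp.json())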

To make it easier for users to interact with our model, we also utilized
Gradio.app, a user-friendly platform for building intuitive user interfaces.
With Gradio.app, users can input their own text, receive output from our
model, and view the results in a visually appealing interface.
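
A minimal text-in / text-out Gradio interface of this kind looks roughly as
follows; generate here is a stand-in for a call to the deployed model.

import gradio as gr

def generate(prompt: str) -> str:
    # Stand-in: the real app forwards the prompt to the deployed model.
    return "Model output for: " + prompt

demo = gr.Interface(fn=generate, inputs="text", outputs="text", title="Tess")
demo.launch()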

By deploying our model using Hugging Face Inference and Gradio.app, we
have made it easier for researchers and practitioners to access our model
and utilize its full potential. We believe that this deployment strategy will
help to accelerate research in the field of natural language processing,
allowing researchers to explore new possibilities and push the boundaries of
what is possible with machine learning.

References
Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley,
Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit,
USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika,
and Oskar van der Wal. 2023. Pythia: A suite for analyzing large language
models across training and scaling.

Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal,
Carissa Schoenick, and Oyvind Tafjord. 2018. Think you have solved
question answering? Try ARC, the AI2 Reasoning Challenge.

Mike Conover, Matt Hayes, Ankit Mathur, Xiangrui Meng, Jianwei Xie, Jun
Wan, Sam Shah, Ali Ghodsi, Patrick Wendell, Matei Zaharia, et al. 2023.
Free Dolly: Introducing the world's first truly open instruction-tuned LLM.

Leo Gao, Jonathan Tow, Stella Biderman, Sid Black, Anthony DiPofi,
Charles Foster, Laurence Golding, Jeffrey Hsu, Kyle McDonell, Niklas
Muennighoff, Jason Phang, Laria Reynolds, Eric Tang, Anish Thite, Ben
Wang, Kevin Wang, and Andy Zou. 2021. A framework for few-shot
language model evaluation.

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li,
Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-rank
adaptation of large language models.

Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. 2018.
Can a suit of armor conduct electricity? A new dataset for open book
question answering. In EMNLP.

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright,
Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex
Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie
Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and
Ryan Lowe. 2022. Training language models to follow instructions with
human feedback.

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin
Choi. 2019. WinoGrande: An adversarial Winograd schema challenge at
scale.

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne
Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric
Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave,
and Guillaume Lample. 2023. LLaMA: Open and efficient foundation
language models.

Ben Wang and Aran Komatsuzaki. 2021. GPT-J-6B: A 6 billion parameter
autoregressive language model.
https://github.com/kingoflolz/mesh-transformer-jax.

Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A.
Smith, Daniel Khashabi, and Hannaneh Hajishirzi. 2022. Self-Instruct:
Aligning language models with self-generated instructions.

Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi.
2019. HellaSwag: Can a machine really finish your sentence?
