
Hey all, great to meet you.

Here's the order I'd attempt fixes:

1. Switching the model to gpt-4-1106-preview will probably solve most instruction-following or hallucination issues.

2. Try more prompt tuning. All OpenAI models tend to rewrite all code unnecessarily. Tell them a few times in the prompt not to rewrite code from documentation.
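
For example, a sketch of what "a few times in the prompt" can look like (the exact wording here is illustrative, not a tested magic phrase):

```python
# A system prompt that repeats the "don't rewrite" instruction, plus a
# reminder restated near the user turn. Wording is illustrative only.

SYSTEM_PROMPT = """You are a Lua coding assistant.
Use the documentation snippets provided in context verbatim.
Do NOT rewrite code taken from the documentation.
If a documentation example already solves the task, copy it as-is.
Reminder: never rewrite documentation code from scratch."""

def build_messages(context: str, question: str) -> list:
    """Assemble a chat request, restating the instruction in the user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": (
            f"Documentation context:\n{context}\n\n"
            f"Question: {question}\n"
            "Remember: do not rewrite documentation code."
        )},
    ]
```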

3. Try generating new docs of examples rather than Lua source code, which will limit the model's tendency to rewrite things from scratch.

4. Unfortunately, the model, prompt, and data are the only vectors you can improve with the Assistants API or a RAG wrapper. Moving to a custom prompt chain might give you more flexibility for controlling retrieved context.

5. If you go custom, you can still use a cloud vector store like Pinecone, except now you'll feed queried results into the prompt yourself. You can play around with the number of entries you sample or the similarity cutoff (a minimum similarity score between the query and document vectors) and see if that reduces hallucinations. (If you need some boilerplate code on embedding/querying, lmk.)
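
Rough sketch of the top-k plus cutoff filtering, using toy vectors (in practice Pinecone returns the similarity scores for you, and the embeddings would come from an embedding endpoint; the cutoff value here is arbitrary):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_hits(query_vec, hits, top_k=3, min_sim=0.75):
    """hits: list of (chunk_text, embedding) pairs.
    Keep at most top_k chunks, dropping anything below the cutoff."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in hits]
    scored.sort(reverse=True)  # highest similarity first
    return [text for sim, text in scored[:top_k] if sim >= min_sim]
```

Tightening `min_sim` trades recall for precision: fewer, more relevant chunks in the prompt usually means fewer hallucinated answers.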

6. If tuning the number/cutoff doesn't work, try reranking the results. See OpenAI's example here: https://cookbook.openai.com/examples/question_answering_using_a_search_api
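
The cookbook example uses an LLM as the reranker; the shape of the step is just "re-score retrieved chunks against the query, then re-order." Here's the skeleton with a cheap keyword-overlap scorer standing in for the LLM call:

```python
def rerank(query: str, chunks: list) -> list:
    """Re-order retrieved chunks by relevance to the query.
    The scorer here is naive keyword overlap; in the cookbook version
    the score comes from an LLM judging each chunk instead."""
    q_terms = set(query.lower().split())

    def score(chunk):
        return len(q_terms & set(chunk.lower().split()))

    return sorted(chunks, key=score, reverse=True)
```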

7. If that doesn't work, try HyDE — a technique that generates a 'fake' hypothetical answer to the query and retrieves against its embedding, which often returns better results (https://arxiv.org/abs/2212.10496).
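
The whole trick fits in a few lines. Here `generate` and `embed` are placeholders for your LLM and embedding calls (not real SDK functions), and the prompt wording is just one way to phrase it:

```python
def hyde_query_embedding(question, generate, embed):
    """HyDE: instead of embedding the raw question, generate a
    hypothetical answer and embed that. Documents tend to look more
    like answers than like questions, so the hypothetical answer's
    embedding often lands closer to the relevant chunks."""
    hypothetical = generate(
        f"Write a short Lua documentation passage that answers: {question}"
    )
    return embed(hypothetical)
```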


8. Try getting the model to generate a task list for complex coding tasks with a few atomic parts. Task-list generation is tricky — try adding at least five in-prompt (multi-shot) examples. Then complete each task recursively.
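
The recursion looks something like this — `decompose` and `complete` are placeholders for your LLM calls (the five multi-shot examples would live in `decompose`'s prompt), and the depth cap is an assumption to stop runaway splitting:

```python
def solve(task, decompose, complete, depth=0, max_depth=3):
    """Recursively break a task into atomic subtasks and complete each.
    `decompose(task)` returns a list of subtasks, or [] if the task is
    already atomic; `complete(task)` does the actual work."""
    subtasks = decompose(task) if depth < max_depth else []
    if not subtasks:
        return [complete(task)]
    results = []
    for sub in subtasks:
        results.extend(solve(sub, decompose, complete, depth + 1, max_depth))
    return results
```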

9. Another trick I didn't mention is sampling multiple results from the model at different temperatures and having the model pick the best one. A surprising amount of the time, it picks an attempt with a nonzero temperature (which means it often judges a response other than its first attempt to be optimal).
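
A minimal sketch of that best-of-n loop — `sample(prompt, temp)` and `judge(candidates)` are placeholders wrapping your chat calls (with `judge` prompted to return the index of the best candidate), and the temperature values are just a reasonable spread:

```python
def best_of_temperatures(prompt, sample, judge, temps=(0.0, 0.4, 0.8)):
    """Sample one completion per temperature, then ask the model
    (via `judge`) to pick the best. Returns the winning completion."""
    candidates = [sample(prompt, t) for t in temps]
    return candidates[judge(candidates)]
```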

10. If all those don't work, try fine-tuning a custom gpt-3.5-turbo-1106 model where you hand-annotate the desired output. Make sure to still combine it with RAG — fine-tuning teaches the model to follow your instructions, but it's poor at teaching it new data.
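
The training examples use the chat-format JSONL (one JSON object per line with a "messages" list). Since you're keeping RAG, include the retrieved docs in the training inputs too, so the fine-tuned model learns to answer from supplied context rather than from memory — the system/user wording below is illustrative:

```python
import json

def training_line(context, question, annotated_answer):
    """One JSONL line for chat fine-tuning. The retrieved context is
    baked into the user turn so the tuned model still expects RAG input."""
    return json.dumps({"messages": [
        {"role": "system", "content": "Answer using only the provided Lua docs."},
        {"role": "user", "content": f"Docs:\n{context}\n\nQ: {question}"},
        {"role": "assistant", "content": annotated_answer},
    ]})
```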
