Image Dialogue Data Collection

Squirrel UI
● You must use Google Chrome as a browser to open Squirrel. Also, please note
that iPhones are not compatible.
● You will be provided with an access link to the tool where you can start
working right away.
● There are no special computer requirements; you only need a stable internet
connection.
● You may only open the production link on one device at a time.
● Below you can find instructions and screenshots on how to use the tool.
● The average handling time is 15 minutes per task.
● Important: Please be very mindful of the time; the task will
automatically time out after 15 minutes.
1. Determine if the Image is AI-generated or User-Generated; please provide
your own judgment and make your best assumption.
● If the Image is User-Generated, you can pretend in the conversation
that you uploaded the Image. In this case, leave the
“AI Generated Image Caption (Long Description)” textbox empty.

● If you can infer that the Image is AI-generated, please provide a caption of
how the AI might describe the image in the “AI Generated Image
Caption (Long Description)” textbox.

Note: The caption you provide in the “AI Generated Image Caption (Long
Description)” textbox must be a longer, more detailed description of the
AI-generated image. Please focus on the key elements of the image. You can check
Example 3 in the AI Generated Image Caption (Long Description) section.

2. Initiate the conversation by creating a User Request in the User Request textbox,
and make it as natural as possible. The request should contain information
that can be used as a short caption. Please check the examples below.
● [Scenario 2] If the Image is User-Generated, please add the following
tag <IMAGE> at the end of the conversation. See Example 2 User 1
● You can write 1-3 questions and provide commands, (e.g. “So… could you
draw my wacky dream for me?”, “Draw me a picture of a cat riding a
bicycle to work.”, “I'm considering buying this car. What do you think?”)
3. Follow the User Request with a reply in the AI Initial Response textbox, making
the dialogue between User and AI as natural and realistic as possible.
● [Scenario 1] If the Image is AI-generated, please add the following tag
<IMAGE> at the end of the conversation. See Example 1 and Example 2
in the AI Initial Response Section.
4. Continue the conversation in the User Question 1 and AI Response 1
textboxes. Optionally, you can extend the conversation further in the
User Question 2 and AI Response 2 textboxes.
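
The tag placement described in steps 2 and 3 can be sketched in code. This is only an illustration and not part of Squirrel: the function name `place_image_tag` and its parameters are hypothetical, and it assumes the tag goes at the end of the User Request for a User-Generated image and at the end of the AI Initial Response for an AI-generated image.

```python
IMAGE_TAG = "<IMAGE>"


def place_image_tag(user_request: str, ai_initial_response: str, ai_generated: bool):
    """Return (user_request, ai_initial_response) with the tag appended to one of them.

    Hypothetical helper for illustration only.
    """
    if ai_generated:
        # Scenario 1: the image appears on the AI side of the conversation.
        return user_request, f"{ai_initial_response.rstrip()} {IMAGE_TAG}"
    # Scenario 2: the image appears on the User side of the conversation.
    return f"{user_request.rstrip()} {IMAGE_TAG}", ai_initial_response


request, response = place_image_tag(
    "I'm considering buying this car. What do you think?",
    "The car in the image is a sleek and contemporary SUV.",
    ai_generated=False,
)
print(request)
```

In this sketch only one of the two textboxes ever receives the tag, which mirrors the rule that each conversation is about a single image.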

Image Dialogue Data Collection

Goal
Approach
Example/Conversation 1
Example/Conversation 2
Example/Conversation 3
Methodology
Conversations
Format
Payment
Contact

Goal

Collect fine-tuning data that encompasses:
● Image generation
● Long image captioning
● Visual question answering
● Complex reasoning over images

Approach

Collect fictional conversations, each about a single image, that incorporate
elements of the tasks above. Here are three examples:

Example/Conversation 1
Image: AI Generated Image

AI Generated Image Caption (Long Description):

A delightful and whimsical scene where a cat, anthropomorphized and
dressed in business attire, is riding a bicycle through an urban
setting. The cat, wearing a suit and tie, exhibits a look of focused
determination mixed with joy as it pedals the bicycle, which is
designed in a classic two-wheeled fashion and features a quaint
basket at the front. The surrounding environment is a bustling
cityscape, with modern buildings and a few cars dotting the
background, suggesting a lively city morning. The rising sun in the
backdrop bathes the entire scene in a warm, golden hue, enhancing the
overall cheerful and energetic atmosphere of the picture. This unique
portrayal of a cat commuting to work on a bicycle blends elements of
fantasy with everyday city life, creating a charming and lighthearted
visual narrative.

User Request:
Draw me a picture of a cat riding a bicycle to work.

AI Initial Response:
Here is an image of a cat riding a bicycle to work:

User Question 1:
Nice! Why do you think he’s not wearing a helmet?

AI Response 1:
The absence of a helmet in the image of the cat riding a bicycle could be due to a
few reasons. Firstly, it might be an oversight in the artistic interpretation, as the focus
was on creating a whimsical, fun image rather than emphasizing safety measures.
Secondly, depicting animals in human-like scenarios often prioritizes human
characteristics like clothing, while omitting elements that are not typically associated
with the animal's natural form, like a helmet on a cat. Lastly, it could be a deliberate
choice to keep the image simple and focused on the primary theme of a cat
engaging in a human activity. In real-life scenarios, of course, wearing a helmet while
cycling is important for safety.

Example/Conversation 2

Image: User Uploaded Image

AI Generated Image Caption (Long Description): (please keep this empty when you
assume the image is User Uploaded)

User Request:
I'm considering buying this car. What do you think?

AI Initial Response:
The car in the image is a sleek and contemporary SUV, which likely boasts the
reliability and efficiency that Toyota is known for. It would be a solid choice if you're
seeking a vehicle with a good balance of space, comfort, and possibly advanced
technology, especially if it's one of Toyota's hybrid or electric models. You'd benefit
from considering its safety features, fuel economy, and the driving experience it
offers. Before making your decision, however, it's advisable to compare it with other
vehicles in its class, check for consumer reviews, and take it for a test drive to ensure
it meets your specific needs and preferences.

User Question 1:
How can you tell it's an SUV?

AI Response 1:
The vehicle in the image has several distinctive features that suggest it's an SUV
(Sport Utility Vehicle):

* **Tall Stature:** It has a high roofline, providing ample headroom and often
indicating a more spacious interior.
* **Elevated Ride Height:** The ground clearance is higher than what you'd find in
typical sedans or hatchbacks, which is useful for rough terrain or poor road
conditions.
* **Robust Wheels and Tires:** The wheel design and size are more pronounced,
which can suggest an ability to handle a variety of road conditions.

These visual cues, when taken together, can help differentiate an SUV from other
vehicle types.

Example/Conversation 3

Image: AI Generated Image

AI Generated Image Caption (Long Description):

A colorful and imaginative illustration, a sloth takes center stage
as a nail technician in a
vibrant nail salon setting. The sloth, adorned in a charming apron
and sporting tiny, round glasses perched on its nose, is attentively
painting the nails of a relaxed teenager seated across from it. The
salon is a lively scene with an array of sparkly nail polishes
displayed prominently, adding to the whimsical atmosphere.
Interestingly, the sloth is not only focused on the manicure but also
seems to be engaged in giving financial advice, as indicated by
various symbols and graphics related to the stock market that are
playfully integrated into the background. These elements include
graphs, stock tickers, and bond symbols, juxtaposing the world of
finance with the lightheartedness of the nail salon. The overall
effect is a delightful, humorous rendering of a dream where the
realms of beauty and economics blend in an unexpectedly charming way.

User Request:
omg so i had this crazy dream last night 😂 imagine this: a super chill sloth, like
wearing a cute lil apron, is doing my nails right? it's got like these tiny glasses on its
nose, looking all serious and focused, lol. The sloth's got a whole nail salon set up,
with all these sparkly polishes and stuff. and it's so funny cuz while it's painting my
nails, it starts talking about the stock market! like giving me all this advice on
investing, talking about stocks and bonds and all that serious money stuff. it's super
weird cuz the sloth's voice is all calm and slow but also kinda wise, y'know? the whole
thing was just so random and hilarious, i woke up still laughing 😆💅📈
so… could you draw my wacky dream for me?

AI Initial Response:
Here’s an illustration of your funny dream with the sloth giving you nail and stock
market advice: <IMAGE>

User Question 1:
lol whats going on with that dollar polish?

AI Response 1:
In the illustration, the "dollar polish" adds a humorous touch to the scene. It's a nail
polish bottle, but instead of a regular label, it has a dollar sign on it. This whimsical
detail cleverly ties together the themes of nail care and financial advice, reflecting
the unique nature of your dream where a sloth not only does your nails but also talks
about the stock market. It's a playful element that highlights the dream's quirky
blend of a beauty salon and financial wisdom.

User Question 2:
How many colors are present in the objects on the table in front of the sloth?

AI Response 2:
In the illustration, there are several objects on the table in front of the
sloth, and they display a variety of colors. I counted nine in total:
red, orange, yellow, green, teal, light blue, blue, magenta, and purple.

Methodology

We collect images skewed towards aesthetics and manually create fictional
conversations about each image.

Conversations

We recommend following the methodology below when creating the conversations:

1) Look at the image. Observe the fine details.

2) Write a short description of the image, which could be used as a short caption
or introductory prompt response by the AI (e.g. “a picture of a cat riding a
bicycle to work”).
3) If you think the image was generated by the AI Assistant, write a longer,
detailed description of the key items in the image and enter it in the
“AI Generated Image Caption (Long Description)” textbox (e.g. “A delightful
and whimsical scene where a cat, anthropomorphized and dressed in business
attire, is riding a bicycle through an urban setting. ...”).
4) Write 1-3 potential questions about the image. These questions should be
plausible, i.e. something a user is likely to ask. The questions should be diverse
and differ in topic, scope, style, and difficulty across images. The questions
may involve non-trivial reasoning about the image (e.g. “Why do you think he’s
not wearing a helmet?”).
5) Use these textual elements as starting points for composing a realistic
dialogue between a user and an AI assistant about the image, as shown in the
examples. The elements may be edited, rewritten, or discarded. Optimize for a
natural, informative, interesting conversation.

Format

● All “generated” images (i.e. images that appear on the assistant’s side of the
conversation) should have the tag <IMAGE> at the end of the
assistant’s message and should be accompanied by an “AI Generated Image Caption
(Long Description)”. This caption should be as informative and grounded as
possible. This teaches the model to “think” about what it wants to generate
before it does.
● For “uploaded” images (i.e. images that appear on the user’s side of the
conversation), there should not be any “AI Generated Image Caption (Long
Description)”. Please just add the <IMAGE> tag at the end of the User’s message.
● All conversations need at least 2 user interactions
(user-assistant-user-assistant). Some dialogues should have more than 2
interactions, without sacrificing the natural flow of conversation.
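
The three rules above can be expressed as a simple self-check. The sketch below is an illustration under stated assumptions, not a Squirrel feature: the `Conversation` class, its field names, and `check_format` are hypothetical, and turns are assumed to alternate strictly, starting with the user.

```python
from dataclasses import dataclass, field
from typing import List

IMAGE_TAG = "<IMAGE>"


@dataclass
class Conversation:
    """Hypothetical record of one task; field names are illustrative only."""
    ai_generated: bool   # True if the image is judged to be AI-generated
    long_caption: str    # content of the long-description textbox ("" if empty)
    turns: List[str] = field(default_factory=list)  # user, assistant, user, ...


def check_format(conv: Conversation) -> List[str]:
    """Return a list of rule violations (an empty list means no problems found)."""
    problems = []
    user_turns = conv.turns[0::2]
    assistant_turns = conv.turns[1::2]

    # Rule: at least 2 user interactions (user-assistant-user-assistant).
    if len(user_turns) < 2 or len(assistant_turns) < 2:
        problems.append("fewer than 2 user interactions")

    if conv.ai_generated:
        # Generated image: caption required, tag ends an assistant turn.
        if not conv.long_caption.strip():
            problems.append("generated image is missing the long caption")
        if not any(t.rstrip().endswith(IMAGE_TAG) for t in assistant_turns):
            problems.append("no assistant turn ends with <IMAGE>")
    else:
        # Uploaded image: no caption, tag ends a user turn.
        if conv.long_caption.strip():
            problems.append("uploaded image must not have a long caption")
        if not any(t.rstrip().endswith(IMAGE_TAG) for t in user_turns):
            problems.append("no user turn ends with <IMAGE>")
    return problems
```

A conversation shaped like Example 3 (caption filled in, tag at the end of the AI Initial Response, two or more exchanges) would return an empty list in this sketch.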

Payment

You are paid at your hourly rate for the actual time spent, which is
automatically captured by Squirrel, with a cap of 15 minutes per task. Taking
breaks is completely allowed, but please make sure to submit the current task
and close the Squirrel page before leaving so that idle time is not
captured. This allows us to get a realistic idea of how much time is spent
per unit and to compare it across participants.

Please note we offer a flat rate of $20 per hour.
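
To estimate your earnings, you can work through the arithmetic yourself. The sketch below is only an approximation of how the figures stated above combine: the function name `estimate_payment` is hypothetical, and it assumes the 15-minute cap is applied to each task separately before the times are added up. The actual calculation is done by Squirrel and may differ.

```python
HOURLY_RATE_USD = 20.0        # flat rate stated above
CAP_MINUTES_PER_TASK = 15.0   # cap stated above; assumed to apply per task


def estimate_payment(task_minutes):
    """Estimate pay in USD from a list of minutes spent on each task."""
    # Time beyond the cap on any single task is assumed not to be billable.
    billable_minutes = sum(min(m, CAP_MINUTES_PER_TASK) for m in task_minutes)
    return billable_minutes / 60.0 * HOURLY_RATE_USD


print(estimate_payment([15, 15, 15, 15]))
```

For example, four tasks of 15 minutes each come to one billable hour, or $20.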

You can always check your profile information on the Partners Portal website

You can keep an eye on your total job count via the personalized link on Squirrel.
As a rule of thumb, as long as you’re completing 3-4 tasks per hour, you are on the
right track. We also highly suggest that you track your work time yourself.

You can also keep track of your performance by hovering over the Statistics Icon.

Production Hours Calculation:

Again, you are paid at your hourly rate for the actual time spent, which is
automatically captured by Squirrel, with a cap of 15 minutes per task.

Payment Schedule:
● The production team will create POs at the beginning of the month for the
work completed in the previous month. Once POs are created, the Finance
team will process them and issue payments depending on your supplier
type. Please carefully review the following documents to avoid any
payment delay.
● Depending on your supplier type, payments will be released based on the
payment terms.

Documents - Freelancers:
● EVERYONE MUST READ: RWS Moravia Invoicing and Payments Guide.pdf
● Freelancers payment terms: Payment Terms Info_Freelancer.pdf

Documents - Private Individuals:

● EVERYONE MUST READ: RWS Moravia Invoicing and Payments Guide.pdf
● Private Individual payment terms:
Payment Terms Info_Private Individual.pdf
● RWS Moravia Tipalti Guide.pdf

Once you have received the notification that a payment has been created, or if you
have any further invoicing questions, please contact our Accounts Payable team at

Contact

If you have any questions regarding these instructions or the project in general,
please get in touch with the project team by sending an email to
