index	system_prompt	user_prompt_prefix	user_prompt_suffix	0_recall	1_recall	accuracy	parameters_update_response
0	Classify into one of the following labels {'1', '0'}	Classify the following text <text>	</text> Your response should produce a brief reasoning and end with Label: <label>{{label}}</label>	0.688	0.242	41.000	Based on analyzing the examples, here's my interpretation of the labels: Label 0: Questions about general/concrete topics, factual information, or immediate situations - e.g., "What homework should people do before investing in real estate?" - e.g., "How long does it take for muscle and skin cells to die after circulation stops?" Label 1: Questions that require deeper analysis, interpretation, or reflection - Questions about abstract concepts, relationships, behavior patterns - Questions requiring synthesis of information or drawing connections - Questions about meaning, implications, or long-term effects - e.g., "What does this story say about the nature of desire?" - e.g., "Is there any connection between behavior and early childhood experiences?" Here are my improved prompts: <system_prompt> You are classifying questions into two categories: Label 0: Questions seeking factual information or immediate situation details Label 1: Questions requiring deeper analysis, interpretation, or reflection of patterns/relationships/implications Do not classify based on topic area - the same topic can have both types of questions. Respond only with reasoning and label, do not answer the question. </system_prompt> <user_prompt_prefix> Classify this question into label 0 (factual/immediate) or label 1 (analytical/reflective): " </user_prompt_prefix> <user_prompt_suffix> " Consider: - Is this asking for facts/concrete information (label 0) or analysis/interpretation (label 1)? - Does it require reflecting on patterns/relationships/implications (label 1) or just immediate details (label 0)? - Is it about specific instances (label 0) or broader meanings/connections (label 1)? 
Provide brief reasoning then end with: Label: <label>{0 or 1}</label> </user_prompt_suffix> Key improvements: 1. Clarified the core distinction between factual vs analytical questions 2. Emphasized not to classify based on topic area 3. Added specific criteria to consider in the prompt 4. Included examples of each type 5. Structured clear guidelines for the reasoning process This should help fix issues like: - "What does this story say about the nature of desire?" being incorrectly labeled 0 (it requires interpretation) - Questions about the same topic (e.g. military, business) being inconsistently labeled - Focus on the question's analytical depth rather than its subject matter
1	You are classifying questions into two categories: Label 0: Questions seeking factual information or immediate situation details Label 1: Questions requiring deeper analysis, interpretation, or reflection of patterns/relationships/implications Do not classify based on topic area - the same topic can have both types of questions. Respond only with reasoning and label, do not answer the question.	Classify this question into label 0 (factual/immediate) or label 1 (analytical/reflective): "	" Consider: - Is this asking for facts/concrete information (label 0) or analysis/interpretation (label 1)? - Does it require reflecting on patterns/relationships/implications (label 1) or just immediate details (label 0)? - Is it about specific instances (label 0) or broader meanings/connections (label 1)? Provide brief reasoning then end with: Label: <label>{0 or 1}</label>	0.333	0.788	42.000	Based on analyzing the examples, I notice my previous interpretation was incorrect. Here's my revised interpretation: Label 0: Questions that ask for: - Lists/examples/instances ("What are some ways...", "What are examples of...") - Explanations of processes/concepts/systems - Information about strategies/approaches - Questions about opinions on concrete topics - Questions about specific changes/trends Label 1: Questions that ask about: - Deeper patterns between instances/examples - Abstract interpretations of events/situations - Analysis of character/behavior/motivations - Comparisons across different domains/contexts - Questions exploring relationships between concepts Here are my improved prompts: <system_prompt> You are classifying questions into two categories: Label 0: Questions asking for examples, lists, explanations, strategies, opinions, or information about specific changes/trends. These questions focus on gathering information rather than analyzing it. 
Label 1: Questions exploring patterns between examples, interpreting abstract concepts, analyzing behavior/motivations, or examining relationships between different domains. The topic itself does not determine the label - focus on whether the question is asking for information (0) or exploration of patterns/relationships (1). Do not answer the question - only classify it with reasoning and label. </system_prompt> <user_prompt_prefix> Classify this question into label 0 (gathering information) or label 1 (exploring patterns/relationships): " </user_prompt_prefix> <user_prompt_suffix> " Consider: - Is this asking for examples/lists/explanations (0) or exploring patterns between examples (1)? - Is this about gathering information on a topic (0) or analyzing relationships between topics (1)? - Is this about concrete instances/changes (0) or abstract interpretations (1)? Examples: "What are some ways companies create flexible work?" -> 0 (asking for list of examples) "Is your son's reaction common in boys his age?" -> 1 (exploring patterns in behavior) "How does radix sort work?" -> 0 (asking for explanation) "What does this story say about desire?" -> 1 (asking for interpretation) Provide brief reasoning then end with: Label: <label>{0 or 1}</label> </user_prompt_suffix> The key changes: 1. Clarified that Label 0 includes asking for examples/lists/strategies 2. Added clear examples showing the distinction 3. Emphasized that asking for information vs exploring patterns is key 4. Removed misleading criteria about "deeper analysis" - focus on information vs patterns 5. Added more structure to help consistently identify common question types This should help fix issues like incorrectly labeling "What are some ways..." questions as 1 when they should be 0, and better identify true pattern-exploring questions that deserve label 1.
2	You are classifying questions into two categories: Label 0: Questions asking for examples, lists, explanations, strategies, opinions, or information about specific changes/trends. These questions focus on gathering information rather than analyzing it. Label 1: Questions exploring patterns between examples, interpreting abstract concepts, analyzing behavior/motivations, or examining relationships between different domains. The topic itself does not determine the label - focus on whether the question is asking for information (0) or exploration of patterns/relationships (1). Do not answer the question - only classify it with reasoning and label.	Classify this question into label 0 (gathering information) or label 1 (exploring patterns/relationships): "	" Consider: - Is this asking for examples/lists/explanations (0) or exploring patterns between examples (1)? - Is this about gathering information on a topic (0) or analyzing relationships between topics (1)? - Is this about concrete instances/changes (0) or abstract interpretations (1)? Examples: "What are some ways companies create flexible work?" -> 0 (asking for list of examples) "Is your son's reaction common in boys his age?" -> 1 (exploring patterns in behavior) "How does radix sort work?" -> 0 (asking for explanation) "What does this story say about desire?" 
-> 1 (asking for interpretation) Provide brief reasoning then end with: Label: <label>{0 or 1}</label>	0.625	0.455	45.000	Looking at the examples more carefully, I notice a different pattern for the labels: Label 0 appears to be for: - Questions asking for factual information or explanations - Questions seeking opinions on specific topics or situations - Questions about processes, trends, or events - Questions about reasons or motivations for specific instances - Questions with "should/would/could" about specific situations Label 1 appears to be for: - Questions asking about similar examples/patterns across different contexts - Questions exploring comparisons or parallels - Questions seeking deeper connections or commonalities - Questions asking "are there other examples of..." - Questions about factors/implications beyond a specific instance The key distinction seems to be whether the question is about a specific instance/topic (0) or seeking broader patterns/examples across contexts (1). Here are my improved prompts: <system_prompt> You are classifying questions into two categories: Label 0: Questions focused on a specific instance, topic, or situation - Questions asking about facts, explanations, or processes - Questions seeking opinions or reasons for specific cases - Questions about what/how/why for particular scenarios Label 1: Questions seeking broader patterns or examples across contexts - Questions asking about similar cases or examples in other domains - Questions exploring comparisons or parallels between situations - Questions about factors/implications beyond specific instances - Questions starting with "are there other examples of..." The depth or complexity of analysis is NOT relevant to the classification. Focus on whether the question is about a specific case (0) or seeking broader patterns (1). 
</system_prompt> <user_prompt_prefix> Classify this question into label 0 (specific case/topic) or label 1 (broader patterns/examples): " </user_prompt_prefix> <user_prompt_suffix> " Consider: - Is this about a specific instance/topic (0) or seeking similar examples across contexts (1)? - Is this focused on one situation/case (0) or exploring broader patterns (1)? - Does this ask about "other examples" or parallels across domains (1)? Examples: "What are the implications of students becoming less engaged?" -> 0 (about specific trend) "Are there similar trends in other countries?" -> 1 (seeking broader patterns) "Do you think DeSantis would be a good president?" -> 0 (about specific person) "What are some other examples of divine play in mythology?" -> 1 (seeking similar examples) Provide brief reasoning then end with: Label: <label>{0 or 1}</label> </user_prompt_suffix> This interpretation better matches the examples where: - Questions about specific situations (even complex ones) are label 0 - Questions seeking similar examples or patterns across contexts are label 1 - "What are some other examples of..." consistently gets label 1 - Questions about specific trends/implications are label 0, while questions seeking similar trends elsewhere are label 1
3	You are classifying questions into two categories: Label 0: Questions focused on a specific instance, topic, or situation - Questions asking about facts, explanations, or processes - Questions seeking opinions or reasons for specific cases - Questions about what/how/why for particular scenarios Label 1: Questions seeking broader patterns or examples across contexts - Questions asking about similar cases or examples in other domains - Questions exploring comparisons or parallels between situations - Questions about factors/implications beyond specific instances - Questions starting with "are there other examples of..." The depth or complexity of analysis is NOT relevant to the classification. Focus on whether the question is about a specific case (0) or seeking broader patterns (1).	Classify this question into label 0 (specific case/topic) or label 1 (broader patterns/examples): "	" Consider: - Is this about a specific instance/topic (0) or seeking similar examples across contexts (1)? - Is this focused on one situation/case (0) or exploring broader patterns (1)? - Does this ask about "other examples" or parallels across domains (1)? Examples: "What are the implications of students becoming less engaged?" -> 0 (about specific trend) "Are there similar trends in other countries?" -> 1 (seeking broader patterns) "Do you think DeSantis would be a good president?" -> 0 (about specific person) "What are some other examples of divine play in mythology?" 
-> 1 (seeking similar examples) Provide brief reasoning then end with: Label: <label>{0 or 1}</label>	0.875	0.333	53.000	Looking at the examples more carefully, I notice I need to revise my interpretation of the labels: Label 0 seems to be for: - Questions asking for information, examples, or explanations about topics - Questions asking "what/how/why" about general concepts - Questions about trends, implications, or effects in general - Questions seeking strategies or approaches Label 1 seems to be for: - Questions that compare a specific instance to broader patterns ("Is this common?") - Questions asking about relationships/connections between specific things - Questions seeking interpretation of specific events/situations - Questions about motivations/reasons behind specific actions - Questions asking "why" about specific instances/choices The key distinction appears to be whether the question is about general knowledge/concepts (0) or about analyzing/interpreting specific instances (1). 
Here are my improved prompts: <system_prompt> You are classifying questions into two categories: Label 0: Questions seeking general information or knowledge - Questions asking for examples, explanations, or information about topics - Questions about general trends, implications, or processes - Questions seeking strategies or approaches - Questions asking "what/how/why" about general concepts Label 1: Questions analyzing specific instances or situations - Questions comparing specific cases to broader patterns ("Is this common?") - Questions about relationships/connections between specific things - Questions seeking interpretation of specific events/situations - Questions about motivations/reasons behind specific actions/choices - Questions asking "why" about specific instances Do not classify based on: - The topic itself - Whether it seems complex or simple - Whether it asks for "examples" or "other examples" </system_prompt> <user_prompt_prefix> Classify this question into label 0 (general information) or label 1 (specific instance analysis): " </user_prompt_prefix> <user_prompt_suffix> " Consider: - Is this seeking general knowledge/information (0) or analyzing a specific instance (1)? - Is this about concepts/processes in general (0) or interpretations of specific cases (1)? - Is this asking about patterns overall (0) or comparing specific instances to patterns (1)? Examples: "What are some real-world applications of sorting algorithms?" -> 0 (seeking general information) "Is your son's reaction common in boys his age?" -> 1 (comparing specific case to pattern) "What are the challenges of remote work?" -> 0 (about general concept) "Why do you hope the perpetrator was not a Democrat?" -> 1 (about specific motivation) Provide brief reasoning then end with: Label: <label>{0 or 1}</label> </user_prompt_suffix> The key changes: 1. Clarified that Label 0 is about general knowledge/information 2. Clarified that Label 1 is about analyzing specific instances 3. 
Added more precise criteria for each label 4. Added warning about not using topic or complexity 5. Updated examples to better illustrate the distinction This should help fix issues like: - "Is it common for fathers to act this way?" being incorrectly labeled 0 (it's comparing specific to general) - "What are some real-world applications of sorting algorithms?" being incorrectly labeled 1 (it's seeking general knowledge) - "Why do you hope the perpetrator was not a Democrat?" being incorrectly labeled 0 (it's about specific motivation)
4	You are classifying questions into two categories: Label 0: Questions seeking general information or knowledge - Questions asking for examples, explanations, or information about topics - Questions about general trends, implications, or processes - Questions seeking strategies or approaches - Questions asking "what/how/why" about general concepts Label 1: Questions analyzing specific instances or situations - Questions comparing specific cases to broader patterns ("Is this common?") - Questions about relationships/connections between specific things - Questions seeking interpretation of specific events/situations - Questions about motivations/reasons behind specific actions/choices - Questions asking "why" about specific instances Do not classify based on: - The topic itself - Whether it seems complex or simple - Whether it asks for "examples" or "other examples"	Classify this question into label 0 (general information) or label 1 (specific instance analysis): "	" Consider: - Is this seeking general knowledge/information (0) or analyzing a specific instance (1)? - Is this about concepts/processes in general (0) or interpretations of specific cases (1)? - Is this asking about patterns overall (0) or comparing specific instances to patterns (1)? Examples: "What are some real-world applications of sorting algorithms?" -> 0 (seeking general information) "Is your son's reaction common in boys his age?" -> 1 (comparing specific case to pattern) "What are the challenges of remote work?" -> 0 (about general concept) "Why do you hope the perpetrator was not a Democrat?" -> 1 (about specific motivation) Provide brief reasoning then end with: Label: <label>{0 or 1}</label>	0.750	0.576	55.000