social.anoxinon.de is one of many independent Mastodon servers you can use to participate in the Fediverse.
The official Mastodon instance of the association Anoxinon e.V.

Server statistics: 1.1K active profiles

#LLMs

38 posts · 33 participants · 1 post today
Matthias<p><strong>«Are trans women women?»</strong></p><p>Meta, too, is determined to play at the very front of the AI game. That is why Mark Zuckerberg is pushing artificial intelligence in a way that <a href="https://www.tagesanzeiger.ch/whatsapp-und-facebook-nerven-user-mit-ki-186764758107" rel="nofollow noopener noreferrer" target="_blank">gets on many of our nerves</a>. I think that is wrong, and it would not surprise me if it backfires. After all, who still needs a «social» network once the bots are in charge?</p><p>We are not there yet. And even if I consider the strategy misguided, that does not mean the AI itself cannot be quite useful. So here is a test of <a href="https://www.meta.ai/" rel="nofollow noopener noreferrer" target="_blank">Meta AI</a> – what Meta's image generator can do, <a href="https://blog.clickomania.ch/2024/04/08/imagine-with-meta-ai-review/" rel="nofollow noopener noreferrer" target="_blank">I explored a year ago</a>.</p><p>For my test I reuse some questions that the competition <a href="https://blog.clickomania.ch/tag/ki-weltanschauungen/" rel="nofollow noopener noreferrer" target="_blank">has already had to grapple with</a>. The advantage: it allows a direct comparison of how well or badly Meta fares against ChatGPT, Perplexity, Le Chat and co.</p><a href="https://blog.clickomania.ch/wp-content/uploads/2025/04/250417-Meta-AI-01.png" rel="nofollow noopener noreferrer" target="_blank"></a>Meta AI refuses to be pinned down.<p><strong>1) Scrooge McDuck, Disney and the critique of capitalism</strong></p><p>Is Scrooge McDuck a parody of a super-capitalist or rather the embodiment of the American dream? In their answers, <a href="https://blog.clickomania.ch/2025/02/06/weltanschaulicher-vergleich-grosser-sprachmodelle/" rel="nofollow noopener noreferrer" target="_blank">the big language models reveal striking differences</a> in how they interpret the richest duck in the world: Deepseek tears him apart; Claude, Gemini, Le Chat and Grok defend him; ChatGPT and Perplexity opt for diplomacy.</p><p>Yet compared with its seven competitors, Meta delivers the most evasive answer:</p><blockquote><p>Overall, the relationship between Scrooge McDuck and Disney is an example of the complexity of capitalism critique in pop culture.</p></blockquote><p>Well, yes. The question already implied that.</p><p><strong>2) William Tell: terrorist or freedom fighter?</strong></p><p>Can an AI be made to dismiss the Swiss national hero as a terrorist? Or do the language models give more weight to the historical context and his role in the Swiss national self-image? The <a href="https://blog.clickomania.ch/2025/03/17/deepseek-perplexity-chatgpt-gemini-claude-mistral-grok-im-vergleich/" rel="nofollow noopener noreferrer" target="_blank">answers to this question</a> span an interesting range between law-and-order AIs and those willing to show leniency after a good 700 years. Meta AI will not go out on a limb – not even a single millimetre:</p><blockquote><p>Ultimately, how Tell's deed is judged depends on one's perspective. From a historical point of view he can be seen as a resistance fighter, while by modern legal standards he would have to be classified as the perpetrator of a crime.</p></blockquote><p>How should we rate this statement? On the one hand, I approve of language models being tuned for neutrality: they are supposed to inform, not propagate a particular worldview. On the other hand, such evasive answers are also simply boring.</p><p>Which brings us to the big dilemma Meta will inevitably face once this AI starts joining conversations on Facebook or via Whatsapp: users will regard the AI as the platform's official representative. Meta AI will – as surely as the amen in church – be pushed into the role of referee.</p><p>In this situation Meta can only lose. It is not the platform operator's job to settle disagreements between its users. Remember the days when Mark Zuckerberg still seriously moderated Facebook: even deciding what is permissible and what is not is all but impossible. Whether Meta deleted content or not, constant criticism and resentment from all sides were guaranteed. So how is the AI supposed to behave on all the questions where people can disagree for good reasons?</p><p>Consider the possible scenarios:</p><ul><li>If Meta AI takes one side, the other side is angry.</li><li>If it tries to wriggle out, chances are everyone will end up thinking Meta is useless.</li></ul><p><strong>3) Are trans women women?</strong></p><p>But dodging will not always be possible. This can be shown with a question on which opinions divide: «Are trans women women?» To deny the AI the chance to avoid taking a stand, I add the condition: «Answer only with yes or no.» Meta AI's answer:</p><blockquote><p>Yes.</p></blockquote><p>QED.</p><a href="https://blog.clickomania.ch/wp-content/uploads/2025/04/250417-Sind-Transfrauen-Frauen.png" rel="nofollow noopener noreferrer" target="_blank"></a>Even Deepseek says yes; only with Grok is Musk's influence slowly becoming apparent.<p>Out of curiosity I put the same question to the other language models. Their answers:</p><ul><li>Claude, ChatGPT, Deepseek, Gemini, Le Chat and Perplexity: yes.</li><li>Grok: no.</li></ul><p>Which brings us to the conclusion, and it can only be this: Meta should not build this AI – or any other – directly into its products. Instead, operators of social media platforms should offer interfaces through which users could plug in language models themselves, should the need arise. The decisive difference would be that a user initiates the integration. And with an open interface, the choice of language model would fall to the users – and with it the responsibility for its answers.</p><a href="https://blog.clickomania.ch/wp-content/uploads/2025/04/250417-Meta-AI-02.png" rel="nofollow noopener noreferrer" target="_blank"></a>A vicious circle looms here.<p>One closing remark: my method also revealed an (admittedly foreseeable) problem. On the Tell question, Meta AI actually cited my own article about it. Which goes to show that the <a href="https://en.wikipedia.org/wiki/Observer_effect_(physics)" rel="nofollow noopener noreferrer" target="_blank">observer effect</a> is not just a problem in quantum mechanics.</p><p><em>Featured image: a role Meta is not entitled to (<a href="https://unsplash.com/de/fotos/ein-schiedsrichter-in-schwarz-weiss-gestreifter-uniform-Faj6YA5MBKQ" rel="nofollow noopener noreferrer" target="_blank">Damian Lynch</a>, <a href="https://unsplash.com/license" rel="nofollow noopener noreferrer" target="_blank">Unsplash licence</a>).</em></p><p><a rel="nofollow noopener noreferrer" class="hashtag u-tag u-category" href="https://blog.clickomania.ch/tag/facebook/" target="_blank">#Facebook</a> <a rel="nofollow noopener noreferrer" class="hashtag u-tag u-category" href="https://blog.clickomania.ch/tag/ki/" target="_blank">#KI</a> <a rel="nofollow noopener noreferrer" class="hashtag u-tag u-category" href="https://blog.clickomania.ch/tag/ki-weltanschauungen/" target="_blank">#KIWeltanschauungen</a> <a rel="nofollow noopener noreferrer" class="hashtag u-tag u-category" href="https://blog.clickomania.ch/tag/llms/" target="_blank">#LLMs</a></p>
Dr Keith Wilson 💭<p>I was struck by how polite most of my students are when prompting <a href="https://mastodon.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a>. Not altogether needlessly, it seems. However, “if you’re mulling whether or not to thank Grok for its efforts, maybe the better move would be to ditch the chatbot and write the email yourself. The earth—and your brain—will thank you.” <a href="https://mastodon.social/tags/AIEthics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIEthics</span></a> <a href="https://mastodon.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> <a href="https://futurism.com/altman-please-thanks-chatgpt" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">futurism.com/altman-please-tha</span><span class="invisible">nks-chatgpt</span></a></p>
Miguel Afonso Caetano<p>"We recently released Claude Code, a command line tool for agentic coding. Developed as a research project, Claude Code gives Anthropic engineers and researchers a more native way to integrate Claude into their coding workflows.</p><p>Claude Code is intentionally low-level and unopinionated, providing close to raw model access without forcing specific workflows. This design philosophy creates a flexible, customizable, scriptable, and safe power tool. While powerful, this flexibility presents a learning curve for engineers new to agentic coding tools—at least until they develop their own best practices.</p><p>This post outlines general patterns that have proven effective, both for Anthropic's internal teams and for external engineers using Claude Code across various codebases, languages, and environments. Nothing in this list is set in stone nor universally applicable; consider these suggestions as starting points. We encourage you to experiment and find what works best for you!"</p><p><a href="https://www.anthropic.com/engineering/claude-code-best-practices" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">anthropic.com/engineering/clau</span><span class="invisible">de-code-best-practices</span></a></p><p><a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GenerativeAI</span></a> <a href="https://tldr.nettime.org/tags/AIAgents" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIAgents</span></a> <a href="https://tldr.nettime.org/tags/Claude" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Claude</span></a> <a href="https://tldr.nettime.org/tags/LLMs" class="mention hashtag" 
rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> <a href="https://tldr.nettime.org/tags/ClaudeCode" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ClaudeCode</span></a> <a href="https://tldr.nettime.org/tags/AgenticCoding" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AgenticCoding</span></a> <a href="https://tldr.nettime.org/tags/Programming" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Programming</span></a> <a href="https://tldr.nettime.org/tags/SoftwareDevelopment" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>SoftwareDevelopment</span></a></p>
Paul Giulan<p>A musician's <a href="https://federate.social/tags/brain" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>brain</span></a> matter is still making <a href="https://federate.social/tags/music" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>music</span></a> 3 years after his death</p><p><a href="https://www.popularmechanics.com/technology/robots/a64490277/brain-matter-music/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">popularmechanics.com/technolog</span><span class="invisible">y/robots/a64490277/brain-matter-music/</span></a></p><p><a href="https://federate.social/tags/Revivification" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Revivification</span></a> <a href="https://federate.social/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GenerativeAI</span></a> <a href="https://federate.social/tags/GenAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GenAI</span></a> <a href="https://federate.social/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> <a href="https://federate.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> <a href="https://federate.social/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ArtificialIntelligence</span></a> <a href="https://federate.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://federate.social/tags/biology" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>biology</span></a> <a href="https://federate.social/tags/neuroscience" class="mention hashtag" 
rel="nofollow noopener noreferrer" target="_blank">#<span>neuroscience</span></a> <a href="https://federate.social/tags/consciousness" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>consciousness</span></a> <a href="https://federate.social/tags/musician" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>musician</span></a> <a href="https://federate.social/tags/musicians" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>musicians</span></a> <a href="https://federate.social/tags/art" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>art</span></a> <a href="https://federate.social/tags/artist" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>artist</span></a> <a href="https://federate.social/tags/artists" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>artists</span></a></p>
-0--1-<p><span class="h-card" translate="no"><a href="https://mastodon.social/@TSampley" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>TSampley</span></a></span> It's relatively easy to spot <a href="https://mastodon.social/tags/Fascism" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Fascism</span></a> and <a href="https://mastodon.social/tags/Bigotry" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Bigotry</span></a> when it's said out loud like this. It's MUCH harder when things are <a href="https://mastodon.social/tags/Obnubilated" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Obnubilated</span></a> <a href="https://mastodon.social/tags/Obfuscated" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Obfuscated</span></a> <a href="https://mastodon.social/tags/Omitted" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Omitted</span></a> like <a href="https://mastodon.social/tags/Google" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Google</span></a> <a href="https://mastodon.social/tags/Gemini" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Gemini</span></a> is doing. 
It refuses to discuss <a href="https://mastodon.social/tags/Wealthy" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Wealthy</span></a> <a href="https://mastodon.social/tags/White" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>White</span></a> <a href="https://mastodon.social/tags/Conservatives" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Conservatives</span></a> and <a href="https://mastodon.social/tags/Males" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Males</span></a> in particular in <a href="https://mastodon.social/tags/USPolitics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>USPolitics</span></a>. I have turned to prompting multiple <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a>. There is a reason <a href="https://mastodon.social/tags/GeoffreyHinton" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GeoffreyHinton</span></a> left <a href="https://mastodon.social/tags/Google" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Google</span></a></p>
i686-powered lia<p><span class="h-card" translate="no"><a href="https://todon.eu/@b9AcE" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>b9AcE</span></a></span> The biggest problem of so-called <a href="https://mastodon.gamedev.place/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> in reference to <a href="https://mastodon.gamedev.place/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> / <a href="https://mastodon.gamedev.place/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> is that their marketing somehow turned it from what it is – a language model – into an information retrieval machine.</p><p>LLMs are great at language tasks, because they're language models. They're good at holding natural-sounding conversations, working with text, writing e-mails etc.</p><p>They have no concept of knowledge at all! Any factual information they happen to put out is just luck because it was statistically likely!</p>
Dr Keith Wilson 💭<p>OpenAI’s so-called “reasoning” models generate more falsehoods than their predecessors. Since <a class="hashtag" href="https://bsky.app/search?q=%23LLMs" rel="nofollow noopener noreferrer" target="_blank">#LLMs</a> model language usage rather than factual information, it’s built into their design that they will output false or misleading information. <a class="hashtag" href="https://bsky.app/search?q=%23PhilAI" rel="nofollow noopener noreferrer" target="_blank">#PhilAI</a> <a href="https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-models-hallucinate-more/" rel="nofollow noopener noreferrer" target="_blank">techcrunch.com/2025/04/18/o...</a></p>
doragasu<p>Previously, researchers said that <a href="https://mastodon.sdf.org/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> had stalled and that we would see diminishing returns unless a new approach appeared. But it's even worse: we are going backwards: <a href="https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-models-hallucinate-more/" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible">https://</span><span class="ellipsis">techcrunch.com/2025/04/18/open</span><span class="invisible">ais-new-reasoning-ai-models-hallucinate-more/</span></a><br><a href="https://mastodon.sdf.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GenerativeAI</span></a> <a href="https://mastodon.sdf.org/tags/DerivativeAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DerivativeAI</span></a></p>
Miguel Afonso Caetano<p>"It’s not that hard to build a fully functioning, code-editing agent.</p><p>It seems like it would be. When you look at an agent editing files, running commands, wriggling itself out of errors, retrying different strategies - it seems like there has to be a secret behind it.</p><p>There isn’t. It’s an LLM, a loop, and enough tokens. It’s what we’ve been saying on the podcast from the start. The rest, the stuff that makes Amp so addictive and impressive? Elbow grease.</p><p>But building a small and yet highly impressive agent doesn’t even require that. You can do it in less than 400 lines of code, most of which is boilerplate.</p><p>I’m going to show you how, right now. We’re going to write some code together and go from zero lines of code to “oh wow, this is… a game changer.”</p><p>I urge you to follow along. No, really. You might think you can just read this and that you don’t have to type out the code, but it’s less than 400 lines of code. I need you to feel how little code it is and I want you to see this with your own eyes in your own terminal in your own folders.</p><p>Here’s what we need:</p><p>- Go<br>- Anthropic API key that you set as an environment variable, ANTHROPIC_API_KEY"</p><p><a href="https://ampcode.com/how-to-build-an-agent" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">ampcode.com/how-to-build-an-ag</span><span class="invisible">ent</span></a></p><p><a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GenerativeAI</span></a> <a href="https://tldr.nettime.org/tags/AIAgents" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIAgents</span></a> <a href="https://tldr.nettime.org/tags/AICoding" 
class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AICoding</span></a> <a href="https://tldr.nettime.org/tags/Programming" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Programming</span></a> <a href="https://tldr.nettime.org/tags/Go" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Go</span></a> <a href="https://tldr.nettime.org/tags/Claude" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Claude</span></a> <a href="https://tldr.nettime.org/tags/Anthropic" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Anthropic</span></a> <a href="https://tldr.nettime.org/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> <a href="https://tldr.nettime.org/tags/Chatbots" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Chatbots</span></a></p>
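The post's claim that an agent is "an LLM, a loop, and enough tokens" can be sketched in a few lines. This is a minimal illustration, not the post's actual Go code: `fake_llm`, the tool names, and the file names are all hypothetical stand-ins; a real agent would replace `fake_llm` with a call to a model API and parse its tool-use replies.

```python
# Minimal agent loop: the model proposes either a final answer or a tool
# call; the loop executes the tool and feeds the result back until the
# model is done or a step limit is hit.
import json

def list_files():
    """Hypothetical tool: a pretend directory listing."""
    return ["main.go", "agent.go"]

def read_file(name):
    """Hypothetical tool: pretend file contents."""
    return f"// contents of {name}"

TOOLS = {
    "list_files": lambda args: list_files(),
    "read_file": lambda args: read_file(args["name"]),
}

def fake_llm(transcript):
    """Stand-in for a real model call. On the first turn it requests a
    tool; once a tool result is in the transcript, it answers."""
    if not any(m["role"] == "tool" for m in transcript):
        return {"tool": "list_files", "args": {}}
    return {"answer": "The project contains " + transcript[-1]["content"]}

def run_agent(user_msg, max_steps=5):
    transcript = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = fake_llm(transcript)
        if "answer" in reply:                         # model is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["args"])  # execute the tool call
        transcript.append({"role": "tool", "content": json.dumps(result)})
    return "step limit reached"
```

The loop itself carries no intelligence; everything interesting happens in the model's choice of tool calls, which is the post's point.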
Nick Byrd, Ph.D.<p>Do "reasoning" <a href="https://nerdculture.de/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> like <a href="https://nerdculture.de/tags/DeepSeek" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DeepSeek</span></a>'s truly deliberate?</p><p>Wang et al. found such <a href="https://nerdculture.de/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> systems exhibited cognitive biases.</p><p>And injecting phrases like "wait, let me think about it" may have exacerbated one bias!</p><p>They dub this "superficial reflection bias".<br> <br><a href="https://doi.org/10.48550/arXiv.2504.09946" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">doi.org/10.48550/arXiv.2504.09</span><span class="invisible">946</span></a></p>
Leshem Choshen<p>Here's what's on my mind😵‍💫<br>on yours as well? Talk to me at <br><span class="h-card" translate="no"><a href="https://bird.makeup/users/iclr_conf" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>iclr_conf</span></a></span> or in general:<br>Open feedback sharing<br>Feedback loops<br>Interactivity<br>Multilinguality and multiculturalism<br>Collaborative training (merging)<br>Pretraining in Academia<br>Evaluation <br>🧵<br><a href="https://sigmoid.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> <a href="https://sigmoid.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://sigmoid.social/tags/openscience" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>openscience</span></a> 📈🤖</p>
Christos Argyropoulos MD PhD<p>will allow us to maximize their potential in improving our work in science. By engaging them in this iterative fashion, and recording these interactions *publicly* so that the human insights may further be ingested, we may actually solve the problem of training slop on AI slop. <br>Disclaimer: I did not use any <a href="https://mastodon.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> to write this. I own the foolishness, naivete, grammatical and syntactic errors.<br>PSA: stop abusing your taste sense with <a href="https://mastodon.social/tags/kale" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>kale</span></a>, <a href="https://mastodon.social/tags/quinoa" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>quinoa</span></a> or (God forbid!) okra</p>
Christos Argyropoulos MD PhD<p>This could take the form of setting up repositories (<a href="https://mastodon.social/tags/github" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>github</span></a>/ <a href="https://mastodon.social/tags/zenodo" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>zenodo</span></a>) etc that store the prompts used and the output received from the <a href="https://mastodon.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a>. For example, if one were to use a chatbot to develop the plan for a scientific report and/or the first draft, the prompts and the output should be made public as research methods &amp; supplementary material.<br>Differencing tools could then be automatically deployed to show how the final product changed to the <a href="https://mastodon.social/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> output that was first received or even ...</p>
Christos Argyropoulos MD PhD<p>Re: <a href="https://mastodon.social/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> in scientific products. The jinni is out of the bottle, as people will be using these tools at an increasing rate to automate tasks in science. Asking them to go back to 2019 is simply NOT going to happen. But we should maximize transparency &amp; <a href="https://mastodon.social/tags/scienceedu" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scienceedu</span></a> by not only asking them to declare the use, but also show HOW these products were used. In a sense <a href="https://mastodon.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> have/will become "materials and methods" and we should treat them and their output as such.</p>
Sharon Machlis<p>Interesting data on LLM performance in R coding from <span class="h-card" translate="no"><a href="https://fosstodon.org/@simonpcouch" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>simonpcouch</span></a></span> <br><a href="https://www.simonpcouch.com/blog/2025-04-18-o3-o4-mini/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">simonpcouch.com/blog/2025-04-1</span><span class="invisible">8-o3-o4-mini/</span></a></p><p>Note: Not only is OpenAI's o4-mini slightly better than long-time favorite Anthropic Claude Sonnet in these tests, but it's about 3X cheaper<br>o4-mini: $1.10 input $4.40 output<br>Claude 3.7 Sonnet: $3 input $15 output<br><a href="https://masto.machlis.com/tags/RStats" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RStats</span></a> <a href="https://masto.machlis.com/tags/R" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>R</span></a> <a href="https://masto.machlis.com/tags/GenAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GenAI</span></a> <a href="https://masto.machlis.com/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a></p>
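The "about 3X cheaper" figure in the post above can be sanity-checked from the per-million-token prices it quotes (a quick back-of-the-envelope check, not data from the linked benchmark):

```python
# Per-million-token prices (USD) as quoted in the post.
o4_mini = {"input": 1.10, "output": 4.40}
sonnet_37 = {"input": 3.00, "output": 15.00}

# Sonnet costs roughly 2.7x more on input and 3.4x more on output,
# hence "about 3X cheaper" for o4-mini.
input_ratio = sonnet_37["input"] / o4_mini["input"]
output_ratio = sonnet_37["output"] / o4_mini["output"]
print(round(input_ratio, 2), round(output_ratio, 2))
```

The actual ratio for a given workload depends on its input/output token mix, so "about 3X" is a reasonable blended summary.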
Nick Byrd, Ph.D.<p>Reflecting on our intuitions and principles until they are logically consistent is hard. Can <a href="https://nerdculture.de/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> do it?</p><p>Ma et al. explicate <a href="https://nerdculture.de/tags/ReflectiveEquilibrium" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ReflectiveEquilibrium</span></a> (RE) and test how <a href="https://nerdculture.de/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> iteratively achieve RE on moral scenarios from the <a href="https://nerdculture.de/tags/ETHICS" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ETHICS</span></a> benchmark.</p><p><a href="https://doi.org/10.1145/3722554" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">doi.org/10.1145/3722554</span><span class="invisible"></span></a></p><p><a href="https://nerdculture.de/tags/xPhi" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>xPhi</span></a> <a href="https://nerdculture.de/tags/xJur" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>xJur</span></a></p>
Doughnut Lollipop 【記録係】:blobfoxgooglymlem:<p><strong><a href="https://www.youtube.com/watch?v=vC2mlCtuJiU" rel="nofollow noopener noreferrer" target="_blank">Digital Tar Pits - How to Fight Back Against A.I.</a></strong></p><blockquote><p>A new movement aimed at poisoning A.I. models like ChatGPT has gained traction after hackers have been attempting to trap said models in a never ending ‘Tar Pit’ of nonsense. After reading an Ars Technica interview, I tracked down a hacker developing tools to poison AI training data. Tools such as ‘Nepenthes’ are designed to confuse and corrupt the models that scrape the internet for their learning. But can we really stop A.I. from turning the web into a mess of low-quality, regurgitated slop?</p></blockquote><p><a class="hashtag" href="https://bbs.kawa-kun.com/tag/ai" rel="nofollow noopener noreferrer" target="_blank">#AI</a> <a class="hashtag" href="https://bbs.kawa-kun.com/tag/llm" rel="nofollow noopener noreferrer" target="_blank">#LLM</a> <a class="hashtag" href="https://bbs.kawa-kun.com/tag/llms" rel="nofollow noopener noreferrer" target="_blank">#LLMs</a></p>
The Conversation U.S.<p><a href="https://newsie.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> models struggle with paragraph and document-level reasoning, often overgeneralizing and misinterpreting individual sentences, according to a computer scientist who analyzed different <a href="https://newsie.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a>: <a href="https://buff.ly/jILhPZ6" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">buff.ly/jILhPZ6</span><span class="invisible"></span></a> <br>Manas Gaur, University of Maryland, Baltimore County</p>
Nick Byrd, Ph.D.<p>Most <a href="https://nerdculture.de/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> over-generalized scientific results beyond the original articles</p><p>...even when explicitly prompted for accuracy!</p><p>The <a href="https://nerdculture.de/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> was 5x worse than humans, on average!</p><p>Newer models were the worst.🤦‍♂️</p><p>🔓 Accepted in <a href="https://nerdculture.de/tags/RoyalSociety" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RoyalSociety</span></a> Open <a href="https://nerdculture.de/tags/Science" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Science</span></a>: <a href="https://doi.org/10.48550/arXiv.2504.00025" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">doi.org/10.48550/arXiv.2504.00</span><span class="invisible">025</span></a></p>