{"id":30773,"date":"2025-04-18T17:29:11","date_gmt":"2025-04-18T17:29:11","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/30773\/"},"modified":"2025-04-18T17:29:11","modified_gmt":"2025-04-18T17:29:11","slug":"wikipedia-is-giving-ai-developers-its-data-to-fend-off-bot-scrapers","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/30773\/","title":{"rendered":"Wikipedia is giving AI developers its data to fend off bot scrapers"},"content":{"rendered":"<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">Wikimedia says the dataset hosted by Kaggle has been \u201cdesigned with machine learning workflows in mind,\u201d making it easier for AI developers to access machine-readable article data for modeling, fine-tuning, benchmarking, alignment, and analysis. The dataset's content is openly licensed and, as of April 15th, includes research summaries, short descriptions, image links, infobox data, and article sections, but excludes references and non-written elements such as audio files.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">\u201cAs the place the machine learning community comes for tools and tests, Kaggle is extremely excited to be the host for the Wikimedia Foundation\u2019s data,\u201d said Kaggle partnerships lead Brenda Flynn. \u201cKaggle is excited to play a role in keeping this data accessible, available, and useful.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"Wikimedia says the dataset hosted by Kaggle has been \u201cdesigned with machine learning workflows in mind,\u201d making it&hellip;\n","protected":false},"author":2,"featured_media":30774,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3163],"tags":[323,1942,12,326,53,16,15,4715],"class_list":{"0":"post-30773","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-news","11":"tag-tech","12":"tag-technology","13":"tag-uk","14":"tag-united-kingdom","15":"tag-web"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114360153769038872","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/30773","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=30773"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/30773\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/30774"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=30773"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=30773"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=30773"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}