Unsorted Datasets


128
RESULTS


Unsorted Datasets
data

eye 57,122

favorite 13

comment 3

(Here is the original Reddit comment announcing this collection of data and what the processes were.) This is an archive of Reddit comments from October of 2007 until May of 2015 (complete month). This reflects 14 months of work and a lot of API calls. This dataset includes nearly every publicly available Reddit comment. Approximately 350,000 comments out of ~1.65 billion were unavailable due to Reddit API issues. Q: How are the files structured? Each file is compressed with bzip2 compression....
Rating: 5 stars (3 reviews)
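A minimal sketch of reading one of the bzip2-compressed files described above without decompressing it to disk first. The description is truncated, so the layout used here (one JSON object per line) and the `subreddit` field name are assumptions, and `count_by_subreddit` is a hypothetical helper rather than part of the archive.

```python
import bz2
import json
from collections import Counter


def count_by_subreddit(path):
    """Stream a bzip2-compressed file of JSON lines and tally comments per subreddit.

    Assumes one JSON object per line and a "subreddit" key (both are assumptions).
    """
    counts = Counter()
    # Text mode decompresses on the fly, so the whole file never sits in memory.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            comment = json.loads(line)
            counts[comment.get("subreddit", "<unknown>")] += 1
    return counts
```

Calling `count_by_subreddit("comments.bz2")` (a placeholder file name) returns a `Counter`, so `.most_common(10)` gives the busiest subreddits in that file.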
Unsorted Datasets
data

eye 51

favorite 0

comment 0

This is a patch/re-dump of the "Ten Billion" text-only 4Chan thread archive, which is a dump of 10.8 million threads/162 million posts posted from 2005-2008 and scraped by an anonymous source (packaged in 2009 and uploaded to archive.org in 2018). The original upload had some issues that prevented it from being fully read. This upload takes the file chanarchive.tar.gz (probably no relation to 4chanarchive/chanarchive) in the original (a tar of MyISAM database files), patches the...
Topics: 4Chan, threads, posts, MySQL, SQL, dump, archive, Ten Billion, old.sage.moe
Unsorted Datasets
by Peter Baylies
software

eye 14,942

favorite 6

comment 0

Deep learning conditional StyleGAN2 model for generating art trained on WikiArt images; includes the model, a ResNet based encoder into the model's latent space, and source code (mirror of the pbaylies/stylegan2 repo on github as of 2020-01-25)
Topics: generative art, StyleGAN2, wikiart, software, deep learning
Unsorted Datasets
by Yannic Kilcher
data

eye 2,449

favorite 10

comment 0

GPT-4chan is a language model fine-tuned from GPT-J 6B on 3.5 years' worth of data from 4chan's politically incorrect (/pol/) board, as included in the dataset Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board.
Unsorted Datasets
by NYC Taxi and Limousine Commission
data

eye 20,954

favorite 4

comment 0

FOIA/FOILed Taxi Trip Data from the NYC Taxi and Limousine Commission 2013. Released by http://chriswhong.com/open-data/foil_nyc_taxi/. trip_data.7z and trip_fare.7z are more efficiently compressed versions of the data; you probably want these files. The data is in CSV format. For the data files, this includes the fields: medallion, hack_license, vendor_id, rate_code, store_and_fwd_flag, pickup_datetime, dropoff_datetime, passenger_count, trip_time_in_secs, trip_distance, pickup_longitude,...
Topics: data, nyc, taxi, fare, csv, FOIA, FOIL
Source: torrent:urn:sha1:6c594866904494b06aae51ad97ec7f985059b135
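As a rough illustration of working with the CSV fields listed above, the sketch below totals two of them with the standard `csv` module. It assumes the .7z archives have already been extracted, that the first row is a header naming the columns as in the description, and that a leading space after each comma should be tolerated; `summarize_trips` and the sample rows are made up for the example.

```python
import csv
import io


def summarize_trips(csv_file):
    """Total the trip count, distance, and duration from an open trip-data CSV."""
    # skipinitialspace guards against headers or values written as ", name".
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    trips = 0
    total_distance = 0.0
    total_seconds = 0
    for row in reader:
        trips += 1
        total_distance += float(row["trip_distance"])
        total_seconds += int(row["trip_time_in_secs"])
    return {
        "trips": trips,
        "total_distance": total_distance,
        "total_seconds": total_seconds,
    }


# Invented sample rows using a subset of the documented columns.
sample = io.StringIO(
    "medallion, hack_license, trip_time_in_secs, trip_distance\n"
    "A1, H1, 600, 2.5\n"
    "B2, H2, 300, 1.0\n"
)
print(summarize_trips(sample))
```

Because `DictReader` yields one row at a time, the same function can be pointed at a real multi-gigabyte file opened with `open(path, newline="")` without loading it into memory.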
Unsorted Datasets
image

eye 2,397

favorite 3

comment 0

Dataset used for training https://archive.org/details/wikiart-stylegan2-conditional-model Upscaled and resized, originally from https://github.com/cs-chan/ArtGAN/tree/master/WikiArt%20Dataset Note: 1. The WikiArt dataset can be used only for non-commercial research purpose. 2. The images in the WikiArt dataset were obtained from WikiArt.org. The authors are neither responsible for the content nor the meaning of these images. 3. By using the WikiArt dataset, you agree to obey the terms and...
Topics: WikiArt, dataset, paintings, art
Unsorted Datasets
by All the Music, LLC
audio

eye 22,899

favorite 38

comment 10

From: https://www.vice.com/en_uk/article/wxepzw/musicians-algorithmically-generate-every-possible-melody-release-them-to-public-domain : Musicians Algorithmically Generate Every Possible Melody, Release Them to Public Domain Damien Riehl and Noah Rubin generated and saved every possible melody to a hard drive, then turned it back around to the commons. From: https://www.dailymail.co.uk/sciencetech/article-8042979/Musician-uses-computer-algorithm-compose-melody-thats-possible-key-C.html :...
Rating: 3 stars (10 reviews)
Unsorted Datasets
by Gwern Branwen
data

eye 38,160

favorite 17

comment 1

Dark Net Markets (DNM) are online markets typically hosted as Tor hidden services whose users transact in Bitcoin or other cryptocoins, usually for drugs or other illegal/regulated goods; the most famous DNM was Silk Road 1, which pioneered the business model. From 2013-2015, I scraped/mirrored on a weekly or daily basis all existing English-language DNMs as part of my research into their usage, lifetimes/characteristics, & legal riskiness; in addition, I made or obtained copies of as many...
Rating: 5 stars (1 review)
Topics: Tor, Bitcoin, drugs, Silk Road, Evolution, Agora, black-markets, dark net markets
Unsorted Datasets
by 4chan
data

eye 1,085

favorite 11

comment 0

The Ark is a compilation of material created by 4chan's /k/ board. Being a board centered around weapons, it includes information about warfare, survival, gunsmithing, etc but also stuff more generally aligned with their interests like anime and games. From the torrent's...
Topics: 4chan, /k/, /k/ommando, weapons, guns, gunsmithing, survival, tactics, game, games, anime, 3d print
Source: torrent:urn:sha1:6d72a0d13d050f6ed00179ffd4294b549714140a
Unsorted Datasets
data

eye 1,503

favorite 1

comment 0

This collection of text files contains logs from the 2009 era covering the development and testing of, and reaction to, the earliest version of Minecraft, the building and survival game created by Markus "Notch" Persson and released as a product in 2011. They are saved captures of discussions on the #minecraft channel regarding all manner of aspects of Minecraft testing and development. Minecraft (and the company owning it, Mojang) was sold to Microsoft for $2.5 billion in 2014. This collection...
Unsorted Datasets
by legacycollector.org
software

eye 9,537

favorite 9

comment 2

To Browse the Repository: Click Here This website is a repository for web content that has been deemed "legacy" and has been removed by their original publishers, and might otherwise be difficult or cumbersome to get. Since starting this, end 2018, in response to Mozilla removing all legacy extensions from its add-ons site, with plans to expand to include more, similar "legacy" content, a few things have changed needing me to re-evaluate both the need for this site and my...
Rating: 5 stars (2 reviews)
Unsorted Datasets
software

eye 1,060

favorite 4

comment 1

Apple Developer Discs 1989 2009
Rating: 5 stars (1 review)
Unsorted Datasets
by Internet Archive
data

eye 24,171

favorite 12

comment 1

Culled from various sources, this collection includes over one million JPG, PNG and GIF album covers. The resolution ranges from "thumbnail" through to very large sizes. Filenames are variant in usefulness, although a good number indicate at least the name of the original album. This dataset is for experimentation and image processing research only. At 148gb, the collection is large but not unmanageable (there is a torrent available) and allows a developer or artist to work with the...
Rating: 5 stars (1 review)
Topics: dataset, big data, album covers, covers, cover art, cover photos
Unsorted Datasets
by Peter Baylies
software

eye 1,024

favorite 1

comment 0

Real ESRGAN upscaling models fine-tuned on paintings.
Topics: Real ESRGAN, GAN, upscaling, paintings, super-resolution
Unsorted Datasets
software

eye 1,994

favorite 0

comment 0

Syzygy endgame tablebases containing win-draw-loss (WDL) and distance-to-zero (DTZ) information for chess positions containing 7 pieces. These tablebases will be of interest to both chess players and computer scientists. For more information, please visit the Chess Programming wiki: https://www.chessprogramming.org/Syzygy_Bases This is Part 1 of ?. See below for the remaining parts: Part 2: https://archive.org/details/Syzygy7_2
Topics: 7-man, 7-men, 7man, 7men, chess, database, databases, egtb, egtbs, syzygy, tablebase, tablebases
Source: http://tablebase.sesse.net/
Unsorted Datasets
by Andrew Hundt
movies

eye 496

favorite 2

comment 0

Stack blocks like a champion! The CoSTAR Block Stacking Dataset includes a real robot trying to stack colored children's blocks more than 10,000 times in a scene with challenging lighting and a movable bin obstacle which must be avoided. This dataset is especially well suited to the benchmarking and comparison of deep learning algorithms. Visit the CoSTAR Dataset Website for more info. If you use the dataset, please cite our paper introducing it: Training Frankenstein's Creature to Stack:...
Unsorted Datasets
by Yannic Kilcher
software

eye 702

favorite 2

comment 0

GPT-4chan is a language model fine-tuned from GPT-J 6B on 3.5 years' worth of data from 4chan's politically incorrect (/pol/) board, as included in the dataset Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board.
Topic: GPT 4chan pol AI
Unsorted Datasets
data

eye 2,187

favorite 9

comment 0

A collection of fanfiction stories from fanfiction.net, repacked for easier bulk collecting and archiving. Contains many tens of thousands of fan fiction stories.
Unsorted Datasets
data

eye 264

favorite 3

comment 0

Biggest Wordlist Collection
Topic: hacking
Unsorted Datasets
by Various
software

eye 1,478

favorite 8

comment 1

66,000 .SWF files, banner ads put into websites in the 2003-2004 era of the Web. Requires a flash player to view. The files have been saved with simple numbers, so no obvious metadata exists. The ads themselves range across a wide variety of products, services, companies and public service, with the .SWF file being self-encapsulated (not requiring any servers or outside data, although some have active URL clickthroughs to sites likely all dead).  Files have been separated by month released...
Rating: 5 stars (1 review)
Unsorted Datasets
by Ben
data

eye 2,910

favorite 1

comment 0

Ben's FTP List (May, 2018): This is a trimmed down list of all servers that are online and allow anonymous connections. There are 244,441 FTP servers in total. Please note: It is unknown whether these servers stayed online after the scan or are behind dynamic IP addresses, so their availability after this list was compiled cannot be guaranteed. This census is provided as a series of bzip2 files, which can be read directly by utilities such as zmore and zless. It is both intended to be used for...
Unsorted Datasets
by SilenceROM
software

eye 940,894

favorite 4

comment 1

SilenceROM LIII Changelog *CCM/Hybrid/Nox Adjustments *Tweaked Super Favourites *Updated source file *Updated applications *Updated SilenceROM Wizard *Tweaked Database +TorrentRelease Repo +Renegades TV Guide :Preconfigured +Dragon Streams +DubStop ####################### SilenceROM LII Changelog *SilenceROM now can be installed via Wizard ! I made the wizard from whufclee's original code. Thanks whufclee! ! Benefit of installing via wizard; preconfigured system settings. ! This is not possible...
Rating: 5 stars (1 review)
Topics: SilenceROM, Community Build, Kodi, Helix, CCM, Hybrid, Speed, Stability, Live TV, Sports, Movies,...
Unsorted Datasets
by NintendoWizard22
software

eye 312

favorite 0

comment 0

Name: 1x_Dehalo_Shout_Factory_SMBSS_G.pth License: CC BY-NC-SA 4.0 Model Architecture: ESRGAN Scale: 1 Purpose: reduce the oversharpened edges present in the Super Mario Bros. Super Show DVD released by Shout Factory. Iterations: 10000 batch_size: 1 HR_size: 128 Epoch: 3 Dataset: Shout Factory the end credits. Dataset_size: 2,383 OTF Training: No Pretrained_Model_G: 1xESRGAN.pth Description: When Shout Factory released the Super Mario Bros. Super Show! to DVD they felt the need to sharpen the...
Topics: The, Super, Mario, Bros, Show, ESRGAN, model, Dehalo, Shout, Factory, DVD
Unsorted Datasets
data

eye 51

favorite 1

comment 0

Unsorted Datasets
by Daniel Grahn
data

eye 167

favorite 0

comment 0

Wild C is a dataset of C/C++ source code and tokens collected from GitHub. The dataset is licensed under CC-BY-SA-4.0; the individual files are subject to their own licenses. For more details on collection procedures and usage, see https://github.com/mla-vd/wild-c.
Topics: dataset, source code, tokens, machine learning
Unsorted Datasets
data

eye 4,120

favorite 6

comment 0

Large sets of malware examples for the purposes of research, comparison, and history. This is the Various set, which is a volume of specific smaller sets of malware.
Unsorted Datasets
by Nikolaos Aletras and Ilias Chalkidis
texts

eye 685

favorite 0

comment 1

This dataset is used for the experiments described in the following paper: I. Chalkidis, I. Androutsopoulos and N. Aletras, "Neural Legal Judgment Prediction in English". Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, (short papers), 2019. The initial data have been scraped and are publicly available under https://hudoc.echr.coe.int. 
(1 review)
Topics: dataset, nlp, echr
Unsorted Datasets
data

eye 4,819

favorite 3

comment 0

I took the Reddit comment archive and converted all the JSON into one SQLite database using this program that I wrote: https://gist.github.com/ers35/3b615a75fa0ed5e6d5cc I ran a few tests to make sure the number of database rows matches the number of JSON records. "SELECT MAX(rowid) FROM comment" and "SELECT COUNT(id) FROM comment" both return 1659361605. This gives me some confidence as to the integrity of the dataset, but I cannot be 100% sure. The compressed size is 163G....
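The row-count check quoted above can be sketched with Python's built-in `sqlite3` module. The table name `comment` and column `id` come from the quoted queries; the in-memory database, the `body` column, and the sample rows are stand-ins for the real file. The two numbers agree only when rowids run contiguously from 1 and no `id` is NULL, which is why a match gives some, but not complete, confidence.

```python
import sqlite3


def row_counts_agree(connection):
    """Compare MAX(rowid) with COUNT(id) on the comment table.

    Returns (agree, max_rowid, id_count).
    """
    max_rowid = connection.execute("SELECT MAX(rowid) FROM comment").fetchone()[0]
    id_count = connection.execute("SELECT COUNT(id) FROM comment").fetchone()[0]
    return max_rowid == id_count, max_rowid, id_count


# Stand-in database with three rows; the real archive would be opened by file path.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE comment (id TEXT, body TEXT)")
connection.executemany(
    "INSERT INTO comment (id, body) VALUES (?, ?)",
    [("c1", "first"), ("c2", "second"), ("c3", "third")],
)
print(row_counts_agree(connection))  # (True, 3, 3)
```

A deleted row or a NULL `id` makes the two values diverge, so a mismatch points to missing or malformed records rather than saying which ones.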
Unsorted Datasets
software

eye 3,739

favorite 1

comment 0

Courtesy of Chris Fenton, a research forensic recording of an 80 megabyte CDC-9877 disk pack. From Fenton: "It is a 'magnetic image' of an 80 megabyte CDC-9877 disk pack that might potentially contain some Cray-1 system software (it might also be blank), of which no known copies currently still exist. I managed to acquire a disk drive from the 1970's that could accept one of these disks, but none of the control electronics worked anymore. I built a robot that manually steps the read heads...
Topics: Cray, Disk Image
Unsorted Datasets
software

eye 2,413

favorite 2

comment 0

Large sets of malware examples for the purposes of research, comparison, and history. This is the alphabetical set.
Unsorted Datasets
by Anonymous
data

eye 482

favorite 2

comment 0

An archive of HTML versions of Slashdot stories from the entire history of the site (Slashdot.org). The program that generated the HTML files is included in this archive.
Unsorted Datasets
by Ilias Chalkidis et al. (2021)
data

eye 172

favorite 0

comment 0

This resource includes the RegIR datasets, accompanying the article: Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations. Ilias Chalkidis, Manos Fergadiotis, Nikolaos Manginas and Prodromos Malakasiotis. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. (Held online due to COVID-19). 2021.
Topics: information retrieval, nlp
Unsorted Datasets
by Eugene Nalimov
software

eye 183

favorite 0

comment 0

Nalimov endgame tablebases containing distance-to-mate (DTM) information for chess positions with 6 pieces (4 vs. 2 with pawns) remaining. These tablebases will be of interest to both chess players and computer scientists. Files graciously made available by HARDCORE COMPUTER CHESS™: https://computer-chess.azurewebsites.net/ For more information, please visit the Chess Programming wiki: https://www.chessprogramming.org/Nalimov_Tablebases
Topics: 6-man, 6-men, 6man, 6men, chess, database, databases, egtb, egtbs, nalimov, tablebase, tablebases
Source: https://computer-chess.azurewebsites.net/egtb-torrents/
Unsorted Datasets
data

eye 770

favorite 4

comment 1

Large collection of Minecraft modifications. The files directory is packed into a files.zip ZIP file for ease of transfer, but should be unpacked before use.
Rating: 4 stars (1 review)
Unsorted Datasets
data

eye 3

favorite 0

comment 0

This is a metadata release of Danbooru, covering posts from 2005-05-24 to 2021-12-31 (final ID: #5,020,995), based on Gwern's efforts to provide a danbooru dataset . The files metadata.2017.7z to metadata.2021.7z correspond to Gwern's Danbooru2017–Danbooru2021 releases, respectively. However, the data structure is primarily tailored for archival/historical purposes. Each JSONL entry corresponds to one post and all entries are separated by year and sorted by ID. The JSONL format is kept...
Topics: danbooru, imageboard, booru, metadata, dataset, json, jsonl
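A small sketch of checking the "one post per JSONL entry, sorted by ID" property described above. The key name `id` is an assumption about the metadata schema, and `is_sorted_by_id` is a hypothetical helper; it accepts any iterable of lines, so it works on an open file as well as on the inline sample.

```python
import io
import json


def is_sorted_by_id(lines, key="id"):
    """Return True if each JSONL entry has a strictly larger ID than the one before."""
    previous = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        current = int(json.loads(line)[key])
        if previous is not None and current <= previous:
            return False
        previous = current
    return True


# Invented sample entries; real files would be opened after extracting the .7z archives.
sample = io.StringIO('{"id": 1, "tags": "a"}\n{"id": 2, "tags": "b"}\n{"id": 5, "tags": "c"}\n')
print(is_sorted_by_id(sample))  # True
```

Reading line by line keeps memory use flat regardless of file size, which matters for a release covering several million posts.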
Unsorted Datasets
software

eye 1,157

favorite 2

comment 0

SkyTorrents dump, detailed here: https://torrentfreak.com/skytorrents-dumps-massive-torrent-database-and-shuts-down180221/
Topics: SkyTorrents, sky, torrents, reddit, cache, database, adfree, no, ads, torrentfreak, torrent,...
dataset
Topic: dataset
Source: torrent:urn:sha1:325fc900c2c7bb7a0cfcfd45851a65c2f5b5391d
Unsorted Datasets
by Seattle FilmWorks
software

eye 152

favorite 0

comment 0

This is the software included on Seattle FilmWorks picture disks (3.5'' in my case).  The software itself sat on the root of the disk, and there was a folder with the order number.rol. Pictures on the disk had a .SFW extension. The software runs in DOS.
Topics: archiveteam, seattle filmworks, floppy, pictures, software
Unsorted Datasets
by Richard Patel
data

eye 340

favorite 2

comment 0

Source Website (Archive link). YT_COMMENTS_TERORIE_2019_10.ndjson.zst: The website says it is CSV. Although the extension is ndjson, the creator of the file has said this is incorrect. It is compressed using Zstandard, and decompresses to 2.1 TB. On Mac and Linux you can install zstd; on Windows I'd suggest installing 7zip and then installing this plugin. Details of the organization of the file, crawl time, etc., can be found on the site. YT_COMMENTS_TERORIE_AUTHOR_IDS.txt: "The...
Topics: youtube, youtube comments, comment, comments, 10 billion, channel, channel names, youtube channel,...
Source: torrent:urn:sha1:18bc22ee0017fb056794f3d7821a942b5c08cc91
Unsorted Datasets
data

eye 2

favorite 0

comment 0

This is a metadata release of Gelbooru, covering posts from 2007-07-16 through 2021-12-31 (final ID: #6,790,764), inspired by Gwern's efforts to provide a danbooru dataset . The JSONL format is as similar as possible to the format used for Gwern's Danbooru2017–Danbooru2020 projects and the data structure is primarily tailored for archival/historical purposes. Each JSONL entry corresponds to one post and all entries are separated by year and sorted by ID. Because Gelbooru automatically...
Topics: gelbooru, imageboard, booru, metadata, dataset, json, jsonl
Unsorted Datasets
software

eye 14

favorite 0

comment 0

Apache Superset is a Data Visualization and Data Exploration Platform Superset A modern, enterprise-ready business intelligence web application. Why Superset? | Supported Databases | Installation and Configuration | Release Notes | Get Involved | Contributor Guide | Resources | Organizations Using Superset Why Superset? Superset is a modern data exploration and data visualization platform. Superset can replace or augment proprietary business intelligence tools for many teams. Superset...
Topics: GitHub, code, software, git
Unsorted Datasets
software

eye 214

favorite 0

comment 0

This is a collection of API Scrapes from Jamendo in 2009. See http://developer.jamendo.com/en/wiki/Musiclist2Api for more information and documentation. This item contains the results of http://api.jamendo.com/get2/id+rating/album/plain/
Topics: jamendo, music, database, community, scrape, api
Unsorted Datasets
software

eye 49

favorite 0

comment 0

MacIIcx Hard Drive Cache
Unsorted Datasets
by All WWDIY content creators
software

eye 28

favorite 2

comment 0

Topics: WarioWare, wario, microgames, records, comics, user, usa, japan, games, wifi, nintendo, doujinsoft,...
Unsorted Datasets
by Convergent Technologies
data

eye 208

favorite 0

comment 0

Raw media and additional support photographs for the Convergent MightyFrame - primarily CTIX S-120-22x-320
Topics: CTIX, Convergent Technologies, Mightyframe, 5.25.1, S120, S22X, S320, 71-03195-01
Unsorted Datasets
software

eye 1,752

favorite 2

comment 0

All the "journal article" DOIs from CrossRef's OAI-PMH server; URLs of just under 50 million journal articles.
Topics: doi, dataset
Unsorted Datasets
by swebb
data

eye 83

favorite 0

comment 0

Small soundcloud snapshot by swebb
Unsorted Datasets
by William W. Cohen, MLD, CMU
web

eye 799

favorite 2

comment 0

This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web , by the Federal Energy Regulatory Commission during its investigation. The email dataset was later purchased by Leslie Kaelbling at MIT, and turned out to have a number of...
Topics: Enron, E-mail, Dataset
Unsorted Datasets
software

eye 979

favorite 0

comment 0

Syzygy endgame tablebases containing win-draw-loss (WDL) and distance-to-zero (DTZ) information for chess positions containing 7 pieces. These tablebases will be of interest to both chess players and computer scientists. For more information, please visit the Chess Programming wiki: https://www.chessprogramming.org/Syzygy_Bases This is Part 2 of ?. See below for the remaining parts: Part 1: https://archive.org/details/Syzygy7
Topics: 7-man, 7-men, 7man, 7men, chess, database, databases, egtb, egtbs, syzygy, tablebase, tablebases
Source: http://tablebase.sesse.net/
Unsorted Datasets
data

eye 269

favorite 1

comment 1

Teletext Compilation Collection 2020 07
Rating: 5 stars (1 review)
Unsorted Datasets
movies

eye 761

favorite 0

comment 0

Unsorted Datasets
data

eye 14

favorite 0

comment 0

Unsorted Datasets
data

eye 1,434

favorite 0

comment 0

Database of UPC product codes, as compiled by upcdatabase.com
Topics: UPC, Universal Product Code, barcode
Unsorted Datasets
data

eye 82

favorite 2

comment 0

Unsorted Datasets
software

eye 179

favorite 0

comment 0

This is a collection of API Scrapes from Jamendo in 2012. See http://developer.jamendo.com/en/wiki/Musiclist2Api for more information and documentation. This item contains the results of http://api.jamendo.com/get2/id+rating/album/plain/
Topics: jamendo, music, database, community, scrape, api
Unsorted Datasets
by Puzer
software

eye 27

favorite 0

comment 0

Latent training data for FFHQ StyleGAN, originally hosted at  https://drive.google.com/uc?id=1xMM3AFq0r014IIhBLiMCjKJJvbhLUQ9t by Puzer - see  https://github.com/Puzer/stylegan-encoder/blob/master/Learn_direction_in_latent_space.ipynb for more details.
Topics: FFHQ, StyleGAN, stylegan-encoder, Puzer, training data
Unsorted Datasets
software

eye 409

favorite 1

comment 0

This is a collection of database dumps downloaded from Jamendo in 2011. See http://developer.jamendo.com/en/wiki/NewDatabaseDumps for more information and documentation.
Topics: jamendo, music, database, dbdump
Unsorted Datasets
by Stuck_In_the_Matrix
data

eye 680

favorite 2

comment 0

Dataset published and compiled by /u/Stuck_In_the_Matrix, in r/datasets. The dataset is ~1.7 billion JSON objects complete with the comment, score, author, subreddit, position in comment tree and other fields that are available through Reddit's API. I'm currently doing NLP analysis and also putting the entire dataset into a large searchable database using Sphinxsearch (also testing ElasticSearch). This dataset is over 1 terabyte uncompressed, so this would be best for larger research...
Topics: reddit, datasets, comments, bigquery, Stuck_In_the_Matrix
Source: torrent:urn:sha1:7690f71ea949b868080401c749e878f98de34d3d
Unsorted Datasets
software

eye 746

favorite 1

comment 0

This directory contains all the spam that I have received since early 1998. I have employed various "bait" addresses, such as to trick email address harvesters into putting them on spam lists. The archives have been (re)compressed with p7zip which produces files about half the size of tar+bzip2, and smaller even than I was able to achieve with RAR on Linux. This archive is provided for the purposes of researching behavior of spammers and development of new spam management techniques....
Unsorted Datasets
software

eye 36

favorite 0

comment 0

This is a collection of API Scrapes from Jamendo in 2013. See http://developer.jamendo.com/en/wiki/Musiclist2Api for more information and documentation. This item contains the results of http://api.jamendo.com/get2/id+rating/album/plain/
Topics: jamendo, music, database, community, scrape, api
Unsorted Datasets
data

eye 8

favorite 0

comment 0

Unsorted Datasets
data

eye 10

favorite 0

comment 0

Unsorted Datasets
by Anurag Peddi
software

eye 5

favorite 0

comment 0

This dataset can be used to build a model for pen ink differentiation. It consists of 91 images written by three users, each writing with 10 pens and 3 words using each pen.
Topics: pen, ink, differentiation, dataset, deep, learning, classification
Unsorted Datasets
by Edward Meechum
software

eye 31

favorite 0

comment 0

Youtube changed its search & ranking algorithm a few weeks ago to neglect content that challenges the official 9/11 conspiracy theory. In a free and open society it is indispensable to get all available information, especially when something is very wrong. A nerdy contemporary created a searchable dataset just before yt's algorithm changed. What do I need? - 7-ZIP (www.7-zip.org) to extract - 42 gigabytes of free hdd - 2,4 gigabytes of free ram - Java 8 runtime Just extract the...
Topic: 9/11
Unsorted Datasets
software

eye 11

favorite 0

comment 0

This is the data set associated with the "Deep Green: modelling time-series of software energy consumption" paper published in ICSME 2017. The abstract of the paper -- "Inefficient mobile software kills battery life. Yet, developers lack the tools necessary to detect and solve energy bugs in software. In addition, developers are usually tasked with the creation of software features and triaging existing bugs. This means that most developers do not have the time or resources to...
Topics: green mining, android, artifacts, ICSME 2017, software engineering
Unsorted Datasets
by Eugene Nalimov
software

eye 107

favorite 0

comment 0

Nalimov endgame tablebases containing distance-to-mate (DTM) information for chess positions with 6 pieces (3 vs. 3 with pawns) remaining. These tablebases will be of interest to both chess players and computer scientists. Files graciously made available by HARDCORE COMPUTER CHESS™: https://computer-chess.azurewebsites.net/ For more information, please visit the Chess Programming wiki: https://www.chessprogramming.org/Nalimov_Tablebases
Topics: 6-man, 6-men, 6man, 6men, chess, database, databases, egtb, egtbs, nalimov, tablebase, tablebases
Source: https://computer-chess.azurewebsites.net/egtb-torrents/
Unsorted Datasets
data

eye 15

favorite 0

comment 0

Unsorted Datasets
software

eye 1,005

favorite 2

comment 0

This is a collection of database dumps downloaded from Jamendo in 2012. See http://developer.jamendo.com/en/wiki/NewDatabaseDumps for more information and documentation.
Topics: jamendo, music, database, dbdump
Unsorted Datasets
data

eye 11

favorite 1

comment 0

A collection of bundled teletext data streams from the 2010s.
Topics: teletext, tta, t42
Unsorted Datasets
data

eye 13

favorite 0

comment 0

Dataset of EleutherAI
Topic: AI, EleutherAI
Source: torrent:urn:sha1:0d366035664fdf51cfbe9f733953ba325776e667
This is a collection of API Scrapes from Jamendo in 2011. See http://developer.jamendo.com/en/wiki/Musiclist2Api for more information and documentation. This item contains the results of http://api.jamendo.com/get2/id+album_id+name+text+rating+lang+dates/review/jsonpretty/ (until April 2012 it was /plain/)
Topics: jamendo, music, database, community, scrape, api
Unsorted Datasets
software

eye 260

favorite 0

comment 0

This is a collection of API Scrapes from Jamendo in 2011. See http://developer.jamendo.com/en/wiki/Musiclist2Api for more information and documentation. This item contains the results of http://api.jamendo.com/get2/id+tag_idstr/album/plain/
Topics: jamendo, music, database, community, scrape, api
Unsorted Datasets
by voh
data

eye 75

favorite 0

comment 0

Unsorted Datasets
software

eye 103

favorite 0

comment 0

SNDH Atari ST YM2149 Archive
Unsorted Datasets
software

eye 824

favorite 2

comment 0

This is a collection of database dumps downloaded from Jamendo in 2010. See http://developer.jamendo.com/en/wiki/NewDatabaseDumps for more information and documentation.
Topics: jamendo, music, database, dbdump