DMOZ Internet Directory
Presented by DMOZLive.com
Datasets
25 Sites
Repositories of data used to test/validate machine learning algorithms.
Sites
Data Hunters
- Data Hunters is an online community for data seekers, analysts, scientists, and business professionals.
The RCSB Protein Data Bank (PDB)
- Archive of experimentally determined 3-D structures of biological macromolecules from the Brookhaven National Laboratory.
DELVE - Data for Evaluating Learning in Valid Experiments
- A standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data. Delve makes it possible for users to compare their learning methods with other methods on many datasets.
UCI Machine Learning Repository
- A repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
TREC Data
- Text datasets used in information retrieval and learning in text domains.
National Space Science Data Center
- Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight missions, in addition to selected other data and some models and software.
Face recognition dataset
- A dataset of face images for face recognition algorithms.
HS3D - Homo Sapiens Splice Sites Dataset
- A database of Homo sapiens exon, intron and splice regions extracted from GenBank primate sequences, Rel. 123. The data set aims to provide standardized material for training computational approaches to gene identification and characterization and for assessing their prediction accuracy.
Learning Relational Concepts from Sensor Data of a Mobile Robot
- A collection of data sets, each represented in first-order logic. Maintained at the University of Dortmund, Germany.
Web->KB dataset
- Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract symbolic knowledge from the World Wide Web.
WordSimilarity-353 Test Collection
- Contains 353 English word pairs along with human-assigned similarity judgements.
RISE: Repository of Information Sources used in information Extraction tasks.
- Repository of online information sources: test domains for information extraction and wrapper generation tools that learn extraction rules (extraction patterns).
Reuters-21578 Text Categorization Corpus
- A classic benchmark for text categorization algorithms.
Bilkent University Function Approximation Repository
- Datasets used for the experimental analysis of function approximation techniques and for training and demonstration by the machine learning and statistics community.
TechTC - Technion Repository of Text Categorization Datasets
- Provides a large number of diverse test collections for use in text categorization research.
ArrayExpress - functional genomics data
- ArrayExpress is a database of functional genomics experiments that can be queried and whose data can be downloaded. It includes gene expression data from microarray and high-throughput sequencing studies.
The 20 Newsgroups Data Set
- A widely used dataset of posts from 20 newsgroups, for text categorization.
University of Maryland, INFORUM EconData
- Several hundred thousand economic time series, produced by the U.S. Government and distributed by the government in a variety of formats and media, have been put into a standard, highly efficient, easy-to-use form for personal computers.
NIST Special Database 4
- This NIST database of fingerprint images contains 2000 8-bit grayscale fingerprint image pairs. NIST charges $90 plus $30 shipping for the data.
Machine Learning and Data Mining - Datasets
- USPS digits, faces, and links to various datasets prepared for Matlab.
FlickrLogos-32 dataset
- The dataset FlickrLogos-32 contains photos depicting logos and is meant for the evaluation of multi-class logo detection as well as logo retrieval methods on real-world images. It contains images, ground truth, annotations and evaluation scripts.
Searchable List of Free Public Data Mining Datasets
- Keyword searchable list of 200+ free research-quality public datasets from academia, cloud sources, conferences, books and papers, and many others.
Time Series Data Library
- A collection of over 500 time series, maintained by Rob Hyndman. Time series are organized by subject.
Dataset Generator
- Datgen is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algorithms.
aiHitdata
- A random sample of 10,000 companies worldwide drawn from aiHit. All data in this database is extracted and updated automatically from the web using AI and machine learning.
This directory is made available through a Creative Commons Attribution license from the DMOZ Organization.