Datasets available for DataHub/DSMLP

The datasets listed below are available in DataHub. To browse them, open a terminal window in your DataHub environment and type "cd /datasets". In a Jupyter notebook, the path for reading a dataset begins with "/datasets/". There are multiple dataset mounts ("/datasets", "/datasets-2", and "/datasets-3"), so please check the others if your dataset isn't in "/datasets". For more information about a dataset, click its URL under the "URL (More Information)" column.
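As a rough illustration of checking all three mounts from a notebook, here is a minimal Python sketch. The helper names `find_dataset` and `list_datasets` are hypothetical, as is the placeholder directory name `my-dataset`. Only the three mount paths come from the text above, and the actual directory names inside each mount may differ from the dataset titles listed on this page.

```python
from pathlib import Path

# The three mount points named in the text above.
DATASET_MOUNTS = ("/datasets", "/datasets-2", "/datasets-3")


def find_dataset(name, mounts=DATASET_MOUNTS):
    """Return the first existing path <mount>/<name>, or None if no mount has it."""
    for mount in mounts:
        candidate = Path(mount) / name
        if candidate.exists():
            return candidate
    return None


def list_datasets(mounts=DATASET_MOUNTS):
    """Map each mount that exists to the sorted names of its top-level entries."""
    listing = {}
    for mount in mounts:
        root = Path(mount)
        if root.is_dir():
            listing[mount] = sorted(entry.name for entry in root.iterdir())
    return listing


if __name__ == "__main__":
    # Show what each available mount contains.
    for mount, names in list_datasets().items():
        print(mount, names)
    # "my-dataset" is a placeholder, not a real directory name.
    print(find_dataset("my-dataset"))
```

Running `list_datasets()` first shows the real directory names, which can then be passed to `find_dataset` or used directly in a read call.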

 

Several of the datasets below are part of the Library's UC San Diego Educational Dataset Service Collection (item list). To request additional datasets from this collection, please email datahub@ucsd.edu. You may also contact us if you need a private dataset, since most content in /datasets is publicly readable.

| Name of Dataset in DataHub | Title/Details | URL (More Information) |
| --- | --- | --- |
| BindingDB Dataset, [latest date] | BindingDB Dataset, January 1, 2024. In BindingDB: Measured Binding Data for Protein-Ligand and Other Molecular Systems. Data downloaded from component 4: TSV file containing all protein-ligand data in BindingDB | https://doi.org/10.6075/J0BP02ZT |
| California Local Tax Ballot Measures, 1986 to 2012 | California Local Tax Ballot Measures, 1986 to 2012. In California Local Tax Ballot Measures. Data downloaded from component 1: Data | https://doi.org/10.6075/J09P2ZT7 |
| Cars Overhead with Context (COWC) | Cars Overhead with Context (COWC). In Lawrence Livermore National Laboratory (LLNL) Open Data Initiative. Data downloaded from component 8: COWC-M datasets and networks | https://doi.org/10.6075/J0CN72BC |
| Data from: Multi-Source Feature Fusion for Object Detection Association in Connected Vehicle Environments | Data from: Multi-Source Feature Fusion for Object Detection Association in Connected Vehicle Environments. Data downloaded from components 3 to 5 | https://doi.org/10.6075/J0HX1CVJ |
| Data from: Quantifying influence of human choice on the automated detection of Drosophila behavior by a supervised machine learning algorithm | Data from: Quantifying influence of human choice on the automated detection of Drosophila behavior by a supervised machine learning algorithm. Data downloaded from components 12 to 32, under Movies and associated files | https://doi.org/10.6075/J0QF8RDZ |
| Heterogeneous Stock (HS) Rat Genotypes, Version 4 | Heterogeneous Stock (HS) Rat Genotypes, Version 4. In Genotype Data from: NIDA Center for GWAS in Outbred Rats. Data downloaded from components 2 to 7 | https://doi.org/10.6075/J0X63N54 |
| Training image data for: Environmental and ecological drivers of harmful algal blooms in the Southern California Bight | Training image data for: Environmental and ecological drivers of harmful algal blooms in the Southern California Bight. Data downloaded from components 1 and 2 | https://doi.org/10.6075/J00865GT |
| Using NLP to Predict the Severity of Cyber Security Vulnerabilities | Using NLP to Predict the Severity of Cyber Security Vulnerabilities. In Data Science & Engineering Master of Advanced Study (DSE MAS) Capstone Projects. Data downloaded from component 4: Input data | https://doi.org/10.6075/J0TX3F89 |
| Video Game Reviews Sentiment to Popularity | Video Game Reviews Sentiment to Popularity. In Data Science & Engineering Master of Advanced Study (DSE MAS) Capstone Projects. Data downloaded from component 4: Input file | https://doi.org/10.6075/J06D5T5H |