Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/11004
Title: Open Source Software Tools for Data Management and Deep Model Training Automation
Authors: Tiraşoǧlu, U.
Türker, A.
Ekici, A.
Yiǧit, H.
Bölükbaşi, Y.E.
Akgün, T.
Keywords: augmentation; dataset management; deep model; training automation
Codes (symbols); Computer graphics; Deep learning; Graphics processing unit; Information management; Large dataset; Open source software; Open systems; Personnel training; Program processors; Augmentation; Controlled experiment; Dataset management; Deep model; Hyper-parameter; Large datasets; Model training; Open-source softwares; Software-tools; Training automation; Automation
Issue Date: 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Abstract: Designing and optimizing deep models requires managing large datasets and conducting carefully designed controlled experiments that depend on large sets of hyper-parameters and problem-dependent software/data configurations. These experiments are executed by training the model under observation with varying configurations. Since a typical training run can take days even on proven acceleration fabrics such as Graphics Processing Units (GPUs), properly managing training data, avoiding human error when preparing configurations, and ensuring the repeatability of experiments are of utmost importance. In this paper, we present two open source software tools that aim to achieve these goals: a Dataset Manager tool (DatumAid) and a Training Automation Manager tool (OrchesTrain). DatumAid integrates with the Computer Vision Annotation Tool (CVAT) to facilitate the management of annotated datasets. Adding functionality on top of CVAT, DatumAid allows users to filter labeled data, manipulate datasets, and export datasets for training. The tool adopts a simple code structure while giving users flexibility through configuration files. OrchesTrain aims to automate the model training process by enabling rapid preparation and training of models in the desired style for the intended tasks. Users can seamlessly integrate models written with the PyTorch library into the system and leverage the full capabilities of OrchesTrain, which supports the Wandb, MLflow, and TensorBoard loggers either simultaneously or individually. To ensure the reproducibility of experiments, all configurations and code are saved to the selected logger in an appropriate structure within a YAML file, along with the serialized model files. Both tools are publicly available on GitHub. © 2023 IEEE.
Description: 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023 -- 11 September 2023 through 15 September 2023 -- 194295
URI: https://doi.org/10.1109/ASE56229.2023.00014
https://hdl.handle.net/20.500.11851/11004
ISBN: 9798350329964
Appears in Collections: Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.