Pre-training Deep Q-Networks Eliminates the Need for Target Networks: An Empirical Study
Lindström, Alexander: Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). (Distributed Intelligent Systems and Communication (DISCO))
Ramaswamy, Arunselvan: Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). (Distributed Intelligent Systems and Communication (DISCO)). ORCID iD: 0000-0001-7547-8111
Grinnemo, Karl-Johan: Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). (Distributed Intelligent Systems and Communication (DISCO)). ORCID iD: 0000-0003-4147-9487
2025 (English). In: Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods (ICPRAM) / [ed] Modesto Castrillon-Santana, Maria De Marsico and Ana Fred, SciTePress, 2025, Vol. 1, p. 437-444. Conference paper, Published paper (Refereed).
Abstract [en]

Deep Q-Learning is an important Reinforcement Learning algorithm for automated sequential decision-making problems. It trains a neural network, called the DQN, to find an optimal policy. Training is highly unstable and exhibits high variance. A target network is commonly used to mitigate these problems, but it leads to longer training times, higher training-data demands, and very large memory requirements. In this paper, we present a two-phase training procedure that eliminates the need for a target network. In the first (offline) phase, the DQN is trained using expert actions. Unlike previous literature, which tries to maximize the probability of picking the expert actions, we train to minimize the usual squared Bellman loss. Then, in the second (online) phase, the DQN continues to train while interacting with an environment (simulator). We show, empirically, that the target network can be eliminated; training variance is reduced; training is more stable; when the duration of pre-training is carefully chosen, the rate of convergence (to an optimal policy) during the online training phase is faster; and the quality of the final policy found is at least as good as that of policies found using traditional methods.
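
As a rough illustration of the procedure described above, the sketch below (a minimal Python/PyTorch example, not the authors' code) pre-trains a DQN on expert transitions by minimizing the squared Bellman loss, E[(Q(s,a) - r - gamma * max_a' Q(s',a'))^2], and then continues the same update online. In both phases the bootstrap target is computed from the same network (detached from the gradient graph) rather than from a separate target network. The network size, hyperparameters, and synthetic transition batches are illustrative placeholders.

import torch
import torch.nn as nn

# Illustrative placeholders, not the paper's experimental setup.
STATE_DIM, N_ACTIONS, GAMMA, BATCH = 4, 2, 0.99, 32

qnet = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                     nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

def random_batch():
    # Stand-in for real transitions (s, a, r, s', done); replace with
    # expert data (phase 1) or replay-buffer samples (phase 2).
    s  = torch.randn(BATCH, STATE_DIM)
    a  = torch.randint(N_ACTIONS, (BATCH,))
    r  = torch.randn(BATCH)
    s2 = torch.randn(BATCH, STATE_DIM)
    d  = torch.randint(2, (BATCH,)).float()
    return s, a, r, s2, d

def bellman_step(batch):
    # Squared Bellman loss; the bootstrap target comes from the SAME
    # network (no target network), detached so gradients do not flow
    # through the target.
    s, a, r, s2, d = batch
    q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * (1.0 - d) * qnet(s2).max(1).values
    loss = ((q - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase 1 (offline): pre-train on expert transitions, minimizing the
# Bellman loss rather than the probability of picking expert actions.
for _ in range(500):
    bellman_step(random_batch())

# Phase 2 (online): continue training on transitions gathered while
# interacting with the environment, still without a target network.
for _ in range(500):
    bellman_step(random_batch())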

Place, publisher, year, edition, pages
SciTePress, 2025. Vol. 1, p. 437-444
Keywords [en]
deep q-network, deep q-learning, stability, pre-training, variance reduction
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kau:diva-102691
DOI: 10.5220/0013374600003905
Scopus ID: 2-s2.0-105002403768
OAI: oai:DiVA.org:kau-102691
DiVA, id: diva2:1927408
Conference
14th International Conference on Pattern Recognition Applications and Methods, Porto, Portugal, February 23-25, 2025.
Available from: 2025-01-14. Created: 2025-01-14. Last updated: 2026-01-14. Bibliographically approved.

Open Access in DiVA

No full text in DiVA
