Pre-training Deep Q-Networks Eliminates the Need for Target Networks: An Empirical Study
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). (Distributed Intelligent Systems and Communication (DISCO))
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). (Distributed Intelligent Systems and Communication (DISCO)). ORCID iD: 0000-0001-7547-8111
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). (Distributed Intelligent Systems and Communication (DISCO)). ORCID iD: 0000-0003-4147-9487
2025 (English). In: The 14th International Conference on Pattern Recognition Applications and Methods, Porto, Portugal, February 23-25, 2025. Conference paper, Published paper (Refereed)
Abstract [en]

Deep Q-Learning is an important Reinforcement Learning algorithm for automated sequential decision-making problems. It trains a neural network, called the DQN, to find an optimal policy. Training is highly unstable and exhibits high variance. A target network is commonly used to mitigate these problems, but it leads to longer training times, greater training-data requirements, and a very large memory footprint. In this paper, we present a two-phase pre-trained online training procedure that eliminates the need for a target network. In the first, offline, phase the DQN is trained using expert actions. Unlike previous literature, which maximizes the probability of picking the expert actions, we train to minimize the usual squared Bellman loss. Then, in the second, online, phase, the DQN continues to train while interacting with an environment (simulator). We show empirically that the target network can be eliminated; training variance is reduced; training is more stable; when the duration of pre-training is carefully chosen, convergence to an optimal policy during the online phase is faster; and the quality of the final policy found is at least as good as those found using traditional methods.
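The core idea in the abstract, minimizing the squared Bellman loss with the bootstrap target computed from the same value function rather than a frozen target network, can be sketched in a minimal tabular form. This is an illustrative sketch only, not the paper's implementation: the function name, the tabular Q representation, and the learning-rate choice are assumptions made for the example; the paper itself uses a neural-network DQN.

```python
import numpy as np

def bellman_loss_grad_step(Q, s, a, r, s_next, done, gamma=0.99, lr=0.1):
    """One gradient step on the squared Bellman loss for a single transition.

    The TD target is computed from the SAME Q-table that is being updated
    (no separate frozen target network), mirroring the paper's claim that a
    suitable pre-training phase makes the target network unnecessary.
    """
    target = r if done else r + gamma * np.max(Q[s_next])
    td_error = Q[s, a] - target          # gradient of 0.5 * (Q[s,a] - target)^2
    Q[s, a] -= lr * td_error
    return 0.5 * td_error ** 2           # current squared Bellman loss

# Phase 1 (offline): apply this update on expert transitions (s, a_expert, r, s').
# Phase 2 (online): apply the identical update on transitions gathered from the
# environment. Only the data source changes between the two phases.
```

Note the contrast with behavior cloning: the offline phase does not maximize the probability of the expert action, it minimizes the same Bellman loss that the online phase uses, so no loss switch occurs between phases.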

Place, publisher, year, edition, pages
2025.
Keywords [en]
deep q-network, deep q-learning, stability, pre-training, variance reduction
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kau:diva-102691
OAI: oai:DiVA.org:kau-102691
DiVA, id: diva2:1927408
Conference
The 14th International Conference on Pattern Recognition Applications and Methods (ICPRAM)
Available from: 2025-01-14 Created: 2025-01-14 Last updated: 2025-10-16 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Fulltext

Authority records

Lindström, Alexander; Ramaswamy, Arunselvan; Grinnemo, Karl-Johan

Search in DiVA

By author/editor
Lindström, Alexander; Ramaswamy, Arunselvan; Grinnemo, Karl-Johan
By organisation
Department of Mathematics and Computer Science (from 2013)
Computer Sciences

Search outside of DiVA

Google
Google Scholar
