Improved workflow for constructing machine learning models: Predicting retention times and peak widths in oligonucleotide separation
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Engineering and Chemical Sciences (from 2013). ORCID iD: 0000-0003-1819-1709
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Engineering and Chemical Sciences (from 2013). ORCID iD: 0000-0002-8943-6286
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Engineering and Chemical Sciences (from 2013).
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). ORCID iD: 0009-0006-7733-8298
2025 (English). In: Journal of Chromatography A, ISSN 0021-9673, E-ISSN 1873-3778, Vol. 1747, article id 465746. Article in journal (Refereed). Published.
Abstract [en]

This study presents an improved workflow to support the development of machine learning models that predict oligonucleotide retention times and peak widths, and thus peak resolutions, from larger datasets where manual processing is not feasible. We explored diverse oligonucleotide forms, ranging from native to fully phosphorothioated, using three different gradient slopes. Both native and phosphorothioated oligonucleotides were separated using a chromatographic C18 system with tributylaminium ion as the ion-pair reagent in the eluent, resulting in retention time data for approximately 900 sequences per gradient. To manage these large datasets, we developed a semi-automatic rule-based approach for retention time determination, peak decomposition, peak width assessment, signal-to-noise ratio estimation, and skewness analysis. Probability density functions (PDFs) were fitted to elution profiles, with PDF selection based on an F-test. Co-eluting peaks were addressed using a multiple Gaussian PDF. The encoded sequence data were modeled using support vector regression (SVR), gradient boosting (GB), random forest (RF), and decision tree (DT) models. GB and SVR showed promise for retention predictions, while RF and DT were faster but demonstrated limited generalization capabilities. The machine learning models exhibited larger errors for the shallowest gradient and lower predictability for P=O sequences, potentially due to signal intensity and sequence heterogeneity. Improvements in signal-to-noise ratios were considered, including mass spectrometry in selected ion monitoring mode. The best model for these datasets was GB, closely followed by the SVR model. With established models for retention and peak width, chromatograms can now be predicted for various gradient slopes, offering prediction of impurity peak resolution for arbitrary sequences and gradient slopes.
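As an illustration of the final step in the abstract, where predicted retention times and peak widths are combined into peak resolutions, the sketch below applies the standard resolution formula Rs = 2(t2 - t1)/(w1 + w2) for baseline peak widths. It is a minimal sketch and not the authors' implementation: the function names `resolution` and `critical_pair`, the peak labels, and all numeric values are hypothetical and are not taken from the study.

```python
def resolution(t1, w1, t2, w2):
    """Chromatographic resolution from retention times and baseline widths.

    Uses Rs = 2 * |t2 - t1| / (w1 + w2); times and widths share one unit.
    """
    if w1 <= 0 or w2 <= 0:
        raise ValueError("peak widths must be positive")
    return 2.0 * abs(t2 - t1) / (w1 + w2)


def critical_pair(peaks):
    """Return (Rs, label_a, label_b) for the worst-resolved adjacent pair.

    `peaks` is a list of (label, retention_time, baseline_width) tuples,
    for example the output of a retention model and a peak-width model.
    """
    ordered = sorted(peaks, key=lambda p: p[1])  # sort by retention time
    if len(ordered) < 2:
        raise ValueError("need at least two peaks")
    pairs = [
        (resolution(ta, wa, tb, wb), la, lb)
        for (la, ta, wa), (lb, tb, wb) in zip(ordered, ordered[1:])
    ]
    return min(pairs)  # tuple comparison: lowest resolution first


# Hypothetical predicted values (minutes); not data from the study.
predicted = [
    ("target", 12.0, 0.4),
    ("impurity A", 11.5, 0.4),
    ("impurity B", 13.0, 0.6),
]
rs, first, second = critical_pair(predicted)
print(f"critical pair: {first} / {second}, Rs = {rs:.2f}")
```

Repeating this calculation with the predictions for each gradient slope would show which slope best separates a given impurity from the target sequence.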

Place, publisher, year, edition, pages
Elsevier, 2025. Vol. 1747, article id 465746
Keywords [en]
Oligonucleotides, Ion-pair chromatography, Machine learning, Computer simulation, Resolution predictions
National Category
Bioinformatics (Computational Biology) Analytical Chemistry
Research subject
Chemistry; Computer Science
Identifiers
URN: urn:nbn:se:kau:diva-103955
DOI: 10.1016/j.chroma.2025.465746
ISI: 001436803200001
PubMedID: 40014960
Scopus ID: 2-s2.0-85218463003
OAI: oai:DiVA.org:kau-103955
DiVA, id: diva2:1951631
Funder
Knowledge Foundation, 20210021
Available from: 2025-04-11 Created: 2025-04-11 Last updated: 2025-04-11 Bibliographically approved

Open Access in DiVA

fulltext (1588 kB), 40 downloads
File information
File name: FULLTEXT01.pdf
File size: 1588 kB
Checksum (SHA-512): 470242f3b00f32c712febf3f23a5e7fedf291455287aa865f2c6ece3fd1a9132ef71bc1bb60c017bd3d191854750719cccf4c6ae2e3ee5cc572f9bfbec7666df
Type: fulltext
Mimetype: application/pdf


Authority records

Samuelsson, Jörgen; Enmark, Martin; Rahal, Manal; Ahmed, Bestoun S.; Häggstrom, Jakob; Forssén, Patrik; Fornstedt, Torgny
