The advent of 5G and beyond-5G (B5G) systems has led to a significant rise in bandwidth-intensive applications within the Internet of Things (IoT), particularly in vehicular IoT (V-IoT). Solutions such as Multipath TCP (MPTCP) and Multipath QUIC (MPQUIC) have emerged to address the escalating bandwidth demands of connected vehicles. However, multipath schedulers still struggle to adapt efficiently to the diverse network conditions typical of vehicular environments. In this paper, we introduce two novel variants of DEAR (Deep reinforcement learning Empowered Actor-critic scheduleR): DEAR-MAC (Multiple Alternative Critics) and DEAR-CAP (Critic Associated per Path). The proposed DRL-based schedulers are tailored for Multipath QUIC in 5G/B5G environments, enhancing decision-making in the dynamic network scenarios often encountered by V-IoT devices. Through extensive experimentation across various network setups, including those with fluctuating bandwidth and network outages, and using real-world network traces from the Lumos5G dataset, we conduct a comparative analysis against a state-of-the-art learning-based scheduler, Peekaboo, and rule-based schedulers, namely RR, ECF, BLEST, and minRTT. Our experiments show that the proposed DEAR-MAC and DEAR-CAP schedulers outperform Peekaboo by 38.88% and 48.11%, respectively, under different heterogeneous network conditions, with even larger gains over the rule-based schedulers. These advancements are particularly beneficial for vehicular IoT applications such as real-time navigation, remote diagnostics, and vehicle-to-vehicle communication, ensuring more reliable and efficient data transmission even in challenging network environments.