Discussion about this post

nathan t

What's the advantage of RL here, instead of using a stronger foundation model with a thinking tool and the typical ReAct pattern? I guess token costs and latency are the main advantages? Does that make it worth it? (Genuine question, I'm interested in understanding the why behind doing this.)
