Discussion about this post

Neural Foundry

Strong work on demonstrating that scale itself addresses causal confusion. The insight that network depth and non-linearity help optimization escape spurious correlations is practical and something I've seen play out in adjacent domains. Releasing the full training recipe and dataset is going to accelerate this space significantly, especially for smaller teams that can't afford massive compute for experimentation.

Meysam

Very nice and timely work.

Where can one read in more detail about your toy experiment setup and the large-scale one? For example, how did you define the causality score, and why was it concluded that causal confusion is reduced by depth and more data? Do you have any scaling laws for that?


