Why Depth Matters in Parallelizable Sequence Models: A Lie Algebraic View

Source

arxiv.org (full article)

Publisher summary (verbatim)

arXiv:2603.05573v2. Abstract: Scalable sequence models, such as Transformer variants and structured state-space models, often trade expressivity power for sequence-level parallelism, which enables efficient training. Here we examine the bounds on error and how error scales when

