Learn the mathematical structure, not the conceptual structure
I've recently been learning about transformers and noticed a failure mode in my learning that has recurred throughout my life: trying to learn a subject from material that covers its high-level conceptual structure instead of learning its mathematical structure more directly. I don't mean to suggest that one needs to focus on hardcore formalization for everything, but there is a difference between learning the conceptual structure of a subject and learning the conceptual structure of that subject's mathematical framework.
The most salient example of this phenomenon for me came when I was trying to teach myself quantum mechanics at the end of high school. I voraciously read popular accounts of QM, watched interviews with physicists, etc. These sources emphasized wave-particle duality, Schrödinger's cat, the double-slit experiment, and the uncertainty principle. I could certainly recite these concepts back in conversation, but at no point did I feel like I understood quantum mechanics.
That is, until I read the Wikipedia entry on the mathematical formalism of quantum mechanics (or some similar reference; I don't remember exactly). There I found an explanation not of the physics of QM, but of its mathematical structure. What I learned was that QM is a game with rules. The state of the system is an arrow (a vector). The dynamics of the arrow are given by a fairly straightforward linear differential equation (the Schrödinger equation). "Measurements" are associated with linear operators (matrices), and the rule of measurement is that the state of the system "collapses" to one of the operator's eigenvectors, with probabilities given by the squared magnitudes of the inner products of the current state with those eigenvectors.
This was mind-blowing. All the time I spent reading about Schrödinger's cat, I could instead have simply learned that everything comes from a vector moving according to a linear diffy-Q, plus some straightforward rules about eigenvectors and linear operators.
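To make the "game with rules" concrete, here is a minimal sketch in Python with NumPy. The two-level system, the choice of Pauli matrices as the Hamiltonian and the observable, and the function names `evolve`, `measurement_distribution`, and `measure` are all illustrative assumptions rather than anything from a particular reference; it works in units where ħ = 1.

```python
import numpy as np

# Illustrative two-level system (a qubit); the matrices are assumptions.

# Rule 1: the state is a unit vector in a complex vector space.
psi0 = np.array([1.0, 0.0], dtype=complex)

# Rule 2: dynamics follow the linear Schrodinger equation i dpsi/dt = H psi
# (hbar = 1), solved by psi(t) = exp(-i H t) psi(0). For Hermitian H we build
# exp(-i H t) from H's eigendecomposition.
H = np.array([[0.0, 1.0],
              [1.0, 0.0]], dtype=complex)  # Pauli-X as a toy Hamiltonian

def evolve(psi, H, t):
    energies, V = np.linalg.eigh(H)
    U = V @ np.diag(np.exp(-1j * energies * t)) @ V.conj().T
    return U @ psi

# Rule 3: an observable is a Hermitian matrix. Measuring it yields one of its
# eigenvalues, and the state collapses onto the matching eigenvector with
# probability |<v_k|psi>|^2 (the Born rule).
Z = np.array([[1.0, 0.0],
              [0.0, -1.0]], dtype=complex)  # Pauli-Z as the observable

def measurement_distribution(psi, observable):
    """Possible outcomes, post-measurement states, and their probabilities."""
    eigenvalues, eigenvectors = np.linalg.eigh(observable)
    amplitudes = eigenvectors.conj().T @ psi  # inner products <v_k|psi>
    return eigenvalues, eigenvectors, np.abs(amplitudes) ** 2

def measure(psi, observable, rng):
    """Sample one outcome and return it with the collapsed state."""
    eigenvalues, eigenvectors, probs = measurement_distribution(psi, observable)
    k = rng.choice(len(probs), p=probs)
    return eigenvalues[k], eigenvectors[:, k]

psi_t = evolve(psi0, H, np.pi / 4)
values, _, probs = measurement_distribution(psi_t, Z)
print(values, probs)  # outcomes -1 and +1, each with probability about 0.5

outcome, post_state = measure(psi_t, Z, np.random.default_rng(0))
print(outcome, post_state)
```

Everything here is linear algebra: a matrix exponential for the dynamics and an eigendecomposition plus squared inner products for measurement.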
I am no mathematician, and I want to stress that I don't mean one should focus on highly formalized mathematics in every subject. But often, when I find myself struggling to understand something, or having the same conversations over and over again, it pays to look for an explanation, even an abstract conceptual one, not of the subject itself but of its mathematical structure.
I think this failure mode shows up most often in subjects that lend themselves to abstract, metaphysical, and widely applicable thinking. Some examples are predictive coding and category theory.
Take predictive coding and active inference. There often seems to be an enormous amount of back-and-forth discussion on topics like these at an abstract conceptual level, when the discussion could be made much more concrete by talking about their actual mathematical structure. I get the sense (and I am very much guilty of this) that many people talk about these subjects without putting enough effort into understanding the structure underlying the ideas. What ends up happening is that these frameworks get applied to far too many situations, and a lot of wheel-spinning happens without any useful work getting done.
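As an illustration of what "the actual mathematical structure" can look like, here is a minimal Python sketch of one common formulation of predictive coding, assuming a single-layer linear generative model with Gaussian noise: a latent estimate `mu` predicts the observation as `W @ mu`, and inference is gradient descent on a sum of precision-weighted squared prediction errors. The model, the parameter values, and the function names are illustrative assumptions; real predictive-coding models are usually hierarchical and nonlinear, and active inference adds action selection on top. In this linear case the dynamics should settle on the exact Gaussian posterior mean, which the second function computes directly for comparison.

```python
import numpy as np

# Illustrative single-layer linear Gaussian generative model (an assumption):
#   prior:       mu ~ N(prior_mean, sigma_p**2 * I)
#   observation: x  ~ N(W @ mu, sigma_x**2 * I)
# Objective (negative log joint, up to constants):
#   F(mu) = |x - W mu|^2 / (2 sigma_x^2) + |mu - prior_mean|^2 / (2 sigma_p^2)

def predictive_coding_inference(x, W, prior_mean, sigma_x, sigma_p,
                                lr=0.05, steps=500):
    """Estimate the latent cause of x by gradient descent on F(mu)."""
    mu = prior_mean.astype(float).copy()
    for _ in range(steps):
        # Precision-weighted prediction errors at the sensory and prior levels.
        eps_x = (x - W @ mu) / sigma_x**2
        eps_p = (mu - prior_mean) / sigma_p**2
        # -dF/dmu = W^T eps_x - eps_p; lr must be small enough to be stable.
        mu = mu + lr * (W.T @ eps_x - eps_p)
    return mu

def exact_posterior_mean(x, W, prior_mean, sigma_x, sigma_p):
    """Closed-form Gaussian posterior mean, for comparison."""
    precision = W.T @ W / sigma_x**2 + np.eye(len(prior_mean)) / sigma_p**2
    return np.linalg.solve(precision,
                           W.T @ x / sigma_x**2 + prior_mean / sigma_p**2)

W = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [0.5, -0.5]])
x = np.array([1.0, -0.5, 2.0])
prior_mean = np.zeros(2)

mu_pc = predictive_coding_inference(x, W, prior_mean, sigma_x=0.5, sigma_p=1.0)
mu_exact = exact_posterior_mean(x, W, prior_mean, sigma_x=0.5, sigma_p=1.0)
print(mu_pc, mu_exact)  # the two estimates should agree closely
```

Written out this way, "minimizing prediction error" becomes a specific objective over specific variables, which is much easier to argue about precisely.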
Of course, this lesson can be over-applied, and there is much to be said for exploring ideas without worrying too much about formalism and mathematics. But often, when I am stuck and feel that I haven't really grokked something despite putting in effort, it helps to remember that this failure mode exists and to seek out a different sort of explanation.