“Quick and dirty” recipe for the middle dictionary (a rough code sketch follows the list):

  1. Select a list of words for the dictionary. I chose the words from the “I like the red ball…” prompt and an additional 500 words randomly selected from this list of 3000 common English words.
  2. Tokenize the words, and enter each one in a 2-token prompt: [<s>, I], [<s>, like], [<s>, the], etc.
  3. Run the model on each 2-token prompt, and save a vector v representing the output of the second layer at the 2nd token position. For each model run:
    i. Remove the dummy (<s>) token component from v, as |v’> = |v> - |dummy><dummy|v>, where <dummy|v> denotes an inner product, and the v and dummy vectors are L2-normalized beforehand. Note that <dummy|v’> = 0.
    ii. Repeat the procedure in step 3.i to remove the input dictionary vector component of the 2nd prompt token from v’.
    iii. Normalize the resulting vector and enter it in the new dictionary!
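
For concreteness, here is a minimal sketch of steps 2–3 using the Hugging Face transformers API. The model name, layer index, and word list are placeholders rather than the exact setup used for the results here, and it assumes each word maps onto a single token (otherwise the first subword token is used).

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "gpt2"   # placeholder; swap in the model actually being probed
LAYER = 2             # "output of the second layer"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True).eval()

def remove_component(v, d):
    """Return |v'> = |v> - |d><d|v>, with both vectors L2-normalized first."""
    v = v / v.norm()
    d = d / d.norm()
    return v - d * torch.dot(d, v)

def middle_dict_entry(word):
    # Step 2: build the 2-token prompt [<s>, word] (first subword token if
    # the word splits into several).
    word_id = tokenizer(word, add_special_tokens=False)["input_ids"][0]
    ids = torch.tensor([[tokenizer.bos_token_id, word_id]])

    # Step 3: hidden state after LAYER layers, at the 2nd token position.
    with torch.no_grad():
        v = model(ids).hidden_states[LAYER][0, 1]

    # Input-dictionary (embedding) vectors for the dummy token and the word.
    emb = model.get_input_embeddings().weight
    dummy_vec, word_vec = emb[ids[0, 0]], emb[ids[0, 1]]

    # Steps 3.i-3.iii: project out both components, then normalize.
    v = remove_component(v, dummy_vec)
    v = remove_component(v, word_vec)
    return v / v.norm()

words = ["I", "like", "the", "red", "ball"]   # plus the ~500 sampled common words
middle_dictionary = {w: middle_dict_entry(w) for w in words}
```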

One should bear in mind that this is just a quick stab at reconstructing a "middle dictionary", and it should be easy to do much better. Weaknesses of the approach here include:

  1. It probably yields a mixture of content from several dictionaries that are in use in the 2nd layer, including the output dictionary and whatever set of "middle dictionaries" the model creates.
  2. The vocabulary of 'meanings' encoded by the middle dictionary need not be identical to that of the input dictionary. By forcing a mapping onto tokens, we are likely missing elements of relational context or other syntax that may exist within the middle dictionary.