“Quick and dirty” recipe for the middle dictionary:
- Select a list of words for the dictionary. I chose the words of the “I like the red ball…” prompt plus an additional 500 words randomly selected from this list of 3000 common English words.
- Tokenize the words, and enter each one in a 2-token prompt: [<s>, I], [<s>, like], [<s>, the], etc.
- Run the model on each 2-token prompt, and save a vector v representing the output of the second layer at the 2nd token position. For each model run:
- A. Remove the dummy (<s>) token component from v, as |v’> = |v> - |dummy><dummy|v>, where <dummy|v> denotes an inner product and the v and dummy vectors are L2-normalized beforehand. Note that <dummy|v’> = 0.
- B. Repeat the projection in step A to remove the input dictionary vector component of the 2nd prompt token from v’.
- C. Normalize the vector and enter it in the new dictionary! (A rough code sketch of these steps follows the list.)
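For concreteness, here is a minimal Python sketch of the recipe above, using PyTorch and Hugging Face transformers. The model name, layer index, and word list are placeholders rather than the exact setup I used, and the dummy (<s>) direction removed in step A is taken here to be its input-embedding vector; one could instead use the layer-2 activation at the <s> position.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"   # placeholder: any model with a <s> (BOS) token
LAYER = 2                                 # "output of the second layer"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def unit(x):
    """L2-normalize a vector."""
    return x / x.norm()

def remove_component(v, d):
    """Project out direction d from v: |v'> = |v> - |d><d|v>, with d unit-normalized."""
    d = unit(d)
    return v - d * torch.dot(d, v)

embed = model.get_input_embeddings().weight   # the input dictionary (token embeddings)
bos_id = tokenizer.bos_token_id

words = ["I", "like", "the", "red", "ball"]   # placeholder for the ~500-word list
middle_dictionary = {}

with torch.no_grad():
    for word in words:
        tok_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
        if len(tok_ids) != 1:
            continue                          # keep only single-token words for simplicity
        tok_id = tok_ids[0]

        # 2-token prompt: [<s>, word]
        input_ids = torch.tensor([[bos_id, tok_id]])
        out = model(input_ids, output_hidden_states=True)

        # hidden_states[0] is the embedding output, so hidden_states[LAYER] is the
        # output of the LAYER-th transformer block; take the 2nd token position.
        v = unit(out.hidden_states[LAYER][0, 1, :])

        # Step A: remove the dummy (<s>) direction (here: its input embedding).
        v = remove_component(v, embed[bos_id])
        # Step B: remove the input-dictionary direction of the 2nd prompt token.
        v = remove_component(v, embed[tok_id])
        # Step C: normalize and enter the result in the new dictionary.
        middle_dictionary[word] = unit(v)
```

At interpretation time, one would presumably compare a residual-stream vector against these entries by inner product, just as with the input and output dictionaries.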
One should bear in mind that this is just a quick stab at reconstructing a "middle dictionary", and it should be easy to do much better. Weaknesses of the approach here include:
- It probably yields a mixture of content from several dictionaries that are in use in the 2nd layer, including the output dictionary and whatever set of "middle dictionaries" the model creates.
- The vocabulary of 'meanings' encoded by the middle dictionary need not be identical to that of the input dictionary. By forcing a mapping onto tokens, we are likely missing elements of relational context or other syntax that may exist within the middle dictionary.