"If p(guess_i) < q(guess_i), Mq was "overconfident." The guess is still accepted, but only with probability p(guess_i) / q(guess_i). This probabilistic step is key to maintaining the original distribution.",
guess_i will be rejected if p(guess_i) / q(guess_i) < p(some_other_token_i) depending on the decoding strategy, e.g., greedy decoding, won't it?
Thanks again.
I assume that, in this case:
"If p(guess_i) < q(guess_i), Mq was 'overconfident.' The guess is still accepted, but only with probability p(guess_i) / q(guess_i). This probabilistic step is key to maintaining the original distribution."
guess_i will be rejected if p(guess_i) / q(guess_i) < p(some_other_token_i), depending on the decoding strategy (e.g., greedy decoding), won't it?
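
To make the question concrete, here is a minimal sketch of the acceptance step as I understand it from the quoted passage. The function names, the dict-based distributions, and the residual resampling rule (drawing from max(0, p - q), renormalized, after a rejection) are my assumptions for illustration, not something stated in the quote. In this sketch the accept/reject decision depends only on the ratio p(guess_i) / q(guess_i) and a uniform random draw; p(some_other_token_i) only matters after a rejection, when a replacement token is drawn.

```python
import random


def accept_or_resample(guess, p, q, rng):
    """One acceptance step for a drafted token (illustrative sketch).

    p and q are hypothetical dicts mapping token -> probability under the
    target model and the draft model Mq. `guess` is assumed to have been
    sampled from q, so q[guess] > 0.
    """
    ratio = p[guess] / q[guess]
    # Accept with probability min(1, p/q); always accepted when p >= q.
    if rng.random() < min(1.0, ratio):
        return guess, True
    # Rejected: resample from the renormalized residual max(0, p - q).
    # (Assumed correction rule; it is what keeps the output distributed as p.)
    residual = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
    total = sum(residual.values())
    tokens = list(residual)
    weights = [residual[t] / total for t in tokens]
    return rng.choices(tokens, weights=weights)[0], False


def empirical_distribution(p, q, n, rng):
    """Draft from q, apply the acceptance step, and return output frequencies."""
    counts = {t: 0 for t in p}
    q_tokens = list(q)
    q_weights = [q[t] for t in q_tokens]
    for _ in range(n):
        guess = rng.choices(q_tokens, weights=q_weights)[0]
        token, _accepted = accept_or_resample(guess, p, q, rng)
        counts[token] += 1
    return {t: c / n for t, c in counts.items()}
```

For greedy decoding, one way to read it (again an assumption on my side) is that both distributions collapse to one-hot vectors on their argmax. The ratio is then either 1 or 0, so the guess is accepted exactly when it equals the target model's argmax, and otherwise replaced by that argmax, with no comparison against p(some_other_token_i) involved.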