runtime/internal/rpc: prevent client-side retries from resulting in timeout errors

Currently, if we try to invoke an RPC against a nonexistent name, we
get a confusing timeout error. E.g.,

$ vrpc signature doesnotexist

ERROR: Signature failed: vrpc:<rpc.Client>"doesnotexist".ResolveStep:
Timeout: [remote=@6@@...<endpoint>...@@:
vrpc:<rpc.Client>"doesnotexist".ResolveStep: ended before version byte
received : failed to decode response: vrpc: ended before version byte
received : EOF]

This gives no clue to the user that the name is missing from the
mounttable.  Instead, they get a timeout from ResolveStep and some
confusing EOF error that at best is irrelevant and at worst sends the
user down the wrong path in debugging the problem.

What's actually happening:

The client RPC code tries to resolve the name against the mounttable; it
fails (with the very sensible 'mounttabled "doesnotexist".ResolveStep
Name doesnotexist doesn't exist'). But then connectToName's
backoff/retry mechanism repeats this step until the RPC deadline is
almost reached (each time getting the resolution error).  Finally, the
last attempt is made so close to the deadline that the ResolveStep
coincides with the context timeout, hence the confusing EOF message and
resulting timeout.  This last error is returned to the user instead of
the sensible 'Name ... does not exist' error.
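
The sequence above can be modelled on a simulated clock. This is only a
sketch: retryUntilDeadline, resolveStep, the 5 ms resolve cost, and the
10 ms initial backoff are hypothetical and are not taken from
connectToName. Only the 1 ms reserve mirrors the behaviour described
here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Illustrative values only; none of these come from the real client code.
const (
	rpcTimeout  = time.Second          // overall RPC deadline
	resolveCost = 5 * time.Millisecond // assumed time one ResolveStep needs
	reserve     = time.Millisecond     // time set aside for the last attempt
)

var (
	errNoExist = errors.New(`name "doesnotexist" doesn't exist`)
	errTimeout = errors.New("timeout: ended before version byte received")
)

// resolveStep simulates one resolution attempt. With less than cost left
// before the deadline, the context expires mid-call and the real
// resolution error is lost.
func resolveStep(remaining, cost time.Duration) error {
	if remaining < cost {
		return errTimeout
	}
	return errNoExist
}

// retryUntilDeadline models the retry loop on a simulated clock and returns
// whatever the final attempt produced.
func retryUntilDeadline(timeout, cost, reserved time.Duration) error {
	elapsed := time.Duration(0)
	backoff := 10 * time.Millisecond
	for {
		remaining := timeout - elapsed
		err := resolveStep(remaining, cost)
		if remaining < cost {
			elapsed = timeout // the attempt ran into the deadline
		} else {
			elapsed += cost
		}
		// Never sleep so long that less than `reserved` is left afterwards.
		maxSleep := timeout - elapsed - reserved
		if maxSleep <= 0 {
			return err
		}
		if backoff > maxSleep {
			backoff = maxSleep
		}
		elapsed += backoff
		backoff *= 2
	}
}

func main() {
	fmt.Println(retryUntilDeadline(rpcTimeout, resolveCost, reserve))
}
```

With these illustrative numbers every early attempt yields the "doesn't
exist" error, but the final attempt starts with 1 ms left and the loop
ends on the timeout.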

What this CL does:

1. Change the backoff time computation to increase the time set aside
for the call to happen from 1 ms to 100 ms: even on my Linux desktop, 1
ms is insufficient to do even the (single-step) ResolveStep, not to
mention the actual server call.  On ARM or Android things are much
worse.  With this change, we have a decent chance that the context will
not time out in the middle of a ResolveStep.

2. Since it's still possible for the context to time out (e.g. if the
server took a long time to reply), we also add logic to return the last
non-timeout related error during a retry loop: this way, we 'ignore' the
last retry iteration if it results in timeouts and instead convey to the
user an error that's more likely to point them to the actual failure
cause.
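
A minimal sketch of the two changes, assuming hypothetical helper names
(nextBackoff, pickError, reservedForCall) that do not appear in the
actual change. Only the 100 ms figure and the "last non-timeout error"
rule come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// reservedForCall is the time kept free before the deadline so that the
// final attempt has a chance to complete (100 ms, up from 1 ms).
const reservedForCall = 100 * time.Millisecond

// Hypothetical stand-ins for the errors seen during the retry loop.
var (
	errNoExist = errors.New(`name "doesnotexist" doesn't exist`)
	errTimeout = errors.New("timeout")
)

// nextBackoff clamps the wanted backoff so that, after sleeping, at least
// reservedForCall remains before the deadline. ok is false when no
// further retry fits.
func nextBackoff(want, remaining time.Duration) (sleep time.Duration, ok bool) {
	limit := remaining - reservedForCall
	if limit <= 0 {
		return 0, false
	}
	if want > limit {
		want = limit
	}
	return want, true
}

// pickError returns the most recent error that is not a timeout, falling
// back to the last error when every attempt timed out.
func pickError(attemptErrs []error) error {
	var last, lastNonTimeout error
	for _, err := range attemptErrs {
		last = err
		if err != nil && !errors.Is(err, errTimeout) {
			lastNonTimeout = err
		}
	}
	if lastNonTimeout != nil {
		return lastNonTimeout
	}
	return last
}

func main() {
	// The final attempt timed out, but the earlier error is more useful.
	fmt.Println(pickError([]error{errNoExist, errNoExist, errTimeout}))

	sleep, ok := nextBackoff(640*time.Millisecond, 335*time.Millisecond)
	fmt.Println(sleep, ok)
}
```

Keeping the timeout as a fallback means a call that only ever timed out
still reports a timeout rather than no error at all.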

This CL is a step towards https://github.com/vanadium/issues/issues/1290

Change-Id: Icf8f07e314a298987553cb9a1fc2defa40617246
1 file changed
tree: c854424a00dd9c13fafcc46199f4d44385c0ece8
README.md

Vanadium

This repository contains a reference implementation of the Vanadium APIs.

Unlike the APIs in https://github.com/vanadium/go.v23, which promise backward compatibility, this repository makes no such promise.