commit | b684ae9dc3537218832094f93a237195a6815cf4 | |
---|---|---|
author | Bogdan Caprita <caprita@google.com> | Fri Apr 22 14:13:01 2016 -0700 |
committer | Bogdan Caprita <caprita@google.com> | Fri Apr 22 14:13:01 2016 -0700 |
tree | c854424a00dd9c13fafcc46199f4d44385c0ece8 | |
parent | ffa4b474b2d057e615a3c7631c9662bd46a508ee | |
runtime/internal/rpc: avoid client-side retries from resulting in timeout errors

Currently, if we try to invoke an RPC against a nonexistent name, we get a confusing timeout error. E.g.,

    $ vrpc signature doesnotexist
    ERROR: Signature failed: vrpc:<rpc.Client>"doesnotexist".ResolveStep: Timeout: [remote=@6@@...<endpoint>...@@: vrpc:<rpc.Client>"doesnotexist".ResolveStep: ended before version byte received : failed to decode response: vrpc: ended before version byte received : EOF]

This gives no clue to the user that the name is missing from the mounttable. Instead, they get a timeout from ResolveStep and some confusing EOF error that at best is irrelevant and at worst sends the user down the wrong path in debugging the problem.

What's actually happening:

The client RPC code tries to resolve the name against the mounttable; it fails (with the very sensible 'mounttabled "doesnotexist".ResolveStep Name doesnotexist doesn't exist'). But then connectToName's backoff/retry mechanism repeats this step until the RPC deadline is almost reached (each time getting the resolution error). Finally, the last attempt is made so close to the deadline that the ResolveStep coincides with the context timeout, hence the confusing EOF message and resulting timeout. This last error gets returned to the user, instead of the sensible 'Name ... does not exist' error.

What this CL does:

1. Change the backoff time computation to increase the time set aside for the call to happen from 1 ms to 100 ms: even on my Linux desktop, 1 ms is insufficient to do even the (single-step) ResolveStep, not to mention the actual server call. On ARM or Android things are much worse. With this change, we have a decent chance that the context will not time out in the middle of a ResolveStep.

2. Since it's still possible to have the context time out (e.g. if the server took a long time to reply), we also add logic to return the last non-timeout related error during a retry loop: this way, we 'ignore' the last retry iteration if it results in a timeout and instead convey to the user an error that's more likely to point them to the actual failure cause.

This CL is a step towards https://github.com/vanadium/issues/issues/1290

Change-Id: Icf8f07e314a298987553cb9a1fc2defa40617246
This repository contains a reference implementation of the Vanadium APIs.
Unlike the APIs in https://github.com/vanadium/go.v23, which promise to provide backward compatibility, this repository makes no such promises.