
Noticing a couple of "This IP has exceeded the request-per-day limit" errors in the StackPrinter log, I asked for verification, and the prompt answer was:

There are a number of [app]s using Google App Engine, like StackPrinter.

Unfortunately, you're all using the same pool of IPs. Thus, you're all on the same quota.

If you can introduce a proxy or forwarder of some manner to your [app], you can solve this problem.

In a few words: Google App Engine [app]s are doomed.

What?
I seriously thought that the quota check was per IP+AppKey.
Let's see the FAQ:

What are the API request limits?

A single IP address can only make a certain number of API requests per day, depending on the presence of a valid API key.

Although this seems quite clear, this sentence does not cover the case where more than one API key is used from the same IP.

Looking at "my" StackPrinter quota right now, I see: x-ratelimit-current - 4486
Last evening it was above 9,000, and during the night there were only 50-60 prints; this sadly confirms the "shared quota" problem.
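To make the "shared quota" effect concrete, here is a toy Python model of a daily quota whose bucket is the client IP alone, assuming (as the reply quoted above suggests) that the API key plays no part in choosing the bucket. It is not the actual Stack Exchange implementation; the class name, the IP address (from a documentation range) and the key strings are hypothetical.

```python
from collections import Counter

DAILY_LIMIT = 10_000  # the per-IP daily ceiling mentioned in this thread


class PerIpDailyQuota:
    """Toy model of a daily quota whose bucket is the client IP alone."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self.used = Counter()  # requests counted per IP for the current day

    def allow(self, ip, api_key):
        # api_key is accepted but ignored: every key behind one IP shares a bucket
        if self.used[ip] >= self.limit:
            return False
        self.used[ip] += 1
        return True

    def remaining(self, ip):
        return self.limit - self.used[ip]


# Two different apps, two different keys, one shared outbound IP.
quota = PerIpDailyQuota(limit=5)
shared_ip = "203.0.113.7"
for _ in range(4):
    quota.allow(shared_ip, api_key="hog-app-key")
print(quota.remaining(shared_ip))                      # 1
print(quota.allow(shared_ip, api_key="printer-key"))  # True
print(quota.allow(shared_ip, api_key="printer-key"))  # False
```

The second app never made a request before the last one, yet it is refused: the first app has already drained the bucket they share.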

People have worked hard to rig up StackApps [app]s hosted on Google App Engine.
Here are some examples:

This is a serious limitation of API usage, and without proper action, our GAE applications will starve in no time.

Proposal:

  • Is it possible to move the IP quota check to a fairer IP+Key, IP+AppId, IP+User-Agent, or KeyOnly check?
  • Could "authenticated requests" be a solution?
  • Could you raise the 10,000 threshold for Google's IP pools to temporarily stem these "This IP has exceeded the request-per-day limit" errors?
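The first proposal boils down to changing what the server counts requests against. A minimal Python sketch of those alternative throttling buckets, assuming a hypothetical `Request` record with the fields below (none of this is the real API's code), shows how each option would or would not separate two apps that share an IP:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    ip: str                 # address the server sees (the shared cloud egress IP)
    api_key: Optional[str]  # sent by the client
    app_id: Optional[str]   # sent by the client
    user_agent: str         # sent by the client


# Each strategy maps a request to the bucket its quota is counted against.
STRATEGIES = {
    "ip": lambda r: (r.ip,),  # current behaviour
    "ip+key": lambda r: (r.ip, r.api_key),
    "ip+appid": lambda r: (r.ip, r.app_id),
    "ip+user-agent": lambda r: (r.ip, r.user_agent),
    "key": lambda r: (r.api_key,),
}


def throttle_key(req: Request, strategy: str) -> tuple:
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return STRATEGIES[strategy](req)


a = Request("203.0.113.7", "key-a", "app-a", "AppA/1.0")
b = Request("203.0.113.7", "key-b", "app-b", "AppB/1.0")
print(throttle_key(a, "ip") == throttle_key(b, "ip"))          # True: one shared bucket
print(throttle_key(a, "ip+key") == throttle_key(b, "ip+key"))  # False: separate buckets
```

Every field except `ip` is chosen by the caller, so a client can pick whatever bucket it likes; that is the weakness these options share.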

One last thing: the proxy idea is useless because it is slow and does not scale with Google App Engine.

* Feel free to edit this question to add better ideas/proposals.


EDIT:
One Google App Engine IP is gone right now; a call against the Stack Overflow API returns a dreadful x-ratelimit-current - 0


EDIT:
In response to Kevin's answer:
Well, I think you are probably right on most points; I've dug into this topic a little further, and all the API-key-oriented checks I proposed (suggested more in the heat of the moment than by careful research) are terrible.
OK, they could be fine for a toy API, but not for a long-term, awesome project like the Stack Exchange API.

That said, I have to blame myself for having totally overlooked this aspect :).
In my defense, as previously said, the FAQ is not particularly clear about this aspect, and the answer you gave to this brilliantly prescient question was pretty misleading:

Sky: Hey, watch out! If apps are hosted on the same IP, they could have problems.
Kevin: We've indicated in the past that the per-day limit for a key can be increased if need is demonstrated.

This wrong idea of "Problem with the rate limit? We can increase it, no problem at all!" is sneakily reinforced in the FAQ:

What should I do if I need more requests per day?

Certain types of applications - services and websites to name two - can legitimately have much higher per-day request requirements than typical applications. If you can demonstrate a need for a higher request quota, contact us.

Wrong! This applies only to the "rich", lucky guy who has his own dedicated IP!
And remember, even moving your poor bleeding [app]s from the cloud to Dreamhost does not solve the problem (although it is less likely that a request-hogging [app] shares your IP).
What is the lesson I learned?
An API, to be a state-of-the-art, serious API to play with (for a website), must provide authentication*; without this feature, you are asking for trouble.

I recommend that you update the FAQ and make this danger clear.

What about our Google App Engine application?
Until an authentication feature arrives, do we really need to port our app and move away from the cloud?
While waiting for this feature, and given the still relatively small number of applications, is it possible to relax the key limit constraint a little?
Is a TLD+IP check a possible solution?

OK, too many thoughts... thanks for your patience.

EDIT:
working solution here

* OAuth is not the only available solution; Amazon Web Services, for example, allows authenticated REST calls by passing apikey+timestamp+signature in the query string of each request.
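A minimal Python sketch of that style of signed request, using HMAC-SHA256 from the standard library. The parameter names `key`, `timestamp` and `signature`, the five-minute clock-skew window and the `secret_for_key` lookup are assumptions for illustration; this is neither AWS's exact signature scheme nor anything the Stack Exchange API offers.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qsl, urlencode

MAX_SKEW_SECONDS = 300  # assumption: reject signatures more than 5 minutes off


def _canonical(params):
    # Sort the parameters so client and server sign byte-identical strings.
    return urlencode(sorted(params.items()))


def _hmac_hex(secret, message):
    return hmac.new(secret.encode(), message.encode(), hashlib.sha256).hexdigest()


def sign_query(params, api_key, secret, timestamp=None):
    """Client side: add key + timestamp, then append an HMAC signature."""
    ts = int(time.time() if timestamp is None else timestamp)
    signed = dict(params, key=api_key, timestamp=str(ts))
    signed["signature"] = _hmac_hex(secret, _canonical(signed))
    return urlencode(signed)


def verify_query(query, secret_for_key, now=None):
    """Server side: only if this returns True can the key be trusted for throttling."""
    params = dict(parse_qsl(query, keep_blank_values=True))
    signature = params.pop("signature", "")
    secret = secret_for_key(params.get("key", ""))
    if secret is None:
        return False
    try:
        ts = int(params["timestamp"])
    except (KeyError, ValueError):
        return False
    now = time.time() if now is None else now
    if abs(now - ts) > MAX_SKEW_SECONDS:
        return False
    expected = _hmac_hex(secret, _canonical(params))
    return hmac.compare_digest(expected.encode(), signature.encode())


secrets = {"demo-key": "demo-secret"}  # hypothetical key -> shared secret table
q = sign_query({"site": "stackoverflow", "page": "1"}, "demo-key", "demo-secret", timestamp=1_000_000)
print(verify_query(q, secrets.get, now=1_000_100))                              # True
print(verify_query(q.replace("page=1", "page=2"), secrets.get, now=1_000_100))  # False: tampered
print(verify_query(q, secrets.get, now=1_000_400))                              # False: stale
```

Because the secret never travels with the request, a third party sharing the same IP cannot forge requests under someone else's key, which is what would make per-key throttling meaningful.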

  • I have no big hopes that this issue will be resolved anytime soon. Even Twitter says that most cloud platforms, including Google App Engine, cannot be whitelisted to access its Search API because they lack a static IP. But I continue to hope that something will be done. Commented Oct 20, 2010 at 9:43
  • @Vladislav I totally agree that a cloud platform can't be whitelisted! But Twitter offers a way to bypass this problem using authenticated requests per app and/or user. Commented Oct 20, 2010 at 9:48
  • DOOOOOOOOOOOOOOOOMED! Commented Oct 20, 2010 at 17:17
  • For reference, we (App Engine) include the App ID in the user-agent header, and we don't let users set the user-agent header on outgoing requests. If the IP comes from our pool (_netblocks.google.com), you can trust that the App ID in the user-agent is accurate. Commented Oct 28, 2010 at 15:15
  • @Nick are you sure? AFAIK the user-agent can be set since SDK version 1.2.1. I remember this because I answered my own question on this topic 7 months ago here. Commented Nov 22, 2010 at 13:41
  • @systempuntoout I'm positive - it's always been the case that although you can set part of the user-agent string, you can't remove your App ID from it. Commented Sep 5, 2011 at 4:47
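Taking Nick's description at face value, an API could trust the App ID only for requests that really come from Google's pool and throttle on IP+App ID for those. A Python sketch, assuming a user-agent of the form `AppEngine-Google; (+http://code.google.com/appengine; appid: s~example-app)` and a netblock list that has already been fetched (the `_netblocks.google.com` DNS lookup is not shown; the CIDR range and app name below are placeholders from documentation ranges, not Google's real values):

```python
import ipaddress
import re

# Assumption: the App ID appears as "appid: <id>" inside the user-agent string.
APPID_RE = re.compile(r"appid:\s*([A-Za-z0-9~:.\-]+)")


def trusted_app_id(remote_ip, user_agent, trusted_netblocks):
    """Return the App ID only if the request comes from a trusted netblock."""
    addr = ipaddress.ip_address(remote_ip)
    if not any(addr in ipaddress.ip_network(block) for block in trusted_netblocks):
        return None  # outside the pool: the user-agent could say anything
    match = APPID_RE.search(user_agent)
    return match.group(1) if match else None


def throttle_bucket(remote_ip, user_agent, trusted_netblocks):
    """IP+App ID for trusted App Engine traffic, plain IP for everyone else."""
    app_id = trusted_app_id(remote_ip, user_agent, trusted_netblocks)
    return (remote_ip, app_id) if app_id else (remote_ip,)


POOL = ["203.0.113.0/24"]  # placeholder, not a real Google netblock
UA = "AppEngine-Google; (+http://code.google.com/appengine; appid: s~example-app)"
print(throttle_bucket("203.0.113.7", UA, POOL))   # ('203.0.113.7', 's~example-app')
print(throttle_bucket("198.51.100.9", UA, POOL))  # ('198.51.100.9',)
```

The same user-agent from outside the pool falls back to the plain IP bucket, so copying an App Engine user-agent elsewhere gains nothing.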


The proposed "solutions" to this are all terrible*.

IP+Key

Trivially forged.

IP+AppId

Trivially forged.

IP+User-Agent

Trivially forged.

KeyOnly

Surprise surprise, trivially forged.


Now, authentication would give us a token to throttle on that cannot be trivially forged. This is a planned improvement in a future API version; however, this workaround would require [app]s to force all users to log in before using them (if our goal were to do away with IP-based throttles altogether).

This problem will exist forever for any user-agnostic (i.e. non-authenticating) [app] that shares its IP.

Essentially, [app]s on Google App Engine (or any other cloud service) are sitting behind a NAT with a users-to-IPs ratio on the order of 1000-to-1, and NAT breaks the internet. If you want to avoid this problem, you'll need to get a dedicated IP somehow.

If there were a viable workaround for this problem, we'd implement it, but frankly the issue is so far upstream of us that there's nothing we can do.

Some other APIs with IP-based throttling (in part, at least):

There are many others, these are just a few which are easy to find references for.

They are all, in fact, variations of security through obscurity.


How about whitelisting certified [app]s? Those apps would be throttled based on key only.
@Kevin why do you think that the key will leak? If the code runs on the server, no one will see it, and it's in the key owner's best interest to make sure the key doesn't leak. If worst comes to worst and the key leaks, the owner will notice and request a new key.
The key revocation can be done by the key owner without any intervention on your part. The problem is how you certify an application; I would start with ones that have a decent reputation on one of the SE sites.
AppID isn't trivially forged: We (App Engine) include the App ID in the user-agent header, and we don't let users set the user-agent header on outgoing requests. If the IP comes from our pool (_netblocks.google.com), you can trust that the App ID in the user-agent is accurate.
@systempuntoout In an ideal world with unlimited IPs and no cloud solutions with shared egress, I'd agree. As it is, the world isn't ideal, and anything that limits on IP either needs to special-case large heterogeneous sources of egress traffic, or implement keys/access tokens.

I was wondering about the scenario of a server-based app that lives in a hosted environment, where a single IP can be shared by many websites.

As I understand the rate limit as it stands, every site on the shared host IP, regardless of API key, will be sharing the same rate limit of 10,000 requests per IP per day.

More at: rate-limit per endpoint per IP VS. server apps in hosted environments

It looks like, for everyone on GAE, one of these has to happen:

  • they get an unlimited rate limit
  • the rate-limit engine checks for key + IP + top-level domain (or some variation) of the referrer, as described in the linked post
  • GAE is no longer a viable platform for API apps

NOTE: this issue is not limited to the current fire - anyone on a shared platform, e.g. standard shared webhosting, is vulnerable to this issue.


I totally forgot that brilliant, prescient question (which I had upvoted months ago). From what I see, just increasing the key limit does not solve the problem: in that scenario, a key-hogging app could keep eating from the other apps' plates, leaving the other poor apps starving to death.

I had a GAE app that queried a few sites hourly for traffic stats. I've brought it down. Thanks for the warning, and enjoy your 72 extra calls per day :)


All these issues also apply to the request-throttling limit (30 requests over 5 seconds).
I'm sure I'm throttling requests from my app properly, as the API terms of use require, using a distributed lock in memcached, but I still frequently get the following error: "Too Many Requests, We're sorry...There are an unusual number of requests coming from this IP address. To protect our users, we can't process any more requests from this IP address right now."
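For reference, the client-side limit itself is straightforward to express. Below is a single-process Python sketch of a sliding window allowing 30 requests per 5 seconds; the poster's distributed memcached lock is not reproduced, and the class name and injectable clock are illustrative. It only counts this app's own requests, so it cannot see other apps sending from the same IP, which is consistent with the error still appearing.

```python
import collections
import time


class SlidingWindowThrottle:
    """Allow at most max_requests within any window_seconds span (one process only)."""

    def __init__(self, max_requests=30, window_seconds=5.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self.sent = collections.deque()  # timestamps of requests still inside the window

    def try_acquire(self):
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            self.sent.append(now)
            return True
        return False


fake_now = [0.0]  # controllable clock for the example
throttle = SlidingWindowThrottle(clock=lambda: fake_now[0])
burst = [throttle.try_acquire() for _ in range(31)]
print(burst.count(True), burst[-1])  # 30 False
fake_now[0] = 5.0
print(throttle.try_acquire())        # True: the first burst has left the window
```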


Also covered in an earlier post; too lazy to find it right now, but thanks for bringing it up.

A possible solution would be to whitelist some app keys: for a whitelisted key, the IP is not checked, only the app key.
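A Python sketch of that rule, assuming a hypothetical whitelist and two separate limits (the numbers, names and class are illustrative, not real API values): whitelisted keys get a per-key bucket regardless of IP, while everyone else keeps the per-IP bucket.

```python
from collections import Counter


class WhitelistAwareQuota:
    def __init__(self, ip_limit, key_limit, whitelisted_keys):
        self.ip_limit = ip_limit
        self.key_limit = key_limit
        self.whitelisted = frozenset(whitelisted_keys)
        self.used = Counter()  # requests counted per bucket

    def allow(self, ip, api_key):
        if api_key in self.whitelisted:
            bucket, limit = ("key", api_key), self.key_limit  # IP is not checked
        else:
            bucket, limit = ("ip", ip), self.ip_limit
        if self.used[bucket] >= limit:
            return False
        self.used[bucket] += 1
        return True


quota = WhitelistAwareQuota(ip_limit=2, key_limit=100, whitelisted_keys={"certified-key"})
shared_ip = "203.0.113.7"
print([quota.allow(shared_ip, "random-key") for _ in range(3)])  # [True, True, False]
print(quota.allow(shared_ip, "certified-key"))                   # True: IP bucket ignored
```

The sketch leaves open the hard parts raised in the comments on the first answer: deciding which apps count as certified, and what happens if a whitelisted key leaks.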


@Shay Just checked: x-ratelimit-current - 2814 --> I think in a few hours it will be doomed again
Here is what I've just got in a request that makes just a few api calls -> X-RateLimit-Current: 2262, X-RateLimit-Current: 2002
x-ratelimit-current - 1947
GONE!!!!!!!!!!!!
@system - sorry 'bout your luck. It looks like either everyone on GAE gets an unlimited rate limit, or the rate-limit engine checks for key+TLD, or GAE is no longer a viable platform for API apps