Thursday, February 5, 2015

grequests Examples

Grequests


Grequests is the excellent requests library with gevent under the hood. This means it uses coroutines (greenlets) to make multiple requests concurrently. In addition, grequests adds some helpers that give you more control over this behavior.

When is this helpful

It can dramatically speed things up when you need to make several independent HTTP calls. For example, if you are downloading all of the assets from a web page, once you have the initial page you can download all of the required assets at the same time. Another example is downloading multiple files at once instead of one after the other.

When is it not helpful

It offers no improvement if you are making a single request, or if each request depends on the previous response (chained calls).

Installation


 $ pip install grequests  


Simple Example


For the simple example I will check whether a few websites are available. This is basically the same example as the one given in the grequests GitHub README, but I will add some explanation.

1:  import grequests  
2:    
3:  urls = ['http://google.com', 'http://yahoo.com', 'http://bing.com']  
4:    
5:  unsent_request = (grequests.get(url) for url in urls)
6:    
7:  results = grequests.map(unsent_request)  
8:  print(results)  


You will see on line 1 that we only import grequests; we do not import gevent ourselves and then call monkey.patch_all(). grequests takes care of importing and setting up gevent itself, so no explicit monkey patching is needed for these examples. It may still be needed if you use other libraries, though.

The next step is to build the collection of requests that will be sent. Here I simply use a generator expression to create the async requests. At this point no requests have been made to any server. Each of these is an HTTP GET wrapped in an AsyncRequest, but you can use any of the standard HTTP methods, such as post, put, delete, options, or head.

I then give all of the requests to the grequests map function. The map function issues all of the requests at the same time and waits for all of them to complete. If you have a large number of requests and they all hit the same server, you may need to set the size parameter of the map function. Setting size limits the number of concurrent requests. I had to do this when making queries against a server that only supported 10 concurrent queries.

The map function returns a list of response objects. These are the same response objects that you get from the normal requests library.
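Since the result is just a list, it is easy to post-process. Below is a minimal sketch, assuming map keeps the responses in request order and puts None in the list where a request failed (as the sample output in the grequests README suggests). The availability helper and the FakeResponse stand-in are hypothetical names used only so the snippet runs without a network connection.

```python
from collections import namedtuple

# Stand-in for a requests Response so the sketch runs without a network.
FakeResponse = namedtuple("FakeResponse", ["status_code"])


def availability(urls, responses):
    """Map each URL to its status code, or None when the request failed."""
    return {
        url: (response.status_code if response is not None else None)
        for url, response in zip(urls, responses)
    }


urls = ["http://google.com", "http://yahoo.com", "http://bing.com"]

# With grequests, this list would come from something like:
#   responses = grequests.map((grequests.get(u) for u in urls), size=2)
responses = [FakeResponse(200), None, FakeResponse(503)]

for url, status in availability(urls, responses).items():
    print(url, "failed" if status is None else status)
```

Keeping the post-processing in a plain function like this means it can be checked without sending any real requests.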

Helpful grequests features

send

The send function sends a single async request and returns the greenlet that was spawned for it. I think it is most useful when you pass it a gevent pool, which lets you limit the number of concurrent calls.

imap

imap is similar to map, but it takes a generator as input and returns a generator of responses instead of a list.

result hooks

grequests uses requests underneath, so any of the extended requests parameters can be used in an async request. The one I use frequently is the hooks parameter. Each request can have its own result handler function, which makes it very easy to pick and choose what to do with each response.

Hooks Example


import grequests

def is_available(response, *args, **kwargs):
    # Response hook for the GET request
    print(response.url, 'is available')

def show_data(response, *args, **kwargs):
    # Response hook for the POST request
    print(response.url, 'content is:', response.content)

post_data = "this is a test of post"
unsent_request = [
    grequests.get('http://www.google.com', hooks={'response': is_available}),
    grequests.post('http://httpbin.org/post', hooks={'response': show_data}, data=post_data),
]

print(grequests.map(unsent_request, size=2))
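Because a hook is just a callable that receives the response, one handler can be reused across requests by building it with a small factory. The sketch below is only an illustration: make_recorder is a hypothetical helper, and the hooks are called by hand with stand-in objects so that it runs without grequests or a network connection.

```python
from types import SimpleNamespace


def make_recorder(label, results):
    """Build a response hook that appends (label, url, status) to results."""
    def hook(response, *args, **kwargs):
        results.append((label, response.url, response.status_code))
    return hook


results = []
search_hook = make_recorder("search", results)
upload_hook = make_recorder("upload", results)

# With grequests, each hook would be attached to its own request, e.g.:
#   grequests.get(url, hooks={'response': search_hook})
# Here the hooks are called by hand with stand-in responses to stay offline.
search_hook(SimpleNamespace(url="http://www.google.com", status_code=200))
upload_hook(SimpleNamespace(url="http://httpbin.org/post", status_code=200))

print(results)
```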

Note about streaming

The functions send, map, and imap all accept a stream parameter. I have not tried this out yet, so I will just keep my mouth shut.

Monday, February 2, 2015

Ansible error ControlPath too long when working with AWS EC2

I have been learning Ansible for setup and deployment and I am loving it. I did my testing on a local VirtualBox VM so that I could easily revert it to a clean state. Unfortunately, when I tried my playbooks on an EC2 instance I immediately got the error "ControlPath too long". Adding the generally helpful -vvvv did not help. Even the basic ping failed, although I could ssh into the box just fine.

After some hints from Stack Overflow and the Ansible docs I figured it out.

The problem is that the hostname of an EC2 instance is so long that it makes the ssh control socket path too long to use. You can change the Ansible config to use a shorter format, but this was not enough in my case. The easiest solution was to use the IP address in the hosts file instead of the public DNS name.

Other solutions include adding a shorter name to your /etc/hosts file or adding a CNAME in DNS.

If you work with EC2 instances, I recommend changing your /etc/ansible/ansible.cfg by uncommenting the line below. This may help in some situations.

 control_path = %(directory)s/%%h-%%r  

This may also become a non-issue when I start using the EC2 tools for Ansible.
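To get a feel for how much each change shortens the socket path, here is a rough sketch that expands both control_path formats for a DNS name and for an IP address. Several things here are assumptions rather than facts from my setup: the longer template is what I believe the stock ansible.cfg default looks like, 108 bytes is the commonly cited Unix socket path limit on Linux (other systems differ), and the directory, host name, address, and user are made up.

```python
# Templates as they appear in ansible.cfg (assumed default vs. the shorter one).
DEFAULT_TEMPLATE = "%(directory)s/ansible-ssh-%%h-%%p-%%r"
SHORT_TEMPLATE = "%(directory)s/%%h-%%r"

# Assumed upper bound for a Unix socket path on Linux; other systems differ.
SOCKET_PATH_LIMIT = 108


def control_path(template, directory, host, port, user):
    """Expand an ansible.cfg control_path template (simplified)."""
    # ansible.cfg interpolation: fill in the directory, then %% becomes %.
    ssh_template = template.replace("%(directory)s", directory).replace("%%", "%")
    # ssh tokens: %h = host, %p = port, %r = remote user.
    return (ssh_template.replace("%h", host)
                        .replace("%p", str(port))
                        .replace("%r", user))


directory = "/home/ubuntu/.ansible/cp"  # hypothetical control-machine path
dns_name = "ec2-203-0-113-25.us-west-2.compute.amazonaws.com"  # made-up host
ip_address = "203.0.113.25"  # made-up address

for template in (DEFAULT_TEMPLATE, SHORT_TEMPLATE):
    for host in (dns_name, ip_address):
        path = control_path(template, directory, host, 22, "ubuntu")
        print(len(path), "of", SOCKET_PATH_LIMIT, path)
```

In this sketch the shorter format only removes the fixed "ansible-ssh-" prefix and the port, while swapping the DNS name for the IP address removes far more characters, which matches what I saw: the config change alone was not enough.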