Tutorial 2 - applying query-filters

We previously covered the basics of using data-endpoints with voeventdb.remote in tutorial 1.

**This notebook demonstrates the use of filters to narrow down your query, and introduces a few convenient ‘helper classes’ for handling nested data-structures.**

As before, we’ll switch on ‘DEBUG’ level logging, to see the HTTP requests go whizzing by.

In [ ]:
from __future__ import print_function
import logging
logging.basicConfig(level=logging.DEBUG)
In [ ]:
import voeventdb.remote as vr
import voeventdb.remote.apiv1 as apiv1

We’ve already briefly looked at the map_stream_count endpoint, and mentioned how VOEvents come in three flavours of role, ‘observation’, ‘utility’, and ‘test’. Let’s remind ourselves what the default map_stream_count output looks like:

In [ ]:
apiv1.map_stream_count()

Using filters

Quite obviously, a number of those streams are ‘junk’: they contain only test-packets used to verify that the VOEvent infrastructure is up and working correctly. For scientific work, we’ll want to filter those out.

Fortunately, we can ask the voeventdb server to do the filtering work for us. The voeventdb.remote library comes with an easy-to-use list of filters, stored as `voeventdb.remote.apiv1.FilterKeys <http://voeventdbremote.readthedocs.org/en/latest/reference/index.html#voeventdb.remote.apiv1.FilterKeys>`__. To see what’s available at a glance you can use the IPython tab-completion and doc-lookup tools, as in the cell below.

Full definitions of the filter-keys (and example filter-values) can be found in the voeventdb server docs, but we’ll cover most of them in these tutorial notebooks - read on.

In [ ]:
#Alias voeventdb.remote.apiv1.FilterKeys to just 'FilterKeys', for brevity
from voeventdb.remote.apiv1 import FilterKeys
In [ ]:
## To see the list of filters, you can use tab-completion:
## (Uncomment the following line and try it for yourself)
# FilterKeys.
## Or the ipython doc-lookup magic, by prefixing with ``??`` and running the cell:
# ??FilterKeys

Filtering by role

So: we were trying to filter out the test-packets. FilterKeys.role sounds promising. To apply a filter, or multiple filters, we simply define a dictionary with the filters we want to apply, and then pass it to the relevant query-function, like this:

In [ ]:
my_filters = { FilterKeys.role: 'observation' }
In [ ]:
apiv1.map_stream_count(my_filters)

Filtering by date

That results in a much shorter list, containing only scientifically interesting streams. Still, those numbers are pretty large (mainly for Swift). It might be useful to get a smaller representative sample. How many packets will we get if we limit our query to a single week?

In [ ]:
from datetime import datetime, timedelta
import pytz
start_date = datetime(2015,12,1,tzinfo=pytz.UTC)
my_filters = {
    FilterKeys.role: 'observation',
    FilterKeys.authored_since: start_date,
    FilterKeys.authored_until: start_date + timedelta(days=7)
    }
my_filters
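
As an aside, the same one-week window can be built without pytz. The sketch below uses the standard library's ``datetime.timezone.utc`` (Python 3 only) and assumes, without having checked, that the library treats any timezone-aware UTC datetime the same way as the pytz-based one above. The plain-string dictionary keys are hypothetical stand-ins for the ``FilterKeys`` attributes, used only so that the snippet runs without voeventdb.remote installed.

```python
from datetime import datetime, timedelta, timezone

# Timezone-aware start of the window (UTC), mirroring the pytz version above.
start_date = datetime(2015, 12, 1, tzinfo=timezone.utc)
end_date = start_date + timedelta(days=7)

# Plain-string keys are hypothetical stand-ins for FilterKeys.role etc.
window_filters = {
    "role": "observation",
    "authored_since": start_date,
    "authored_until": end_date,
}

# ISO-8601 strings make the window boundaries easy to eyeball.
print(start_date.isoformat())  # 2015-12-01T00:00:00+00:00
print(end_date.isoformat())    # 2015-12-08T00:00:00+00:00
```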
In [ ]:
apiv1.map_stream_count(my_filters)

Filtering by stream

Ok, so there are still a lot of Swift packets there. Let’s take a look at a sample of those, and see if we can break them up further. First, let’s add another filter to limit our query to just Swift packets.

In [ ]:
my_filters[FilterKeys.stream] = 'nasa.gsfc.gcn/SWIFT'
my_filters
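
Note that the assignment above modifies ``my_filters`` in place. Since the filter set is an ordinary Python dictionary, you could instead derive a new dictionary and keep the original for later re-use. A minimal sketch, in which the plain-string keys are hypothetical stand-ins for the ``FilterKeys`` attributes so that the snippet runs on its own:

```python
# Stand-in for {FilterKeys.role: 'observation'}
base_filters = {"role": "observation"}

# dict(base, **extra) builds a new dictionary; base_filters is left untouched.
swift_filters = dict(base_filters, stream="nasa.gsfc.gcn/SWIFT")

print(base_filters)
print(swift_filters)
```

Which style to prefer is a matter of taste; this tutorial keeps updating a single dictionary for brevity.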

So now if we apply the filters to map_stream_count, we only get back one entry (the Swift stream):

In [ ]:
apiv1.map_stream_count(filters=my_filters)

Filters can be used across different query-endpoints

Not particularly helpful, but at least everything is working as expected. Now, the neat thing about the voeventdb filters is that they can be applied to any query-endpoint - we can just re-use the filter-dictionary with the apiv1.list_ivorn function to get back a list of IVORNs:

In [ ]:
swift_ivorns = apiv1.list_ivorn(filters=my_filters)
print("Retrieved",len(swift_ivorns),"IVORNs")
#Show just the first 10
swift_ivorns[:10]

That’s a long list, but there’s clearly a pattern to how the Swift IVORNs are formatted. We’ll use a little Python trickery (cf set, str.rsplit) to chop off the trailing ID numbers and sort them into sub-categories:

In [ ]:
swift_categories = set(ivorn.rsplit('_',1)[0] for ivorn in swift_ivorns)
swift_categories
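
To see the trick in isolation, here is the same ``rsplit`` / ``set`` combination applied to a handful of made-up identifiers. The sample strings only imitate the general shape of the Swift IVORNs and are not real packets.

```python
# Made-up IVORNs in a Swift-like style; real ones come from apiv1.list_ivorn.
sample_ivorns = [
    "ivo://example/SWIFT#BAT_GRB_Pos_100001-001",
    "ivo://example/SWIFT#BAT_GRB_Pos_100002-002",
    "ivo://example/SWIFT#XRT_Pos_100001-003",
    "ivo://example/SWIFT#UVOT_Pos_100001-004",
]

# rsplit('_', 1) splits once, at the right-most underscore:
prefix, trailing_id = sample_ivorns[0].rsplit("_", 1)
print(prefix)       # ivo://example/SWIFT#BAT_GRB_Pos
print(trailing_id)  # 100001-001

# The set comprehension collapses duplicate prefixes into one category each.
sample_categories = {ivorn.rsplit("_", 1)[0] for ivorn in sample_ivorns}
print(sorted(sample_categories))
```

Four identifiers collapse into three categories, because the two BAT entries share a prefix.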

Now we’re getting somewhere! We can clearly see the subcategories of Swift packets - BAT alerts, XRT positions, UVOT followup, etc.

Filtering by IVORN substring

We can use this knowledge to refine our filters, by filtering on a substring of the IVORN, using the ivorn_contains filter. For example, we might want to filter to just those IVORNs containing XRT positions (note this filter is case-sensitive):

In [ ]:
my_filters[FilterKeys.ivorn_contains] = 'XRT_Pos'
my_filters
In [ ]:
xrt_pos_ivorns = apiv1.list_ivorn(filters=my_filters)
print("Retrieved",len(xrt_pos_ivorns),"IVORNs")
xrt_pos_ivorns

As in tutorial 1, we can inspect the details of any given packet using the packet_synopsis endpoint - we’ll take a look at the first one. This packet makes a good example, as it includes details of the event co-ordinates and timestamp, and also references an earlier VOEvent:

In [ ]:
synopsis_dict = apiv1.packet_synopsis(xrt_pos_ivorns[0])
synopsis_dict

Ready-made ‘helper’ classes for parsing output

Nested dictionaries can be kind of a pain to work with. If you want, you can use voeventdb.remote’s `Synopsis <http://voeventdbremote.readthedocs.org/en/latest/reference/index.html#voeventdb.remote.helpers.Synopsis>`__ ‘helper’ class to parse this into an easy-to-use object.

In [ ]:
from voeventdb.remote.helpers import Synopsis
In [ ]:
xrt_synopsis = Synopsis(synopsis_dict)
# Prints with nicer formatting, ordering of values:
print(xrt_synopsis)

Now we can easily access the values (with the ever-handy IPython autocompletion):

In [ ]:
xrt_synopsis.author_ivorn
In [ ]:
xrt_synopsis.references

One of the Synopsis class attributes is a list called sky_events. Each entry is a `SkyEvent <http://voeventdbremote.readthedocs.org/en/latest/reference/index.html#voeventdb.remote.helpers.SkyEvent>`__ object, which represents a very basic set of information about an observed event:

- estimated position,
- error circle on the estimated position,
- timestamp of the observed event.

The position coordinates and error-circle are represented by `astropy.coordinates <http://astropy.readthedocs.org/en/stable/coordinates/index.html>`__ classes, which come with a bunch of features related to formatting, distance calculations, frame-of-reference transformations, etc.

In [ ]:
xrt_synopsis.sky_events
In [ ]:
# List of 1, in this case. Grab the first (and only) element:
sky_event = xrt_synopsis.sky_events[0]
In [ ]:
print(type(sky_event.position))
sky_event.position
In [ ]:
print(type(sky_event.position_error))
sky_event.position_error.deg

Astropy coordinates come with all the usual weird and wonderful astronomical formatting options; see the astropy docs for details:

In [ ]:
print(sky_event.position.ra.deg)
print(sky_event.position.ra.hms)
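
For a feel of what the hours-minutes-seconds form contains, here is a rough standard-library sketch of the degrees-to-hours conversion (24 hours of right ascension span 360 degrees, so one hour corresponds to 15 degrees). The function name ``ra_deg_to_hms`` and the sample value are made up for illustration; in practice astropy's own attributes should be preferred.

```python
def ra_deg_to_hms(ra_deg):
    """Convert a right ascension in degrees to (hours, minutes, seconds)."""
    total_hours = (ra_deg % 360.0) / 15.0  # 360 deg == 24 h
    hours = int(total_hours)
    total_minutes = (total_hours - hours) * 60.0
    minutes = int(total_minutes)
    seconds = (total_minutes - minutes) * 60.0
    return hours, minutes, seconds

# 187.5 degrees is 12.5 hours, i.e. 12h 30m 0s.
print(ra_deg_to_hms(187.5))  # (12, 30, 0.0)
```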

Advanced usage: specifying multiple values for the same filter

Before we move on, it’s worth mentioning that some filters can take on multiple values. This is specified by defining the filter-value as a list - for example, to return all VOEvents with a role of ‘observation’ or ‘utility’ we can use the following:

In [ ]:
my_filters = {apiv1.FilterKeys.role: ['observation','utility']}
apiv1.map_stream_count(my_filters)

How does this work? Well, we can think of each entry in the list as defining a separate filter. For the role value, these filters are combined in the logical ‘OR’ sense, so we get back combined counts for both ‘observation’ and ‘utility’ packets. You can check whether a filter accepts multiple values, and whether they are combined via logical ‘OR’ or ‘AND’, by checking the filter-definitions page and looking for the combinator attribute.
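
The ‘OR’ behaviour can be imitated locally with a toy dataset. The sketch below is not how the server is implemented; the packet list, stream names and the ``count_by_stream`` helper are all invented purely to illustrate the combined counts.

```python
from collections import Counter

# Made-up (stream, role) pairs standing in for packets held by the server.
toy_packets = [
    ("stream_a", "observation"),
    ("stream_a", "utility"),
    ("stream_a", "test"),
    ("stream_b", "observation"),
    ("stream_c", "test"),
]

def count_by_stream(packets, roles):
    """Count packets per stream whose role matches ANY of `roles` (logical OR)."""
    wanted = set(roles)
    return Counter(stream for stream, role in packets if role in wanted)

combined = count_by_stream(toy_packets, ["observation", "utility"])
print(dict(combined))  # {'stream_a': 2, 'stream_b': 1}
```

Because each toy packet has exactly one role, the combined result equals the sum of the two single-role counts, and the test-only ``stream_c`` drops out entirely.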

Coming next …

We’ve seen how to narrow our search, locate packets of interest, and use helper-classes to easily access packet details. In tutorials 3 & 4, we’ll cover different ways of finding related VOEvents.