Tutorial 2 - applying query-filters¶
We previously covered the basics of using data-endpoints with voeventdb.remote in tutorial 1.
**This notebook demonstrates the use of filters to narrow down your query, and introduces a few convenient ‘helper classes’ for handling nested data-structures.**
As before, we’ll switch on ‘DEBUG’ level logging, to see the HTTP requests go whizzing by.
In [ ]:
from __future__ import print_function
import logging
logging.basicConfig(level=logging.DEBUG)
In [ ]:
import voeventdb.remote as vr
import voeventdb.remote.apiv1 as apiv1
We’ve already briefly looked at the map_stream_count endpoint, and
mentioned how VOEvents come in three flavours of role:
‘observation’, ‘utility’, and ‘test’. Let’s remind ourselves what the
default map_stream_count output looks like:
In [ ]:
apiv1.map_stream_count()
Using filters¶
Quite obviously, a number of those streams are ‘junk’: they contain only test-packets used to verify that the VOEvent infrastructure is up and working correctly. For scientific work, we’ll want to filter those out.
Fortunately, we can ask the voeventdb server to do the filtering work
for us. The voeventdb.remote library comes with an easy-to-use list of
filters, stored as
`voeventdb.remote.apiv1.FilterKeys
<http://voeventdbremote.readthedocs.org/en/latest/reference/index.html#voeventdb.remote.apiv1.FilterKeys>`__.
To see what’s available at a glance you can use the IPython
tab-completion and doc-lookup tools, as in the cell below.
Full definitions of the filter-keys (and example filter-values) can be found in the voeventdb server docs, but we’ll cover most of them in these tutorial notebooks - read on.
In [ ]:
#Alias voeventdb.remote.apiv1.FilterKeys to just 'FilterKeys', for brevity
from voeventdb.remote.apiv1 import FilterKeys
In [ ]:
## To see the list of filters, you can use tab-completion:
## (Uncomment the following line and try it for yourself)
# FilterKeys.
## Or the ipython doc-lookup magic, by prefixing with ``??`` and running the cell:
# ??FilterKeys
Filtering by role¶
So: we were trying to filter out the test-packets. FilterKeys.role
sounds promising. To apply a filter, or multiple filters, we simply
define a dictionary with the filters we want to apply, and then pass it
to the relevant query-function, like this:
In [ ]:
my_filters = { FilterKeys.role: 'observation' }
In [ ]:
apiv1.map_stream_count(my_filters)
Filtering by date¶
That results in a much shorter list, containing only scientifically interesting streams. Still, those numbers are pretty large (mainly for Swift). It might be useful to get a smaller representative sample. How many packets will we get if we limit our query to a single week?
In [ ]:
from datetime import datetime, timedelta
import pytz
start_date = datetime(2015,12,1,tzinfo=pytz.UTC)
my_filters = {
    FilterKeys.role: 'observation',
    FilterKeys.authored_since: start_date,
    FilterKeys.authored_until: start_date + timedelta(days=7),
}
my_filters
In [ ]:
apiv1.map_stream_count(my_filters)
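The one-week window used above can also be built with the standard library alone. The following is a minimal sketch: `week_window` is a hypothetical helper name (not part of voeventdb.remote), and treating naive datetimes as UTC is an assumption made purely for this example.

```python
from datetime import datetime, timedelta, timezone


def week_window(start, days=7):
    """Return (since, until) as timezone-aware datetimes spanning `days` days."""
    if start.tzinfo is None:
        # Assumption of this sketch: naive datetimes are interpreted as UTC.
        start = start.replace(tzinfo=timezone.utc)
    return start, start + timedelta(days=days)


since, until = week_window(datetime(2015, 12, 1))
print(since.isoformat(), "->", until.isoformat())
```

The two returned values could then be supplied for FilterKeys.authored_since and FilterKeys.authored_until in the same way as the pytz-based dates.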
Filtering by stream¶
OK, so there are still a lot of Swift packets there. Let’s take a look at a sample of those, and see if we can break them up further. First, let’s add another filter to limit our query to just Swift packets.
In [ ]:
my_filters[FilterKeys.stream] = 'nasa.gsfc.gcn/SWIFT'
my_filters
So now if we apply the filters to map_stream_count, we only get back
one entry (the Swift stream):
In [ ]:
apiv1.map_stream_count(filters=my_filters)
Filters can be used across different query-endpoints¶
Not particularly helpful, but at least everything is working as
expected. Now, the neat thing about the voeventdb filters is that they
can be applied to any query-endpoint - we can just re-use the
filter-dictionary with the apiv1.list_ivorn function to get back a
list of IVORNs:
In [ ]:
swift_ivorns = apiv1.list_ivorn(filters=my_filters)
print("Retrieved",len(swift_ivorns),"IVORNs")
#Show just the first 10
swift_ivorns[:10]
That’s a long list, but there’s clearly a pattern to how the Swift IVORNs are formatted. We’ll use a little Python trickery (cf. set, str.rsplit) to chop off the trailing ID numbers and sort them into sub-categories:
In [ ]:
swift_categories = set(ivorn.rsplit('_',1)[0] for ivorn in swift_ivorns)
swift_categories
Now we’re getting somewhere! We can clearly see the subcategories of Swift packets - BAT alerts, XRT positions, UVOT followup, etc.
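To see exactly what the rsplit trick does, here is a small self-contained sketch. The IVORNs are illustrative only (the trailing ID numbers are made up), and `category` is a hypothetical helper name; the sketch also counts how many packets fall into each sub-category.

```python
from collections import Counter

# Illustrative IVORNs only: the trailing ID numbers are made up for this sketch.
sample_ivorns = [
    "ivo://nasa.gsfc.gcn/SWIFT#BAT_GRB_Pos_100001-111",
    "ivo://nasa.gsfc.gcn/SWIFT#XRT_Pos_100001-222",
    "ivo://nasa.gsfc.gcn/SWIFT#XRT_Pos_100002-333",
    "ivo://nasa.gsfc.gcn/SWIFT#UVOT_Pos_100001-444",
]


def category(ivorn):
    """Drop everything after the last underscore (the trailing ID)."""
    return ivorn.rsplit("_", 1)[0]


# Count packets per sub-category rather than only collecting the unique names.
category_counts = Counter(category(ivorn) for ivorn in sample_ivorns)
for name, count in sorted(category_counts.items()):
    print(name, count)
```

Splitting from the right with a maximum of one split leaves any underscores inside the category name (such as BAT_GRB_Pos) untouched.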
Filtering by IVORN substring¶
We can use this knowledge to refine our filters, by filtering on a
substring of the IVORN, using the ivorn_contains filter. For
example, we might want to filter to just those IVORNs containing XRT
positions (note this filter is case-sensitive):
In [ ]:
my_filters[FilterKeys.ivorn_contains] = 'XRT_Pos'
my_filters
In [ ]:
xrt_pos_ivorns = apiv1.list_ivorn(filters=my_filters)
print("Retrieved",len(xrt_pos_ivorns),"IVORNs")
xrt_pos_ivorns
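The case-sensitivity of the substring filter can be illustrated with a purely client-side stand-in. This sketch assumes a plain substring match; it does not show how the server actually implements the filter, the function name `ivorn_contains` is reused here only for readability, and the IVORNs are made up.

```python
# Illustrative IVORNs only: the trailing ID numbers are made up for this sketch.
sample_ivorns = [
    "ivo://nasa.gsfc.gcn/SWIFT#BAT_GRB_Pos_100001-111",
    "ivo://nasa.gsfc.gcn/SWIFT#XRT_Pos_100001-222",
    "ivo://nasa.gsfc.gcn/SWIFT#XRT_Pos_100002-333",
]


def ivorn_contains(ivorns, substring):
    """Client-side stand-in: keep IVORNs containing `substring` (case-sensitive)."""
    return [ivorn for ivorn in ivorns if substring in ivorn]


matches = ivorn_contains(sample_ivorns, "XRT_Pos")
no_matches = ivorn_contains(sample_ivorns, "xrt_pos")  # wrong case, so nothing matches
print(len(matches), len(no_matches))
```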
As in tutorial 1, we can inspect the details of any given packet using
the packet_synopsis endpoint - we’ll take a look at the first one.
This packet makes a good example, as it includes details of the event
co-ordinates and timestamp, and also references an earlier VOEvent:
In [ ]:
synopsis_dict = apiv1.packet_synopsis(xrt_pos_ivorns[0])
synopsis_dict
Ready-made ‘helper’ classes for parsing output¶
Nested dictionaries can be kind of a pain to work with. If you want, you
can use voeventdb.remote’s
`Synopsis
<http://voeventdbremote.readthedocs.org/en/latest/reference/index.html#voeventdb.remote.helpers.Synopsis>`__
‘helper’ class to parse this into an easy-to-use object.
In [ ]:
from voeventdb.remote.helpers import Synopsis
In [ ]:
xrt_synopsis = Synopsis(synopsis_dict)
# Prints with nicer formatting, ordering of values:
print(xrt_synopsis)
Now we can easily access the values (with the ever-handy IPython autocompletion):
In [ ]:
xrt_synopsis.author_ivorn
In [ ]:
xrt_synopsis.references
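To illustrate the general idea behind such a helper, here is a minimal sketch of a class that lifts values out of a nested dictionary into attributes. `SynopsisSketch` is a hypothetical stand-in and not the library’s Synopsis class; the dictionary key names and the example IVORNs are assumptions made for this example only.

```python
class SynopsisSketch(object):
    """Hypothetical stand-in for a helper class; not the library's Synopsis."""

    _fields = ("ivorn", "author_ivorn", "references")

    def __init__(self, synopsis_dict):
        # The key names used here are assumptions made for this sketch.
        self.ivorn = synopsis_dict["ivorn"]
        self.author_ivorn = synopsis_dict.get("author_ivorn")
        self.references = list(synopsis_dict.get("refs", []))

    def __str__(self):
        # One "name: value" line per field, in a fixed order.
        return "\n".join(
            "{}: {}".format(name, getattr(self, name)) for name in self._fields
        )


example_dict = {
    "ivorn": "ivo://example.org/stream#packet_2",
    "author_ivorn": "ivo://example.org/author",
    "refs": [{"ref_ivorn": "ivo://example.org/stream#packet_1"}],
}
sketch = SynopsisSketch(example_dict)
print(sketch)
```

Attribute access gives tab-completion in IPython and a single place to handle missing keys, which is the main convenience over indexing into the raw dictionary.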
One of the Synopsis class attributes is a list called sky_events.
Each entry is an instance of the
`SkyEvent
<http://voeventdbremote.readthedocs.org/en/latest/reference/index.html#voeventdb.remote.helpers.SkyEvent>`__
class, which represents a very basic set of information about an observed
event:
- estimated position,
- error circle on the estimated position,
- timestamp of the observed event.
The position coordinates and error-circle are represented by
`astropy.coordinates
<http://astropy.readthedocs.org/en/stable/coordinates/index.html>`__
classes, which come with a bunch of features related to formatting,
distance calculations, frame-of-reference transformations, etc.
In [ ]:
xrt_synopsis.sky_events
In [ ]:
# List of 1, in this case. Grab the first (and only) element:
sky_event = xrt_synopsis.sky_events[0]
In [ ]:
print(type(sky_event.position))
sky_event.position
In [ ]:
print(type(sky_event.position_error))
sky_event.position_error.deg
Astropy coordinates come with all the usual weird and wonderful astronomical formatting options; see the astropy docs for details:
In [ ]:
print(sky_event.position.ra.deg)
print(sky_event.position.ra.hms)
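The relationship between the two values printed above (right ascension in degrees and in hours-minutes-seconds) can be sketched with plain arithmetic. This is only an illustration of the conversion, using a hypothetical helper `deg_to_hms`; astropy does this for you and handles rounding and formatting more carefully.

```python
def deg_to_hms(ra_deg):
    """Convert right ascension in degrees to (hours, minutes, seconds)."""
    total_hours = (ra_deg % 360.0) / 15.0  # 360 degrees correspond to 24 hours
    hours, remainder = divmod(total_hours, 1.0)
    minutes, remainder = divmod(remainder * 60.0, 1.0)
    seconds = remainder * 60.0
    return int(hours), int(minutes), seconds


h, m, s = deg_to_hms(187.5)
print(h, m, s)
```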
Advanced usage: specifying multiple values for the same filter¶
Before we move on, it’s worth mentioning that some filters can take on multiple values. This is specified by defining the filter-value as a list - for example, to return all VOEvents with a role of ‘observation’ or ‘utility’ we can use the following:
In [ ]:
my_filters = {apiv1.FilterKeys.role: ['observation','utility']}
apiv1.map_stream_count(my_filters)
How does this work? Well, we can think of each entry in the list
as defining a separate filter. For the role value, these filters are
combined in the logical ‘OR’ sense, so we get back combined counts for
both ‘observation’ and ‘utility’ packets. You can check whether a filter
accepts multiple values, and whether they are combined via logical ‘OR’ or
‘AND’, by checking the filter-definitions page and looking for the
combinator attribute.
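The difference between the two combinators can be sketched in a few lines of plain Python. This is an illustration of the logic only, not the server’s implementation: `matches` is a hypothetical helper, the substring test is a hypothetical example of where ‘AND’ makes sense, and the IVORN is made up.

```python
def matches(packet_value, filter_values, test, combinator):
    """Apply test(packet_value, v) for each filter value, combined by OR or AND."""
    results = (test(packet_value, value) for value in filter_values)
    if combinator == "or":
        return any(results)
    if combinator == "and":
        return all(results)
    raise ValueError("Unknown combinator: {!r}".format(combinator))


def equals(a, b):
    return a == b


def contains(text, substring):
    return substring in text


# 'OR': a packet is kept if its role equals any of the listed values.
roles = ["observation", "utility", "test"]
kept = [r for r in roles if matches(r, ["observation", "utility"], equals, "or")]

# 'AND': only meaningful when one packet can satisfy several values at once,
# e.g. multiple substrings (a hypothetical example for this sketch).
ivorn = "ivo://nasa.gsfc.gcn/SWIFT#XRT_Pos_100001-222"  # made-up ID
both = matches(ivorn, ["SWIFT", "XRT"], contains, "and")
print(kept, both)
```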
Coming next …¶
We’ve seen how to narrow our search, locate packets of interest, and use helper-classes to easily access packet details. In tutorials 3 & 4, we’ll cover different ways of finding related VOEvents.