skylark:search



Variable metadata searching is at the heart of Skylark. The system has been designed to offer a range of searches, from simple keyword to more detailed combinations.


Simple Searching

Instant Keyword Search

Once you have logged in, the main menu bar contains a grey ‘search’ box, which remains available throughout Skylark. It performs a simple keyword search at any point during your Skylark session. Keyword searches return matches from variable names or labels, and are not case sensitive.


Keyword search parameters

The keyword search function has been updated in the latest Skylark release to be more intuitive and flexible. The default setting for a multi-term search is now an implied OR search. For example, a standard keyword search for blood pressure will return any variable with the word blood OR the word pressure in the variable name or label.

You can place the modifiers +, - and “ ” next to (or, for quotation marks, around) one or more search terms to change the focus of a keyword search.

Modifier   Use           Example
+          Inclusion     +blood +pressure will find results with blood and pressure anywhere in the variable name or description
-          Exclusion     blood -pressure will find results including blood but excluding pressure
“ ”        Exact match   “blood pressure” will find results with both words next to each other

Combinations of the above modifiers can also be used, e.g. “blood pressure” -diastolic will exclude all variables with diastolic in the description from the “blood pressure” search results.
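The modifier rules described above can be illustrated with a minimal Python sketch. This is not Skylark's actual implementation, which is not documented here; the function names `parse_query`, `contains` and `matches` are hypothetical. It assumes whole-word matching, and that plain (unmodified) terms are ignored for matching once a required term or exact phrase is present.

```python
import re

def parse_query(query):
    """Split a keyword query into required, excluded and optional terms."""
    # Normalise curly quotes and en dashes that word processors often substitute.
    query = query.replace("“", '"').replace("”", '"').replace("–", "-")
    parsed = {"required": [], "excluded": [], "optional": []}
    # Each token is an optional +/- prefix followed by a quoted phrase or a bare word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]+)"|(\S+))', query):
        term = (phrase or word).lower()
        if sign == "-":
            parsed["excluded"].append(term)
        elif sign == "+" or phrase:
            parsed["required"].append(term)  # +term or exact phrase must be present
        else:
            parsed["optional"].append(term)  # plain terms form the implied OR
    return parsed

def contains(text, term):
    """True if the words of `term` appear next to each other, in order, in `text`."""
    words = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", term.lower())
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))

def matches(text, query):
    """Apply the parsed query to a variable name or label (case-insensitive)."""
    q = parse_query(query)
    if any(contains(text, t) for t in q["excluded"]):
        return False
    if not all(contains(text, t) for t in q["required"]):
        return False
    if q["required"]:
        return True  # assumption: optional terms do not filter further
    return any(contains(text, t) for t in q["optional"])
```

For example, `matches("Blood group", "blood pressure")` is true under the implied OR, whereas `matches("Blood group", "+blood +pressure")` is false because both words are required.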

N.B. Please do not specify Boolean search terms e.g. AND, OR, NOT in a keyword search, as they will give many unexpected results!


Main Search Menu

The options on the main Skylark search menu can be seen below:


Skylark search options

Simple search options are as follows:

  • Variable name – search for one or more specific variables by their names. Names must match exactly; there is no fuzzy searching. Multiple variable names must be separated by a single space
  • Keyword – as above, search variable names and labels for keywords
  • Year – a dropdown box to select the year of data you are interested in (some years containing a lot of variables have been split into questionnaires)
  • Category – NSHD data are categorised into 27 broad groups, from ‘wellbeing’ to ‘education’. Select from a dropdown menu. Some categories give a large number of results
  • Topic – the data have been broken down further, into topics. Select from a dropdown menu. The topic guides in the metadata repository include variable lists and 'standard topic baskets' to view and save
  • Library1) – the data consist of around 400 library files, roughly grouped on topic. Select from a dropdown menu


Keyword is the most common search. However, a broad term will produce a large number of hits and can slow the process. Selecting a keyword search from the menu gives a Soundex option. This is a broader, phonetic search that also returns results containing keywords that sound similar to your search term(s), so it produces many more hits.
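To show why a phonetic search returns more hits, here is a minimal Python sketch of the standard American Soundex coding. Which Soundex variant Skylark uses is not stated here, so treat this as an illustration of the general technique; the function name `soundex` is an assumption.

```python
# Digit codes for consonant groups in standard American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the four-character Soundex code for `word` ("" if it has no letters)."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()          # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # its code still suppresses an immediate repeat
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":             # h and w do not separate repeated codes
            prev = digit               # vowels reset prev, so repeats across vowels count
    return (code + "000")[:4]          # pad with zeros or truncate to four characters
```

Words with the same code are treated as sounding alike. For instance, `soundex("blood")` and `soundex("bleed")` both give `B430`, so a Soundex search for one would also match the other.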


Topic searching is a new feature. Selecting from a drop-down list of the main topics of data collected in NSHD will load a pre-selected set of the most commonly used variables on that topic. You can add these variables to your basket in bulk, and then add to or edit the contents (if desired) before saving. However, please note that the standard topic baskets are not an exhaustive list of everything we hold on a topic, nor do they necessarily contain the variables that are best for you. They are generally summary variables, and have been chosen based on previous usage and popularity.


Library searching can be another very useful ‘topic’ search. For example, the library ‘Alcohol14’ contains all the variables on alcohol use collected in 2014. However, only the more recent libraries have contextual names; many of the older data are housed in libraries called ‘B01’ or ‘Y79’, which give no idea of their contents.

A guide to the NSHD data libraries and their contents is available.

  • However, please note that not all the libraries listed in the guide are available on Skylark. Sensitive or possibly disclosive libraries may have been removed to protect the study and the identity of its participants.
  • Also, many of the libraries relate to various sub-studies that have taken place over the years. In some cases, the variables will contain only a few hundred cases, rather than the full sample. Please use the frequency tables for individual variables present on Skylark to check the details, prior to adding to your basket.



Combination Searching

A variety of compound Boolean AND searches can also be undertaken on Skylark:

  • Keyword and year
  • Keyword and category
  • Year and category
  • Keyword, year and category


Soundex can also be applied to all combination searches containing keyword.
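A compound AND search simply requires every supplied criterion to hold at once. The Python sketch below illustrates this over a small in-memory list; the record fields (`name`, `label`, `year`, `category`), the sample variables and the function `combination_search` are all invented for illustration and do not reflect Skylark's real data model. Keyword matching is reduced to a case-insensitive substring test to keep the example short.

```python
def combination_search(variables, keyword=None, year=None, category=None):
    """Return variables that satisfy every criterion that was supplied (Boolean AND)."""
    results = []
    for var in variables:
        text = f"{var['name']} {var['label']}".lower()
        if keyword is not None and keyword.lower() not in text:
            continue
        if year is not None and var["year"] != year:
            continue
        if category is not None and var["category"] != category:
            continue
        results.append(var)
    # List hits alphabetically by variable name.
    return sorted(results, key=lambda v: v["name"].lower())

# Hypothetical sample metadata, not real NSHD variables.
VARIABLES = [
    {"name": "SBP09", "label": "Systolic blood pressure", "year": 2009, "category": "health"},
    {"name": "DBP09", "label": "Diastolic blood pressure", "year": 2009, "category": "health"},
    {"name": "SBP99", "label": "Systolic blood pressure", "year": 1999, "category": "health"},
    {"name": "EDU09", "label": "Highest qualification", "year": 2009, "category": "education"},
]

hits = combination_search(VARIABLES, keyword="blood", year=2009)
names = [v["name"] for v in hits]
```

Adding a criterion can only narrow the result set: here the keyword alone matches three of the sample variables, while keyword and year together match two.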



Restricted Variables

Due to restrictions placed on us by NHS Digital, we are unable to share certain restricted data outside of the Unit. These variables will not appear in any search results. Data restricted in this way include:

  • Mortality data
  • Cancer registrations
  • Hospital episodes (HES)

If you think your project may require access to these kinds of data, or to other sensitive variables that are unavailable via Skylark because of a raised risk of re-identifying NSHD study members, please contact us. It may be possible for you to access these data from inside the Unit.


Search Results

Whatever kind of search you perform, the results table will always appear the same way, showing the number of hits and details of the variables.


The example below shows the results for a keyword search on ‘blood pressure’:

Blood Pressure search results

Results are listed alphabetically by variable name.


From the results table, you can add one or more of these variables to your basket.



Click on a variable to view more detailed metadata, including a frequency distribution and crosstab by sex. Many of the more recent datasets have extra documentation linked, including a data cleaning guide to that topic and references to published papers.


variable metadata



1) Libraries are also known as library files, card numbers, or datasets; these terms may be used interchangeably