Searching Skylark
Variable metadata searching is at the heart of Skylark. The system has been designed to offer a range of searches, from simple keyword to more detailed combinations.
Simple Searching
Instant Keyword Search
Once you have logged in, the main menu bar contains a grey ‘search’ box, which remains available throughout Skylark. You can use it to perform a simple keyword search at any time during your Skylark session. Keyword searches return matches from variable names or labels and are not case sensitive.
Keyword search parameters
The keyword search function has been updated in the latest Skylark release to be more intuitive and flexible. The default for a multi-term search is now an implied OR search. For example, a standard keyword search for blood pressure will return any variable with the word blood OR the word pressure in its name or label.
You can use the modifiers - + and “ next to one or more search terms to change the focus of a keyword search.
Modifier | Use | Example |
---|---|---|
+ | Inclusion | +blood +pressure will find results with blood and pressure anywhere in the variable name or description |
- | Exclusion | blood -pressure will find results including blood but excluding pressure |
“ | Exact match | “blood pressure” will find results with both words next to each other |
Combinations of the above modifiers can also be used, e.g. “blood pressure” -diastolic will exclude all variables with diastolic in the description from the “blood pressure” search results.
N.B. Please do not use Boolean operators (e.g. AND, OR, NOT) in a keyword search, as they will give many unexpected results!
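The modifier rules above can be sketched in a few lines of Python. This is an illustration only, not Skylark's own implementation: the function names `parse_query` and `matches` are hypothetical, and the sketch assumes whole-word matching and that unmarked terms only act as a filter when no + terms are present.

```python
import re

# Hypothetical tokenizer: a quoted phrase or a bare word, each with an
# optional + or - prefix.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def parse_query(query):
    """Split a query into (required, excluded, optional) term lists."""
    # Normalise curly quotes and en dashes, which word processors often
    # substitute for the plain " and - characters.
    query = query.replace("“", '"').replace("”", '"').replace("–", "-")
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional


def matches(text, query):
    """Return True if a variable name or label satisfies the query."""
    required, excluded, optional = parse_query(query)
    text = text.lower()  # keyword searches are not case sensitive

    def has(term):
        # Assumption: terms match whole words; a phrase must be adjacent.
        return re.search(r"\b" + re.escape(term) + r"\b", text) is not None

    if any(has(term) for term in excluded):
        return False
    if not all(has(term) for term in required):
        return False
    if required:
        return True
    # Implied OR: any one of the unmarked terms is enough.
    return any(has(term) for term in optional)


print(matches("Blood group", "blood pressure"))
print(matches("Blood group", "+blood +pressure"))
print(matches("Diastolic blood pressure", '"blood pressure" -diastolic'))
```

The first call shows the implied OR (one matching word is enough), the second shows + requiring every marked term, and the third shows an exclusion removing an otherwise exact phrase match.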
Main Search Menu
The options on the main Skylark search menu can be seen below:
Simple search options are as follows:
- Variable name – search for one or more specific variables by their names. Names must match exactly (there is no fuzzy searching), and multiple variable names must be separated by a single space
- Keyword – as above, search variable names and labels for keywords
- Year – a dropdown box to select the year of data you are interested in (some years containing a lot of variables have been split into questionnaires)
- Category – NSHD data are categorised into 27 broad groups, from ‘wellbeing’ to ‘education’. Select from a dropdown menu. Some categories give a large number of results
- Topic - the data have been broken down further, into topics. Select from a dropdown menu. The topic guides in the metadata repository include variable lists and 'standard topic baskets' to view and save
- Library – the data consist of around 400 library files, roughly grouped on topic. Select from a dropdown menu
Keyword is the most common search. However, a broad term will return a large number of hits and can slow the process.
Selecting a keyword search from the menu gives a Soundex option. This is a broader, phonetic search that returns results containing keywords that sound similar to your search term(s), so it produces many more hits.
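To show why a phonetic search returns more hits, here is a minimal sketch of the classic American Soundex code in Python. It is an assumption that Skylark uses this exact variant; the function name `soundex` is illustrative only.

```python
# Letter-to-digit table for classic American Soundex.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV",
        "2": "CGJKQSXZ",
        "3": "DT",
        "4": "L",
        "5": "MN",
        "6": "R",
    }.items()
    for letter in letters
}


def soundex(word):
    """Return the four-character Soundex code for a word ('' if no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]                  # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # digit of the previous coded letter
    for c in letters[1:]:
        digit = CODES.get(c, "")       # vowels, H, W and Y have no digit
        if digit and digit != prev:
            code += digit
        if c not in "HW":              # H and W do not break a run of equal digits
            prev = digit
    return (code + "000")[:4]          # pad or truncate to four characters


# Two spellings that sound alike share one code, so both would be returned.
print(soundex("pressure"), soundex("presure"))
```

Because every word with the same code is treated as a match, a Soundex search will usually return many more variables than an exact keyword search.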
Topic searching is a new feature - selecting from a drop-down list of the main topics of data collected in NSHD will load a pre-selected set of the most commonly-used variables on that topic. You can add these variables to your basket in bulk, and then add or edit the contents (if desired) prior to saving.
However, please note that the standard topic baskets are not an exhaustive list of everything we hold on a topic, nor do they necessarily contain the variables that are best for you.
They are generally summary variables, and have been chosen based on previous usage and popularity.
Library searching can be another very useful ‘topic’ search. For example, the library ‘Alcohol14’ contains all the variables on alcohol use collected in 2014. Only the more recent libraries have contextual names; many of the older data are housed in libraries called ‘B01’ or ‘Y79’, which give no idea of their contents. However, every library has a brief description listed in the dropdown menu.
A guide to the older NSHD data libraries and their contents is also available.
- However, please note that not all the libraries listed in the guide are available on Skylark. Sensitive or possibly disclosive libraries may have been removed to protect the study and the identity of its participants.
- Also, many of the libraries relate to various sub-studies that have taken place over the years. In some cases, the variables will contain only a few hundred cases, rather than the full sample. Please use the frequency tables for individual variables present on Skylark to check the details, prior to adding to your basket.
- The Insight 46 neuroscience sub-study libraries all begin with 'I46…'
- For a quick way to view all available variables from the Insight 46 neuroscience sub-study, make a Year search and select '2016'
Combination Searching
A variety of compound Boolean AND searches can also be undertaken on Skylark:
- Keyword and year
- Keyword and category
- Year and category
- Keyword, year and category
Soundex can also be applied to all combination searches containing keyword.
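The AND logic of a combination search can be sketched as follows. This is a minimal illustration, not Skylark's code: the `Variable` record, the `combined_search` function and the catalogue entries are all hypothetical, and the keyword is matched as a simple case-insensitive substring.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    label: str
    year: int
    category: str


def combined_search(variables, keyword=None, year=None, category=None):
    """Return variables that satisfy every criterion supplied (Boolean AND)."""
    hits = []
    for var in variables:
        if keyword is not None:
            # Keyword is checked against both the variable name and its label.
            haystack = f"{var.name} {var.label}".lower()
            if keyword.lower() not in haystack:
                continue
        if year is not None and var.year != year:
            continue
        if category is not None and var.category.lower() != category.lower():
            continue
        hits.append(var)
    # List the hits alphabetically by variable name.
    return sorted(hits, key=lambda var: var.name.lower())


# Hypothetical records, for illustration only.
CATALOGUE = [
    Variable("EXAMPLE_C", "Highest education level", 2009, "education"),
    Variable("EXAMPLE_A", "Wellbeing score", 2009, "wellbeing"),
    Variable("EXAMPLE_B", "Wellbeing score", 2014, "wellbeing"),
]

for var in combined_search(CATALOGUE, keyword="wellbeing", year=2009):
    print(var.name, var.label)
```

Each criterion that is supplied narrows the result set further, which is why a keyword, year and category search returns fewer hits than any of the three searches run alone.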
Restricted Variables
Due to restrictions placed on us by NHS Digital, we are unable to share certain restricted data outside of the Unit. These variables will not appear in any search results. Data restricted in this way includes:
- Mortality data
- Cancer registrations
- Hospital episodes (HES)
If you think your project may require access to these kinds of data, or to other sensitive variables that are unavailable via Skylark because they carry a raised risk of re-identifying NSHD study members, please contact us. It may be possible for you to access these data from inside the Unit.
Search Results
Whatever kind of search you perform, the results table will always appear the same way, showing the number of hits and details of the variables.
The example below shows the results for a keyword search on ‘blood pressure’.
Results are listed alphabetically by variable name.
Here, you have the option to add one or more of these variables to your basket.
Click on a variable to view more detailed metadata, including a frequency distribution and crosstab by sex. Many of the more recent datasets have extra documentation linked, including a data cleaning guide to that topic and references to published papers.