How to Search for Bios on GitHub

This article was originally posted on BooleanStrings.com.

Just as we can’t search for LinkedIn headlines within LinkedIn or by X-Raying, we can’t search for GitHub Bios, either within GitHub or by X-Raying. However, we can search for LinkedIn headlines with Custom Search Engines (CSEs). It turns out that we can similarly search for GitHub Bios with CSEs!

We will be searching using the GitHub X-Ray CSE. I will start with sample search strings that look within Bios, then give some explanations.

Here you go. “GitHub Bio contains”:

You can change the arguments, add keywords, and combine them with other Google and Custom Search Engine operators specific to GitHub. As you may have noticed, within the special operators you can use the asterisk * for AND and the comma , for OR.
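To make the AND/OR convention concrete, here is a minimal Python sketch that assembles a Bio operator from a list of terms. It assumes the more:p:metatags-og_description: operator explained later in this article; the helper name bio_operator and the sample terms are hypothetical, chosen only for illustration.

```python
# Operator prefix taken from this article; everything else is illustrative.
BIO_OPERATOR = "more:p:metatags-og_description:"


def bio_operator(terms, match_all=True):
    """Build a 'GitHub Bio contains' operator from single-word terms.

    match_all=True joins the terms with '*' (AND);
    match_all=False joins them with ',' (OR).
    """
    cleaned = [term.strip() for term in terms if term.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty term is required")
    separator = "*" if match_all else ","
    return BIO_OPERATOR + separator.join(cleaned)


if __name__ == "__main__":
    # Hypothetical terms: Bio mentions both words.
    print(bio_operator(["python", "django"]))
    # Hypothetical terms: Bio mentions either word.
    print(bio_operator(["recruiter", "sourcer"], match_all=False))
```

Paste the resulting string into the CSE search box, optionally followed by ordinary keywords.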

You Can Stop Reading Now and Go Enjoy the Searches

But wait, I also want to tell you that our tool Social List uses CSE operators in the background, and you won’t need to write any operators – just enter your terms and collect results. Here is what a search looks like:

[Screenshot: github bios]

Check it out if you haven’t.

 

Now, if you are wondering how I came up with the horrible-looking operator more:p:metatags-og_description: (and what is behind the search algorithm in Social List), read on.

CSE – Special Advanced Syntax

Special CSE operators depend on the website and structure of its pages. More specifically, operators depend on what Schema.org, Microformats, and other objects and values are (invisibly) included in the pages’ source code.

The general CSE search operator format is this:

more:pagemap:<object>-<attribute>:<value>

Here, object is a PageMap object such as Person, attribute is one of its fields, such as “org” (i.e. organization, a Person’s employer), and value is a string such as “IBM”. An operator built this way finds pages containing the object Person with a matching “org” value.

An alternative syntax uses just p instead of pagemap:

more:p:<object>-<attribute>:<value>

Google.com doesn’t “understand” the more:… search syntax, but any Google Custom Search Engine does.
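As a rough illustration of the operator format, the sketch below builds both the long and the short form, following the shape of the more:p:metatags-og_description: operator used in this article. The function name pagemap_operator is hypothetical, the person/org/ibm values are only an example, and the rule that a colon in a tag name (og:description) becomes an underscore (og_description) is an assumption inferred from that one operator.

```python
def pagemap_operator(obj, attribute, value, short=True):
    """Build a CSE structured-data operator string.

    obj       -- PageMap object name, e.g. "metatags"
    attribute -- field inside that object, e.g. "og:description"
    value     -- the term to look for in that field
    short     -- use "more:p:" instead of "more:pagemap:"
    """
    prefix = "more:p" if short else "more:pagemap"
    # Assumption: ':' inside a field name is written as '_' in the operator,
    # as in og:description -> og_description.
    field = attribute.replace(":", "_")
    return f"{prefix}:{obj}-{field}:{value}"


if __name__ == "__main__":
    print(pagemap_operator("metatags", "og:description", "python"))
    print(pagemap_operator("person", "org", "ibm", short=False))
```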

Objects and Values to Query

Objects (like Person of Schema.org) and values (like employer=”IBM”) are invisibly included in a web page’s source code, and Google collects them into a structure called “PageMap”. The big deal is that you can search within objects and their values using CSE operators. PageMap includes data following a variety of standards (Schema.org, Microformats, and others) and also a part called “Metatags”.

In our particular case, a GitHub Bio is stored in Metatags under the tag “og:description” (and is also duplicated under “twitter:description”). I found it by examining the JSON output from a CSE API call:

"metatags": [
{
"viewport": "width=device-width",
"fb:app_id": "1401488693436528",
"twitter:image:src": "https://avatars1.githubusercontent.com/u/447033?s=400&v=4",
"twitter:site": "@github",
"twitter:card": "summary",
"twitter:title": "garris – Overview",
"twitter:description": "Works at LinkedIn. Lives in Berkeley. Likes a nice hike. – garris",
"og:image": "https://avatars1.githubusercontent.com/u/447033?s=400&v=4",
"og:site_name": "GitHub",
"og:type": "profile",
"og:title": "garris – Overview",
"og:url": "https://github.com/garris",
"og:description": "Works at LinkedIn. Lives in Berkeley. Likes a nice hike. – garris",
"profile:username": "garris",
"pjax-timeout": "1000",
"request-id": "895E:41D8:7B30:E042:5D3F23D9",
"octolytics-host": "collector.githubapp.com",
"octolytics-app-id": "github",
"octolytics-event-url": "https://collector.githubapp.com/github-external/browser_event",
"octolytics-dimension-request_id": "895E:41D8:7B30:E042:5D3F23D9",
"octolytics-dimension-region_edge": "iad",
"octolytics-dimension-region_render": "iad",
"analytics-location": "/\u003cuser-name\u003e",
"google-analytics": "UA-3769691-2",
"dimension1": "Logged Out",
"hostname": "github.com",
"expected-hostname": "github.com",
"js-proxy-site-detection-payload": "MTUyMTUyNGE4ODJhNTRkMmFkZGU3NjFlOTA5ZTllNTNmZDg1NzZmN2UwZTM1YzhlOWQ5YjAxNGEyZTBhMDk0Ynx7InJlbW90ZV9hZGRyZXNzIjoiNjYuMjQ5LjY2LjIxNiIsInJlcXVlc3RfaWQiOiI4OTVFOjQxRDg6N0IzMDpFMDQyOjVEM0YyM0Q5IiwidGltZXN0YW1wIjoxNTY0NDE5MDM1LCJob3N0IjoiZ2l0aHViLmNvbSJ9",
"enabled-features": "MARKETPLACE_FEATURED_BLOG_POSTS,MARKETPLACE_INVOICED_BILLING,MARKETPLACE_SOCIAL_PROOF_CUSTOMERS,MARKETPLACE_TRENDING_SOCIAL_PROOF,MARKETPLACE_RECOMMENDATIONS,MARKETPLACE_PENDING_INSTALLATIONS",
"html-safe-nonce": "1ef7c04a79f7c74d7ed950ed690d277292296f65",
"browser-stats-url": "https://api.github.com/_private/browser/stats",
"browser-errors-url": "https://api.github.com/_private/browser/errors",
"theme-color": "#1e2327"
}]
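If you would rather pull the Bio out of such output in code than read it by eye, something like the following Python sketch may help. It assumes the usual shape of a CSE API response, where each entry in "items" carries a "link" and a "pagemap" holding the "metatags" list; the function name extract_bios is hypothetical, and the sample is a trimmed-down version of the record above.

```python
def extract_bios(response):
    """Return (profile link, bio) pairs found in a parsed CSE API response."""
    bios = []
    for item in response.get("items", []):
        for tags in item.get("pagemap", {}).get("metatags", []):
            # The Bio sits under og:description, duplicated in twitter:description.
            bio = tags.get("og:description") or tags.get("twitter:description")
            if bio:
                bios.append((item.get("link", ""), bio))
                break  # one Bio per result is enough
    return bios


# Trimmed-down sample in the assumed response shape.
sample = {
    "items": [
        {
            "link": "https://github.com/garris",
            "pagemap": {
                "metatags": [
                    {
                        "og:type": "profile",
                        "og:description": "Works at LinkedIn. Lives in Berkeley. Likes a nice hike.",
                    }
                ]
            },
        }
    ]
}

if __name__ == "__main__":
    for link, bio in extract_bios(sample):
        print(link, "|", bio)
```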

One last step and you will catch up with me on the subject. I am going to tell you how I obtained the JSON sample pasted above.

Running CSE API Calls

The CSE API lets you query CSEs from software code. It’s also possible to run an API call from your browser address bar.

Using the API requires a KEY (a long coded string) that you obtain from Google. Input for an API call is a KEY, a CSE ID (a value you can copy from the CSE Control Panel), a query string, and optional parameters.

You can run an API query from your browser in the following fashion:

https://www.googleapis.com/customsearch/v1?key=KEY&cx=CSEID&q=a

The result will look like this:

[Screenshot: github bios]

 

An API call produces a JSON-formatted output page that you can browse to figure out the operator formats.
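The same call can be made from code. Below is a minimal Python sketch using only the standard library; it uses the endpoint and the key, cx, and q parameters from the URL shown above, while the function names build_cse_url and run_cse_query are hypothetical. KEY and CSEID are placeholders that you would replace with your own values.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(key, cse_id, query):
    """Assemble the API URL; urlencode escapes ':' and '*' in operators."""
    return API_ENDPOINT + "?" + urlencode({"key": key, "cx": cse_id, "q": query})


def run_cse_query(key, cse_id, query):
    """Call the API and return the parsed JSON (needs a valid key and CSE ID)."""
    with urlopen(build_cse_url(key, cse_id, query)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the URL only; no network request is made here.
    print(build_cse_url("KEY", "CSEID", "more:p:metatags-og_description:python"))
```

Printing the URL first is a cheap way to check the query before spending API quota on it.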

While you can examine a page’s structure with various tools (including CSEs themselves), these API JSON outputs provide “the” most accurate information for assembling CSE operators.

Final Word

Querying structured info on the web is incredibly powerful. It may seem “too technical”, but that impression comes mostly from the odd-looking strings of parameters. (You don’t need to “read” the parameters; you just need to copy and paste them.) Maybe one day, Google (or someone) will attach a friendly UI to Google CSEs’ structured web search. In the meantime, follow the links to search in GitHub Bios and definitely try Social List.

 

For more articles like this, visit BooleanStrings.com!


Authors
Irina Shamaeva

Irina Shamaeva is a recognized leader in Sourcing, Social Recruiting, and Internet Research. She is Partner and Chief Sourcer at Brain Gain Recruiting, an executive search firm. In addition to sourcing for her agency, Irina takes on Sourcing/Name Generation/Internet Research projects across numerous industries and geographies, which she loves doing!