AmazingHiring - How to Source with AWS?

Here is a guidebook by Jiri Herodek on how to use Amazon Web Services and its machine learning module for sourcing, along with its pros and cons. Before we dive into the process, let us explain a couple of notions you need to be familiar with before implementing it in your sourcing strategy.

03.04.2019

Amazon Web Services (AWS) is an on-demand cloud platform providing services to individuals and companies alike. AWS comprises various modules like Application Integration, Analytics, AR & VR, Cost Management, Blockchain, Machine Learning, Robotics, Mobile, etc. You can find more information here. If you want to know more, I suggest going through the free online training platform to become familiar with the technology and basic concepts. We use only Machine Learning and S3 storage in our sourcing process; you will find them in the administration of your AWS account. The advantage of AWS is that you pay as you go. I usually pay around 0.90 USD per month, depending on the number of predictions (activities) I make. Predictions can be real-time (synchronous) or batch (asynchronous). I always use real-time predictions since you can see the output immediately, it is more appropriate for a smaller data sample, and it is also cheaper.

The most important thing to remember is that machine learning models look for patterns in the data you upload to the system. You can use this model for the matching process in terms of skills, location, or photo.

The output is a JSON file that you import into your Google spreadsheet. The result is information about the candidates together with a percentage fit rate.

What you will need for the whole process:

  1. AWS Account – it’s free to create (you pay only for the predictions you make).
  2. Google Account – you will use Google Spreadsheets for doing your searches and for importing the results.
  3. XPath Helper – a Chrome extension that you can download here. It extracts, edits, and evaluates XPath queries on any webpage.
  4. Blockspring – an application you can use in your spreadsheet to import data from various sources; think of it as an API connector. There is also a Chrome extension you can use for building your own apps.

The process includes the following parts and reflects the stages of the ML (machine learning) model in the setting of the AWS console:

  1. Data Preparation (Google Spreadsheet with data about the candidates).
  2. Training Data Source (uploading a CSV file with desired matching criteria to your S3 storage).
  3. Create a Machine Learning Model (ML) – you have to choose the right type of predictions, file type, etc.
  4. Review the ML Model – you just check the predictive performance bar which should be around 90-100%.
  5. Set a score Threshold – here you usually don’t change anything, because it could break the whole model. The threshold controls the trade-off between false positives and false negatives.
  6. Use the ML Model to generate Predictions.
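The console steps above can also be sketched programmatically. Below is a minimal, hedged sketch using boto3's client for the (now-deprecated) Amazon Machine Learning service; the bucket, file, IDs, and column names are placeholders, not values from the article, and the helper that builds the data schema follows the documented Amazon ML CSV-schema shape.

```python
import json


def build_data_schema(attribute_types, target):
    """Build the Amazon ML data-schema JSON for a CSV training file.

    attribute_types maps column name -> Amazon ML type (NUMERIC,
    CATEGORICAL, BINARY, or TEXT); target is the column the model
    should predict (our fit label).
    """
    return json.dumps({
        "version": "1.0",
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "targetAttributeName": target,
        "attributes": [
            {"attributeName": name, "attributeType": atype}
            for name, atype in attribute_types.items()
        ],
    })


def train_and_predict(bucket, key, schema, record):
    """Steps 2-6 of the process: create a data source from the CSV in
    S3, train a model, and request one real-time prediction.
    All names and IDs here are illustrative placeholders."""
    import boto3  # local import so build_data_schema stays dependency-free

    ml = boto3.client("machinelearning")
    ml.create_data_source_from_s3(
        DataSourceId="sourcing-data",
        DataSpec={
            "DataLocationS3": f"s3://{bucket}/{key}",
            "DataSchema": schema,
        },
        ComputeStatistics=True,  # required for training data sources
    )
    ml.create_ml_model(
        MLModelId="sourcing-model",
        MLModelType="BINARY",  # fit / no-fit classification
        TrainingDataSourceId="sourcing-data",
    )
    endpoint = ml.create_realtime_endpoint(MLModelId="sourcing-model")
    return ml.predict(
        MLModelId="sourcing-model",
        Record=record,  # e.g. {"skills": "javascript react", "city": "Prague"}
        PredictEndpoint=endpoint["RealtimeEndpointInfo"]["EndpointUrl"],
    )
```

In practice you would also poll for the data source and model to reach the COMPLETED status before creating the endpoint; that wait loop is omitted here for brevity.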

Clean up the data and import the JSON into the spreadsheet – here you should check whether your Google spreadsheet allows importing JSON from a link and running the script in the spreadsheet. You can download some open-source code for this here.
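For a binary model, an Amazon ML real-time prediction response carries a Prediction object with a predictedLabel and a predictedScores map; a small sketch of turning that into a percentage fit rate (the sample values below are invented for illustration):

```python
def fit_rate(prediction_json):
    """Pull the percentage fit out of an Amazon ML real-time
    prediction response: look up the score of the predicted label
    and express it as a percentage."""
    pred = prediction_json["Prediction"]
    label = pred["predictedLabel"]
    return label, round(pred["predictedScores"][label] * 100, 1)


# Example response shape (values invented for illustration):
sample = {"Prediction": {"predictedLabel": "1",
                         "predictedScores": {"1": 0.874}}}
```

Calling `fit_rate(sample)` gives the label together with an 87.4% fit, which is the kind of number you would show next to each candidate in the results tab.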

Now let’s dive into the process and see what the key stages look like:

1) Boolean search

The data preparation phase is the most important one: you need to have all the data in one tab, and the candidates you want to compare by skill, location, or photo in a second tab.

Start by creating a boolean generator; you can use this one as a solid foundation. Besides the boolean generator, you also have to add some other factors to the spreadsheet, such as the search engine you want to use and the website from which you want to pull your data (LinkedIn, AngelList, etc.). You also have to add the website factor for the Google or Bing web crawlers, combine it with data obtained from X-ray searches of LinkedIn, AngelList, etc., and import the results into your Google spreadsheet via the IMPORT family of formulas. You can find out how to do it here.

Example: 

=CONCATENATE(IF(B14="BING";"https://www.bing.com/search?q=";"https://www.google.com/search?num=50&safe=off&q=");IF(B15="LINKEDIN";"site%3Alinkedin.com%2Fin+";"site%3Angel.co+"))

When it comes to searches 1–16, we just combine the proper website factor (for example, Google + LinkedIn) with the proper skills, city, and country using the CONCATENATE function. For search 1 it looks like this:

=CONCATENATE(B17;B6;" AND ";B7; " AND "; B8; " AND "; B10)
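The two formulas above are easier to reason about outside the spreadsheet; here is the same logic as a small Python sketch (the engine/site names and example keywords are illustrative, and the cell references are replaced by function arguments):

```python
def search_prefix(engine, site):
    """Mirror of the first CONCATENATE formula: pick the search-engine
    URL, then append the URL-encoded site: operator for the X-ray search."""
    base = ("https://www.bing.com/search?q=" if engine == "BING"
            else "https://www.google.com/search?num=50&safe=off&q=")
    operator = ("site%3Alinkedin.com%2Fin+" if site == "LINKEDIN"
                else "site%3Angel.co+")
    return base + operator


def build_search(prefix, *terms):
    """Mirror of the per-search CONCATENATE: join the keywords with AND."""
    return prefix + " AND ".join(terms)
```

For example, `build_search(search_prefix("BING", "LINKEDIN"), "javascript", "react", "Prague")` produces a Bing X-ray query over linkedin.com/in profiles.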

2) Candidates

In this tab, we only have a reference to the previous tab with our boolean generator and our searches 1–16. A simple condition is inserted that looks like this:

=IFS(A3="Search 1";'Boolean search'!B20;A3="Search 2";'Boolean search'!B21;A3="Search 3";'Boolean search'!B22;A3="Search 4";'Boolean search'!B23;A3="Search 5";'Boolean search'!B24;A3="Search 6";'Boolean search'!B25;A3="Search 7";'Boolean search'!B26;A3="Search 8";'Boolean search'!B27;A3="Search 9";'Boolean search'!B28;A3="Search 10";'Boolean search'!B29;A3="Search 11";'Boolean search'!B30;A3="Search 12";'Boolean search'!B31;A3="Search 13";'Boolean search'!B32;A3="Search 14";'Boolean search'!B33;A3="Search 15";'Boolean search'!B34;A3="Search 16";'Boolean search'!B35)
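Conceptually, the IFS chain is just a lookup table from the selected search name to a boolean-search cell; a tiny Python sketch of the same idea (the query strings stand in for the values in 'Boolean search'!B20:B35):

```python
def pick_search(choice, searches):
    """Equivalent of the long IFS chain: map the selected search name
    to the matching boolean-search value."""
    return searches[choice]


# The 16 searches would come from 'Boolean search'!B20:B35;
# here they are placeholder strings.
searches = {f"Search {i}": f"query for search {i}" for i in range(1, 17)}
```

A lookup like this also makes the "Search 13" duplication mistake impossible, since each key can only appear once.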

Whenever you change your search, the crawler will bring candidates from the given search engine and the desired social network (or other source), and the candidates will pop up.

3) Matching criteria

Finally, you need a third tab with the matching criteria. Use skills or location, and assign each criterion a value such as 100% or 50%. Upload this data to AWS as a CSV file and create your machine learning model.
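What the matching-criteria CSV might look like before uploading to S3 can be sketched as follows; the column names (`skill`, `weight`, `fit`) and the example rows are assumptions for illustration, not a fixed AWS format:

```python
import csv
import io


def write_training_csv(rows, fieldnames):
    """Serialize matching-criteria rows to CSV text ready for S3 upload."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical criteria: which skills matter, how much, and the fit label.
criteria = [
    {"skill": "javascript", "weight": "100", "fit": "1"},
    {"skill": "react", "weight": "50", "fit": "1"},
    {"skill": "cobol", "weight": "0", "fit": "0"},
]
csv_text = write_training_csv(criteria, ["skill", "weight", "fit"])
```

The resulting text can be saved to a file and uploaded to your S3 bucket as the training data source.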

4) Final step

Create a new tab in your spreadsheet where you will import your results. Check your settings to make sure importing a JSON file from AWS is allowed. This repository on GitHub shows you how to add JSON import functionality.

This method is useful for testing your sourcing hypothesis in terms of the available talent pool and market size. The system will automatically source for you, and the whole process is scalable and easy to use. The only drawbacks are the limitation of matching candidates by location and the fact that it works best for IT roles.

Sourcing with Amazon Web Services

The whole idea is very simple. You need to have extracted data about your candidates (names, LinkedIn profiles, titles, etc.) in your Google Spreadsheet. For this data mining process, you can use automated scripts that extract data from Google or Bing in XML format and import it into your Google spreadsheet. The original spreadsheet can look like this one.

Afterward, you need to create some automated scripts in your Google spreadsheet that manipulate the data whenever you change the source of data (Google or Bing) and the social network (LinkedIn, AngelList, GitHub, etc.). The whole sourcing process is very quick and smooth: you get an instant overview of the talent pool in various countries along with the matching score, and the script brings you data about the candidates on LinkedIn (name, LinkedIn URL, etc.) from both Google and Bing. This way you can distinguish which data is indexed by Google and which by Bing, and you have a more comprehensive overview of the market.

Create a machine learning model in your AWS account that will match your candidates against the given criteria. AWS allows you to match candidates by location and skills.

For instance, we want to know the percentage fit rate between a particular candidate and the role. Let’s say we have 3 skills (JavaScript, React, and Angular) that are crucial for our role: JavaScript is a must-have skill (100%), React is important (50%), and Angular is nice to have (25%). We assign a percentage rate to each skill based on its importance. When it comes to location, download the list of cities (counties).
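The weighting above can be sanity-checked with a simple worked example; this is one plausible way to combine the weights (a candidate's fit is the share of total skill weight they cover), not the exact formula the AWS model learns:

```python
def skill_fit(candidate_skills, weights):
    """Percentage fit = sum of weights for skills the candidate has,
    divided by the total weight of all required skills."""
    total = sum(weights.values())
    matched = sum(w for skill, w in weights.items()
                  if skill in candidate_skills)
    return round(100 * matched / total, 1)


# The weights from the example: JavaScript must-have, React important,
# Angular nice to have.
weights = {"javascript": 100, "react": 50, "angular": 25}
```

With these weights, a candidate who knows JavaScript and React but not Angular covers 150 of 175 points, roughly an 85.7% fit.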

 

Ready to start sourcing IT candidates?

AmazingHiring is an AI-based solution for tech recruiters to source passive IT candidates across the web. It aggregates IT profiles from 50+ networks like GitHub, Stack Overflow, Facebook, Kaggle, etc. using Boolean operators, and provides recruiters with candidates’ professional background, contacts, and skills.

AmazingHiring empowers recruiters to double their pipeline and improve their sourcing metrics. 

Other materials:

If you want to hire a truly professional designer, it makes no sense to post a vacancy and wait for applications. This is undoubtedly the case when a portfolio outweighs any CV. Therefore, start searching proactively on such websites as Dribbble and Behance.

Recently, we spoke about how to recruit a strong specialist after you’ve searched for them everywhere and with no success. If sourcing in social media and professional networking services has come to naught, it’s high time to install Slack – a popular team messaging application.

We’ve already told you how to find awesome developers on GitHub, and in this handbook, we’ll give you tips on sourcing the best of the best on StackOverflow and Kaggle.