This guide by Jiri Herodek explains how to use Amazon Web Services (AWS) and its machine learning module for sourcing, and what its pros and cons are. Before we dive into the process, let us explain a few concepts you need to be familiar with before building it into your sourcing strategy.
Amazon Web Services (AWS) is an on-demand cloud platform that provides services to both individuals and companies. AWS comprises various modules such as Application Integration, Analytics, AR & VR, Cost Management, Blockchain, Machine Learning, Robotics, Mobile, etc. You can check more information here. If you want to know more, I suggest going through the free online training platform to become familiar with the technology and basic concepts. We use only Machine Learning and S3 storage in our sourcing process. You will find both in the administration console of your AWS account.
The advantage of AWS is that you pay as you go. I usually pay 0.90 USD per month; the cost depends on the number of predictions (activities) that you make. Predictions can be real-time (synchronous) or batch (asynchronous). I always use real-time predictions because you see the output immediately, they are better suited to a smaller data sample, and, in my experience, they are also cheaper.
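To make the idea of a real-time prediction concrete, here is a minimal Python sketch of a single request. It assumes the model was built with the Amazon Machine Learning service and is called through a client that behaves like boto3's `machinelearning` client. The model ID, the endpoint URL and the record field names are hypothetical placeholders, not values from this guide.

```python
from typing import Any, Dict


def predict_fit(client: Any, model_id: str, endpoint: str,
                candidate: Dict[str, str]) -> Dict[str, Any]:
    """Request one real-time prediction and return the label and its score.

    `client` is assumed to behave like boto3.client("machinelearning").
    """
    response = client.predict(
        MLModelId=model_id,        # hypothetical ID of your ML model
        Record=candidate,          # one candidate; all values are strings
        PredictEndpoint=endpoint,  # real-time endpoint URL of the model
    )
    prediction = response["Prediction"]
    label = prediction.get("predictedLabel")
    scores = prediction.get("predictedScores", {})
    # For a binary model the score is keyed by the predicted label.
    return {"label": label, "score": scores.get(label)}
```

With real credentials you would pass `boto3.client("machinelearning")` as `client`. Each call returns immediately with one result, which is why this mode suits small samples.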
The most important thing to remember is that machine learning models look for patterns in the data you upload to the system. You can use such a model to match candidates on skills, location, or photo.
The output is a JSON file that you import into your Google spreadsheet. The result is information about the candidates and their percentage fit rate.
What you will need for the whole process:
- AWS Account – free to create; you pay only for the predictions you make, which in this process are real-time predictions.
- Google Account – you will use Google Spreadsheets for running your searches and for importing the results.
- XPath Helper – a Chrome extension that you can download here. It extracts, edits, and evaluates XPath queries on any webpage.
- Blockspring – an application that you can use in your spreadsheet for importing data from various sources; think of it as an API connector. There is also a Chrome extension that you can use for building your own apps.
The process consists of the following parts, which mirror the stages of the machine learning (ML) model setup in the AWS console:
- Data Preparation (Google Spreadsheet with data about the candidates).
- Training Data Source (uploading a CSV file with desired matching criteria to your S3 storage).
- Create a Machine Learning (ML) Model – you have to choose the right type of predictions, file type, etc.
- Review the ML Model – check the predictive performance bar, which should be around 90-100%.
- Set a Score Threshold – you usually don't change anything here because it could break the whole model. The threshold determines how many false positives and false negatives might occur.
- Use the ML Model to generate Predictions.
- Clean up the data and import the JSON into the spreadsheet – check that your Google spreadsheet allows you to import JSON from any link and to run code in the spreadsheet. You can download some open-source code from here.
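To see why the score threshold controls the balance between the two kinds of errors, consider the small Python sketch below. The scores and labels are made up for illustration, and the rule "a score at or above the threshold counts as a match" is an assumption of this sketch.

```python
from typing import List, Tuple


def count_errors(scored: List[Tuple[float, bool]],
                 threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) at a given threshold.

    Each item is (model score, whether the candidate is truly a fit).
    """
    false_pos = sum(1 for score, is_fit in scored
                    if score >= threshold and not is_fit)
    false_neg = sum(1 for score, is_fit in scored
                    if score < threshold and is_fit)
    return false_pos, false_neg


# Made-up scores for illustration only.
sample = [(0.95, True), (0.80, True), (0.60, False), (0.40, True), (0.10, False)]

for t in (0.3, 0.5, 0.7):
    print(t, count_errors(sample, t))
```

Raising the threshold removes the false positive in this sample but lets a true match slip through, so moving it in either direction trades one error type for the other.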
Now let’s dive into the process and see what the key stages look like:
1) Boolean search
The data preparation phase is the most important one, since you need to have all the data in one tab and the candidates you want to compare by skill, location, or photo in the second tab.
Start by creating a boolean generator and use it as a solid foundation. Besides the boolean generator, you also have to add other factors to the spreadsheet, such as the search engine you want to use and the website from which you want to bring your data (Linkedin, Angellist, etc.). You also have to add the website factor for the Google or Bing crawler, combine it with the data obtained from x-ray searches of Linkedin, Angellist, etc., and import the results into your Google spreadsheet via formulas like IMPORT etc. You can find how to do it here.
=CONCATENATE(IF(B14="BING";"https://www.bing.com/search?q=";"https://www.google.com/search?num=50&safe=off&q=");IF(B15="LINKEDIN";"site%3Alinkedin.com%2Fin+";"site%3Aangel.co+"))
For searches 1-16, we simply combine the proper website factor (for example, Google + Linkedin) with the proper skills, city, and country using the CONCATENATE function, so search 1 looks like this:
=CONCATENATE(B17;B6;" AND ";B7; " AND "; B8; " AND "; B10)
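If you want to check the generated URLs outside the spreadsheet, the same assembly can be sketched in Python. The function and dictionary names are hypothetical; the URL prefixes and site filters are taken from the formulas above, and `urllib.parse.quote_plus` does the percent-encoding that the formula hard-codes.

```python
from urllib.parse import quote_plus

# Search engine prefixes, as used in the spreadsheet formula.
ENGINES = {
    "BING": "https://www.bing.com/search?q=",
    "GOOGLE": "https://www.google.com/search?num=50&safe=off&q=",
}
# X-ray site filters for the supported sources.
SITES = {
    "LINKEDIN": "site:linkedin.com/in",
    "ANGELLIST": "site:angel.co",
}


def build_search_url(engine: str, site: str, terms: list) -> str:
    """Combine engine prefix, site filter and AND-joined terms into one URL."""
    query = SITES[site] + " " + " AND ".join(terms)
    return ENGINES[engine] + quote_plus(query)


print(build_search_url("BING", "LINKEDIN", ["java", "Prague"]))
# https://www.bing.com/search?q=site%3Alinkedin.com%2Fin+java+AND+Prague
```

Unlike the spreadsheet formula, this sketch also encodes the spaces inside the skill and location terms, which keeps the URL valid when a term contains several words.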
The second tab contains only a reference to the previous tab with our boolean generator and our searches 1-16. It uses a simple condition that looks like this:
=IFS(A3="Search 1";'Boolean search'!B20;A3="Search 2";'Boolean search'!B21;A3="Search 3";'Boolean search'!B22;A3="Search 4";'Boolean search'!B23;A3="Search 5";'Boolean search'!B24;A3="Search 6";'Boolean search'!B25;A3="Search 7";'Boolean search'!B26;A3="Search 8";'Boolean search'!B27;A3="Search 9";'Boolean search'!B28;A3="Search 10";'Boolean search'!B29;A3="Search 11";'Boolean search'!B30;A3="Search 12";'Boolean search'!B31;A3="Search 13";'Boolean search'!B32;A3="Search 14";'Boolean search'!B33;A3="Search 15";'Boolean search'!B34;A3="Search 16";'Boolean search'!B35)
Whenever you change your search, the crawler will bring in candidates from the given search engine and the desired social network (or source in general), and the candidates will pop up.
3) Matching criteria
Finally, you need a third tab with the matching criteria. Use skills or location. You can assign the value 100% or 50%. Upload this data to AWS as a CSV file and create your machine learning model.
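As an illustration of what such a criteria file might contain, the Python sketch below writes a tiny CSV. The column names `skill` and `weight` and the example rows are assumptions made for this sketch; the guide does not prescribe a schema, so adapt them to the columns your own model is trained on.

```python
import csv
import io

# Hypothetical criteria: skill keyword -> fit weight (1.0 = 100%, 0.5 = 50%).
criteria = [
    {"skill": "java", "weight": 1.0},
    {"skill": "kotlin", "weight": 0.5},
]


def criteria_to_csv(rows: list) -> str:
    """Serialise the criteria rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["skill", "weight"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(criteria_to_csv(criteria))
```

Saving the returned text to a `.csv` file gives you something you can upload to your S3 storage as the training data source.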
4) Final step
Create a new tab in your spreadsheet where you will import your results. Check in your settings that importing JSON files from AWS is allowed. This repository on Github shows you how to add JSON import functionality.
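Before importing, it can help to flatten the JSON into plain rows so that each candidate ends up on one spreadsheet line. The Python sketch below assumes a hypothetical JSON layout (a list of objects with `name`, `profile_url` and `score` keys); your actual output may be structured differently.

```python
import json


def predictions_to_rows(raw_json: str) -> list:
    """Turn a JSON list of scored candidates into spreadsheet-style rows."""
    records = json.loads(raw_json)
    rows = [["name", "profile_url", "fit_percent"]]  # header row
    for rec in records:
        # Convert the 0-1 score into a percentage fit rate.
        rows.append([rec["name"], rec["profile_url"],
                     round(rec["score"] * 100, 1)])
    return rows


# Made-up example input.
example = '[{"name": "Example One", "profile_url": "https://www.linkedin.com/in/example-one", "score": 0.5}]'
print(predictions_to_rows(example))
```

Each inner list maps to one row of the results tab, with the score already expressed as the percentage fit rate.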
This method is useful for testing your sourcing hypothesis in terms of the available talent pool and market size. The system will source for you automatically, and the whole process is scalable and easy to use. The only drawback is the limitation in matching candidates by location. It also works best for IT roles.
Sourcing with Amazon Web Services
The whole idea is very simple. You need to have extracted data about your candidates (names, Linkedin profiles, titles, etc.) in your Google spreadsheet. For this data mining process, you can use automated scripts that extract data from Google or Bing in XML format and import it into your Google spreadsheet. The original spreadsheet can look like this one.
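The extraction step boils down to running an XPath query against the markup and keeping the name and link of each hit. Here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, which supports a subset of XPath. The sample markup, its class names and the profile URLs are invented for illustration; real search result pages use different and frequently changing markup.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample standing in for a search results page.
SAMPLE = """
<results>
  <li class="result"><a href="https://www.linkedin.com/in/example-one">Example One - Java Developer</a></li>
  <li class="result"><a href="https://www.linkedin.com/in/example-two">Example Two - Data Engineer</a></li>
  <li class="ad"><a href="https://example.invalid/ad">Sponsored</a></li>
</results>
"""


def extract_candidates(xml_text: str) -> list:
    """Return (title, url) pairs for every link inside a result item."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    # Limited XPath: any <li class="result"> at any depth, then its <a> child.
    for link in root.findall(".//li[@class='result']/a"):
        rows.append((link.text, link.get("href")))
    return rows


for title, url in extract_candidates(SAMPLE):
    print(title, url)
```

The predicate `[@class='result']` is what filters out the sponsored entry, which is the same kind of filtering you would test interactively with XPath Helper.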
Afterward, you need to create some automated scripts in your Google spreadsheet that update the data whenever you change the source of data (Google or Bing) or the social network (Linkedin, Angellist, Github, etc.). The whole sourcing process is quick and smooth: you get an instant overview of the talent pool in various countries and of the matching score, and the script brings you data about the candidates on Linkedin (name, Linkedin URL, etc.) from, for example, Google and Bing. This lets you see which data is indexed by Google and which by Bing, giving you a more comprehensive overview of the market.
Create a machine learning model in your AWS account that will match your candidates against the given criteria. AWS allows you to match candidates by location and skills.