Scrape WeWorkRemotely (Jobs scrapper project in Python, 2020. 12. 21.)
scrapperWWR.py
import requests
from bs4 import BeautifulSoup


def extract_job(html):
    # Extract title, company, location, and link from one job <li>
    job_info_link = html.find_all("a")
    if len(job_info_link) > 1:
        job_info_link = job_info_link[1]
    else:
        job_info_link = job_info_link[0]
    # href already starts with "/", so don't add another slash
    link = f"https://weworkremotely.com{job_info_link['href']}"
    job_info = job_info_link.find_all("span")
    company = job_info[0].get_text()
    title = job_info[1].get_text()
    location = job_info[5].get_text()
    return {"title": title, "link": link, "company": company, "location": location}


def extract_jobs(url):
    jobs = []
    result = requests.get(url)
    soup = BeautifulSoup(result.text, "html.parser")
    results = soup.find("section", {"class": "jobs"}).find("ul").find_all("li", {"class": "feature"})
    for result in results:
        job = extract_job(result)
        print(job)
        jobs.append(job)
    return jobs


def get_WWRJobs(word):
    url = f"https://weworkremotely.com/remote-jobs/search?term={word}"
    jobs = extract_jobs(url)
    return jobs
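To sanity-check the parsing logic without hitting the network, the same extraction steps can be run against a static HTML snippet. This is only a sketch: the markup below is a simplified assumption of WeWorkRemotely's listing structure, not a verbatim copy of the real page.

```python
from bs4 import BeautifulSoup

# Simplified, assumed stand-in for one WWR job listing <li>
SAMPLE_LI = """
<li class="feature">
  <a href="/company/acme"><span>logo</span></a>
  <a href="/remote-jobs/acme-backend-engineer">
    <span class="company">Acme</span>
    <span class="title">Backend Engineer</span>
    <span></span><span></span><span></span>
    <span class="region company">Anywhere in the World</span>
  </a>
</li>
"""

def extract_job(html):
    # Same steps as scrapperWWR.py: skip the logo anchor, then read the spans
    anchors = html.find_all("a")
    job_info_link = anchors[1] if len(anchors) > 1 else anchors[0]
    link = f"https://weworkremotely.com{job_info_link['href']}"
    spans = job_info_link.find_all("span")
    return {
        "title": spans[1].get_text(),
        "link": link,
        "company": spans[0].get_text(),
        "location": spans[5].get_text(),
    }

job = extract_job(BeautifulSoup(SAMPLE_LI, "html.parser"))
print(job)
```

Running this confirms the second anchor is the job link and that the company, title, and region spans land at indices 0, 1, and 5.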
The mechanism is exactly the same as for the Stack Overflow (SO) scraper.
scrapperJobs.py
from scrapperSO import get_SOJobs
from scrapperWWR import get_WWRJobs

SO = get_SOJobs("python")
WWR = get_WWRJobs("python")
jobs = SO + WWR
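Once the two lists are merged, a natural next step is writing them to disk. A minimal sketch with the standard csv module, assuming each job dict has the four keys produced above (save_to_file is a hypothetical helper, not part of the posted code):

```python
import csv

def save_to_file(jobs, filename="jobs.csv"):
    # Hypothetical helper: dump the merged job dicts into a CSV file
    with open(filename, mode="w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "company", "location", "link"])
        for job in jobs:
            writer.writerow([job["title"], job["company"], job["location"], job["link"]])

# Example with a hand-made job dict instead of live scraper output
jobs = [{"title": "Backend Engineer", "company": "Acme",
         "location": "Anywhere in the World",
         "link": "https://weworkremotely.com/remote-jobs/acme-backend-engineer"}]
save_to_file(jobs)
```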
scrapperJobs will gather all the jobs from the job sites. Besides SO, it will also fetch jobs from remoteok.io and weworkremotely.
References
Source code
github.com/zpskek/web_scraper-v2/commit/d6a447b6edc4fa8e484cee11ab9096a5cf5239b3