Project using python
-
home.html with Flask framework (Project using python/Jobs scrapper, 2020. 12. 21. 13:16)
Flask

Introduction
Flask is a Python framework for building a web server quickly and easily.

Installation
pipenv install Flask

Run flask
main.py
from flask import Flask, render_template, redirect

app = Flask("Job Scrapper", template_folder="./src/templates")

@app.route('/')
def index():
    try:
        return render_template("home.html")
    except IOError:
        return redirect("/")

app.run(host="127.0.0.1")

Through app = Flask(), the Flask ..
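The preview above is cut off mid-sentence. As extra context for the Flask() call, here is a minimal hedged sketch of how the same app could pass data into a template; the /report route, the word parameter, and report.html are illustrative assumptions, not part of the original post.

# sketch: passing variables to a template (route name, parameter, and template are assumptions)
from flask import Flask, render_template, request

app = Flask("Job Scrapper", template_folder="./src/templates")

@app.route('/report')
def report():
    # read the search term from the query string, e.g. /report?word=python
    word = request.args.get('word', '')
    jobs = []  # in the real project this list would come from the scrapers
    # renders src/templates/report.html with the two variables below
    return render_template("report.html", searching_by=word, jobs=jobs)

if __name__ == "__main__":
    app.run(host="127.0.0.1")

render_template forwards keyword arguments into the Jinja2 context, so report.html could print {{ searching_by }} and loop over jobs.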
-
Scrap remote.com (Project using python/Jobs scrapper, 2020. 12. 21. 11:53)
scrapperRemote.py
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
}

def extract_job(html):
    # title, company, location, link
    tds = html.find_all("td")
    link = tds[0].find("a")["href"]
    if not link:
        link = ""
    title = tds[1].find("h2").string
    if not title: ..
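The function above is truncated at the title check. Below is a hedged guess at how it might finish (returning a dict, with company and location taken from the remaining td cells), exercised on a tiny made-up table row so it runs without hitting the site. The column order, the field names after the cut, and the sample markup are assumptions.

# sketch: a possible completion of extract_job, tried on a fake table row
from bs4 import BeautifulSoup

def extract_job(html):
    # title, company, location, link (column order is assumed)
    tds = html.find_all("td")
    anchor = tds[0].find("a")
    link = anchor["href"] if anchor else ""
    heading = tds[1].find("h2")
    title = heading.string if heading else ""
    company = tds[2].get_text(strip=True) if len(tds) > 2 else ""
    location = tds[3].get_text(strip=True) if len(tds) > 3 else ""
    return {"title": title, "company": company, "location": location, "link": link}

# minimal demo row (made up) so the sketch can be run directly
sample = """
<tr>
  <td><a href="/job/123">view</a></td>
  <td><h2>Backend Developer</h2></td>
  <td>ACME Remote</td>
  <td>Worldwide</td>
</tr>
"""
row = BeautifulSoup(sample, "html.parser").find("tr")
print(extract_job(row))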
-
Scrap WeWorkRemotely (Project using python/Jobs scrapper, 2020. 12. 21. 11:51)
scrapperWWR.py
import requests
from bs4 import BeautifulSoup

def extract_job(html):
    # title, company, location, link
    job_info_link = html.find_all('a')
    if len(job_info_link) > 1:
        job_info_link = job_info_link[1]
    else:
        job_info_link = job_info_link[0]
    link = f"https://weworkremotely.com/{job_info_link['href']}"
    job_info = job_info_link.find_all("span")
    company = job_info[0].get_text()
    title = job..
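The preview is cut off right after the company span. A hedged guess at the rest of extract_job follows, assuming the remaining spans hold the title and the region; that span order is an assumption, not taken from the post.

# sketch: possible continuation of the WeWorkRemotely extract_job
def extract_job(html):
    # html is one job listing tag parsed by BeautifulSoup
    # title, company, location, link
    job_info_link = html.find_all('a')
    job_info_link = job_info_link[1] if len(job_info_link) > 1 else job_info_link[0]
    link = f"https://weworkremotely.com/{job_info_link['href']}"
    job_info = job_info_link.find_all("span")
    company = job_info[0].get_text()
    # everything below the original cut-off is guessed
    title = job_info[1].get_text() if len(job_info) > 1 else ""
    location = job_info[2].get_text() if len(job_info) > 2 else ""
    return {"title": title, "company": company, "location": location, "link": link}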
-
Extract jobs from Stack Overflow (Project using python/Jobs scrapper, 2020. 12. 21. 11:49)
scrapperSO.py
import requests
from bs4 import BeautifulSoup

def extract_job(html):
    # title, company, location, link
    title_link = html.find('h2').find('a', {'class': 's-link'})
    title = title_link['title']
    link = f"https://stackoverflow.com/jobs/{title_link['href']}"
    company, location = html.find('h3', {"class": "fs-body1"}).find_all('span', recursive=False)
    return {
        "title": title,
        "link": link,
        "..
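The return statement is truncated after "link". Since company and location are still bs4 span tags at that point, a plausible completion (a guess, not the original code) converts them to text:

# sketch: possible completion of the Stack Overflow extract_job return value
def extract_job(html):
    # title, company, location, link
    title_link = html.find('h2').find('a', {'class': 's-link'})
    title = title_link['title']
    link = f"https://stackoverflow.com/jobs/{title_link['href']}"
    company, location = html.find('h3', {"class": "fs-body1"}).find_all('span', recursive=False)
    return {
        "title": title,
        "link": link,
        # the two keys below are guessed; company/location are span tags, so take their text
        "company": company.get_text(strip=True),
        "location": location.get_text(strip=True),
    }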
-
get_last_page of Stack Overflow (Project using python/Jobs scrapper, 2020. 12. 21. 10:51)
scrapperSO.py
import requests
from bs4 import BeautifulSoup

def get_last_page(url):
    result = requests.get(url)
    soup = BeautifulSoup(result.text, "html.parser")
    pages = soup.find("div", {"class": "s-pagination"}).find_all('a')
    last_page = pages[-2].get_text(strip=True)
    return int(last_page)

def get_SOJobs(word):
    url = f"https://stackoverflow.com/jobs?q={word}"
    last_page = get_last_page(url)
    print..
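get_SOJobs stops at the print call. A hedged sketch of how the paging loop might continue is below; it assumes get_last_page above and extract_job from the previous entry are in scope, and the pg query parameter and the "-job" class are guesses about the old Stack Overflow Jobs markup, not verified.

# sketch: a possible paging loop for get_SOJobs (markup details are assumptions)
import requests
from bs4 import BeautifulSoup

def get_SOJobs(word):
    url = f"https://stackoverflow.com/jobs?q={word}"
    last_page = get_last_page(url)  # defined in the preview above
    jobs = []
    for page in range(last_page):
        print(f"Scrapping SO page {page + 1}")
        result = requests.get(f"{url}&pg={page + 1}")
        soup = BeautifulSoup(result.text, "html.parser")
        # assume each listing sits in a div with class "-job"; extract_job parses one listing
        for listing in soup.find_all("div", {"class": "-job"}):
            jobs.append(extract_job(listing))
    return jobs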
-
Python BeautifulSoup (Project using python/Jobs scrapper, 2020. 12. 21. 10:27)
Definition
Beautiful Soup is a Python library for extracting data from HTML and XML files.

Installation
pipenv install beautifulsoup4

Usage
import requests
from bs4 import BeautifulSoup

result = requests.get(url)
soup = BeautifulSoup(result.text, "html.parser")

Method
find
pages_container = soup.find("div", {"class": "s-pagination"})
From the HTML parsed with BeautifulSoup, this extracts the element whose tag is "div" and whose class is "s-pagination". The matching ..
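Since the explanation of find is cut off, here is a small self-contained example of find, find_all, and get_text on an inline HTML snippet; the snippet itself is made up for illustration.

# sketch: find vs find_all on a small, made-up HTML string
from bs4 import BeautifulSoup

html = """
<div class="s-pagination">
  <a href="?pg=1">1</a>
  <a href="?pg=2">2</a>
  <a href="?pg=3">3</a>
  <a href="?pg=4">next</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns the first matching tag (or None if nothing matches)
pages_container = soup.find("div", {"class": "s-pagination"})

# find_all returns a list of every matching tag inside that container
links = pages_container.find_all("a")

# get_text(strip=True) pulls out the text content without surrounding whitespace
print([link.get_text(strip=True) for link in links])  # ['1', '2', '3', 'next']
print(int(links[-2].get_text(strip=True)))            # 3, the last numbered page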
-
Python requests (Project using python/Jobs scrapper, 2020. 12. 21. 10:00)
Definition
An HTTP library for Python.

Installation
pipenv install requests

Methods
requests.get
import requests

url = 'https://api.github.com/some/endpoint'
r = requests.get(url)

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)

headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)

requests.get() fetches the page at the url it is given ..
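The description of requests.get is truncated; a short self-contained example of reading the response follows (httpbin.org echoes the request back, which makes it convenient for testing).

# sketch: inspecting a requests.get response
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
headers = {'user-agent': 'my-app/0.0.1'}

r = requests.get('https://httpbin.org/get', params=payload, headers=headers)

print(r.status_code)     # 200 on success
print(r.url)             # https://httpbin.org/get?key1=value1&key2=value2
print(r.text[:80])       # raw body as a string (what BeautifulSoup parses later)
print(r.json()['args'])  # httpbin echoes the query parameters back as JSON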