Django | Scrapy (Crawling Framework)



개발자티포 2022. 7. 6. 15:33


Python has many web crawling libraries. Of these, I chose Scrapy.

Full details are in the official documentation (version 2.6 at the time of writing):

https://docs.scrapy.org/en/latest/#


1. First, install Scrapy.

$ pip install scrapy

2. Create a Scrapy project.

$ scrapy startproject scrapy_project

3. Create a Python file inside the spiders folder.

scrapy_project/scrapy_project/spiders/example.py

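import scrapy

class ExampleSpider(scrapy.Spider):
    # Unique spider name; also used by "scrapy crawl example"
    name = 'example'
    # Scrapy requests these URLs first and passes each response to parse()
    start_urls = [
        'http://www.example.com/'
    ]

    def parse(self, response):
        url = response.url
        # First text node directly inside an <h1>, or None if nothing matches
        title = response.css('h1::text').get()
        print(f'URL is: {url}')
        print(f'Title is: {title}')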
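To make the selector in parse() concrete: response.css('h1::text').get() returns the first text node that sits directly inside an <h1>, or None if there is none. Scrapy does this with its own selector library; the sketch below only imitates that behavior with Python's built-in html.parser, so the page can be checked offline. first_h1_text and FirstH1Text are hypothetical helpers for illustration, not part of Scrapy.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FirstH1Text(HTMLParser):
    """Rough stdlib imitation of response.css('h1::text').get() (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []       # currently open tags, innermost last
        self.result: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Keep only the first text whose innermost enclosing tag is <h1>,
        # i.e. a direct text child, which is what '::text' selects.
        if self.result is None and self.stack and self.stack[-1] == "h1":
            self.result = data


def first_h1_text(html: str) -> Optional[str]:
    parser = FirstH1Text()
    parser.feed(html)
    parser.close()
    return parser.result


if __name__ == "__main__":
    print(first_h1_text("<html><body><h1>Example Domain</h1></body></html>"))
    print(first_h1_text("<p>no heading</p>"))
    print(first_h1_text("<h1>Hello <b>World</b></h1>"))
```

The last case shows why the spider can print a partial title: text inside a nested tag such as <b> is not a direct child of <h1>, so h1::text skips it. When no <h1> exists at all, the spider prints "Title is: None".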

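4. Change into the spiders folder and run the command below. (runspider runs a single spider file directly; from the project root, scrapy crawl example runs the same spider by its name.)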

$ scrapy runspider example.py

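5. Check the result: the page URL and its <h1> text are printed to the console, mixed in with Scrapy's log output.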

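6. Optionally, add the following to settings.py so the crawler is gentler on the target site and repeated runs during development are faster: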

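CONCURRENT_REQUESTS = 1      # at most one request in flight at a time
DOWNLOAD_DELAY = 0.25        # wait about 250 ms between requests to the same site
HTTPCACHE_ENABLED = True     # cache responses so repeated runs don't re-download pages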
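The effect of these settings is easy to quantify. According to the Scrapy settings documentation, RANDOMIZE_DOWNLOAD_DELAY is enabled by default, so the actual wait between requests to the same site is drawn between 0.5 and 1.5 times DOWNLOAD_DELAY. The sketch below works out those numbers for the 0.25 s used here; delay_range and max_requests_per_minute are hypothetical helpers for illustration, not Scrapy APIs.

```python
from typing import Tuple

DOWNLOAD_DELAY = 0.25             # seconds, the value from settings.py above
RANDOMIZE_DOWNLOAD_DELAY = True   # Scrapy's default (assumed unchanged)


def delay_range(delay: float, randomize: bool) -> Tuple[float, float]:
    """Bounds of the wait between consecutive requests to one site (hypothetical helper)."""
    if randomize:
        return (0.5 * delay, 1.5 * delay)
    return (delay, delay)


def max_requests_per_minute(delay: float) -> float:
    """Rough ceiling on the request rate to one site (hypothetical helper).
    The randomized wait averages out to `delay`, so the average spacing is `delay`;
    slow responses can only lower the real rate."""
    return 60.0 / delay


lo, hi = delay_range(DOWNLOAD_DELAY, RANDOMIZE_DOWNLOAD_DELAY)
print(f"wait between requests: {lo:.3f}s to {hi:.3f}s")
print(f"at most ~{max_requests_per_minute(DOWNLOAD_DELAY):.0f} requests/minute per site")
```

With CONCURRENT_REQUESTS = 1 the next request also has to wait for the previous response, so a slow site will be crawled well below this ceiling.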
