65 posts in 'Language/python'

2019.06.17 [Utility] URL-decoding URL-encoded filenames
2018.05.01 [python] BeautifulSoup4 vs Scrapy
2018.04.26 [python] map object
2018.04.24 [python] for loops with yield and return statements
2018.04.12 [python] elasticsearch-dsl scan / index / doc_type
2018.04.12 [python] Introduction to logging handler classes
2018.04.11 [python] logging process/order/cycle/how-to-work
2018.04.11 [python] got multiple values for keyword argument 'query' error
2018.04.11 [python] elasticsearch-dsl DocType.save()
2018.04.11 [python] u'text' - meaning of the "u" prefix
2018.04.07 [python] python vs ipython, import statement
2018.04.07 [elasticsearch] elasticsearch-dsl get mapping from DocType
2018.04.06 [python] OSError: raw write error (during scrapy crawling)
2018.04.06 [python] SQLAlchemy
2018.04.06 [python, Elasticsearch] python elasticsearch-dsl library code analysis
2018.04.06 [python] pycharm interpreter console error
2018.04.06 [python] pyreadline package
2018.04.03 [python] ImportError: No module named 'win32api'
2018.04.03 [python] PowerShell virtualenv activate error: how to fix
2018.04.03 [python] pip install whl file
2018.04.03 [python] SQLAlchemy
2018.03.30 [python] yield, generator, coroutine
2018.03.30 [python] elasticsearch-py max_tries
2018.03.29 [python] Useful scrapy commands
2018.03.27 [python] scrapy concurrent_requests
2018.03.26 [python] win32api ImportError
2018.03.23 [python] windows pip Twisted install error
2018.03.23 [python] scrapy with form-data
2018.03.16 python scrapy Korean (Hangul) encoding
2018.01.15 [python] multiprocessing

[Utility] URL-decoding URL-encoded filenames

Language/python 2019. 6. 17. 23:09

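When downloading files from the internet, a file is sometimes saved with its name still URL-encoded (percent-encoded).

The file itself works fine, but the name is hard to read.

It gets worse when there are many such files.

So I wrote a script that URL-decodes the names of all files in a given directory.

It requires Python 3 to run.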

"""
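Filename : urldecode.py
Python version : 3
Usage example : python urldecode.py C:\\my_dir
"""
import sys
import argparse
from pathlib import Path
from urllib.parse import unquote

parser = argparse.ArgumentParser(description='Decode the URL-encoded filenames in a directory')
parser.add_argument('dirpath', type=str, help='path of the directory containing the URL-encoded files')
args = parser.parse_args()

dirpath = Path(args.dirpath)

# list the candidate files and ask for confirmation before touching anything
filenames = [x for x in dirpath.iterdir() if x.is_file()]
for x in filenames:
    print(x)
yn = input('rename all? (y/N) : ').lower()
if yn != 'y':
    sys.exit(0)

for old in filenames:
    name = unquote(old.name)
    if name == old.name:
        continue  # nothing to decode
    if Path(name).name != name:
        print('Skipped {old}: decoded name contains a path separator'.format(old=old))
        continue

    # with_name() keeps the file in its own directory; a bare Path(name)
    # would be relative to the current working directory
    new = old.with_name(name)
    if new.exists():
        print('Skipped {old}: {new} already exists'.format(old=old, new=new))
        continue

    old.rename(new)
    print('Renamed {old} -> {new}'.format(old=old, new=new))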
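
To see what the decoding step does to a single name, here is a minimal sketch using only urllib.parse.unquote. The filename is a made-up example of how a browser might save a Korean-named file; it is not taken from the script above.

```python
from urllib.parse import unquote

# Hypothetical filename as saved by a browser (percent-encoded UTF-8)
encoded = '%ED%95%9C%EA%B8%80%20report.txt'

# unquote() turns each %XX sequence back into the byte it stands for
# and decodes the result as UTF-8
decoded = unquote(encoded)
print(decoded)

# A name without any %XX sequences comes back unchanged
unchanged = unquote('plain.txt')
print(unchanged)
```

This is why the script can skip a file whenever the decoded name equals the original one.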
 

WRITTEN BY

: hojongs
This blog has moved to https://hojongs.github.io/

[python] BeautifulSoup4 vs Scrapy

Language/python 2018. 5. 1. 20:50

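beautifulsoup4 is a parsing library.

scrapy is a crawling framework.

beautifulsoup4 (bs4) does not include the request part; fetching pages is left to another library.

scrapy does include it, and it sends multiple requests concurrently, so it is fast without any extra implementation.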

[python] map object

Language/python 2018. 4. 26. 15:33

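A map object is created as follows:

map(function, iterable)

Where is it useful? For example, when reading several int values from a single line of input:

a, b, c = map(int, input().split())

# input example: 1 2 3

A map object is itself an iterator (and therefore iterable), so it can also be consumed one item at a time with next().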
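
A minimal sketch of the iterator behaviour described above. A fixed string stands in for input() so the snippet runs without a terminal; the variable names are only for illustration.

```python
line = '1 2 3'  # stands in for input()

numbers = map(int, line.split())

# A map object is lazy: nothing is converted until an item is requested
first = next(numbers)

# The remaining items can still be consumed afterwards
rest = list(numbers)

# Once exhausted, the iterator stays empty
leftover = next(numbers, None)

print(first, rest, leftover)
```

Because the map object is consumed as it is read, it can be iterated only once; wrap it in list() if the values are needed again.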

[python] for loops with yield and return statements

Language/python 2018. 4. 24. 11:37

def func():
    for i in range(10):
        yield i
    return 'exit'
 
for i in func():
    print(i)
 
''' output
0
1
2
...
9
'''
 
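The 'exit' from the final return statement is not printed: a for loop only receives the yielded values, and the return value ends up in the StopIteration exception that ends the loop.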
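
If the return value is needed, it can be read from the StopIteration exception by driving the generator manually. The sketch below uses a shorter range than the example above, purely for brevity.

```python
def func():
    for i in range(3):
        yield i
    return 'exit'


gen = func()
values = []
return_value = None

while True:
    try:
        values.append(next(gen))
    except StopIteration as stop:
        # the generator's return value is carried on the exception
        return_value = stop.value
        break

print(values, return_value)
```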

[python] elasticsearch-dsl scan / index / doc_type

Language/python 2018. 4. 12. 18:26

elasticsearch-dsl has a Search class.

The Search class has a scan() API.

It is a wrapper around elasticsearch-py's elasticsearch.helpers.scan(),

which in turn is a wrapper around the scroll API of Elasticsearch search.

I tried to use this scan() API, but the code below raised an error

(after defining a DocType class named Post):

Post.search().scan()

--------------

This feature is experimental and may be subject to change.

According to the note quoted above, the DocType.search() API used here is an experimental function.

Because of that I ran into unwanted behavior,

so I recommend using the Search class directly instead of DocType.search().

2018.04.12.

[python] Introduction to logging handler classes

Language/python 2018. 4. 12. 15:54

StreamHandler

Just use it as is.

Basic handler usage is shown below.

import logging

log = logging.getLogger('nl-api') # get logger
log.setLevel(logging.DEBUG) # set level to logger

fmt = logging.Formatter(logging.BASIC_FORMAT) # create formatter
ch = logging.StreamHandler() # create handler
ch.setFormatter(fmt) # set formatter to handler
ch.setLevel(logging.DEBUG) # set level to handler
log.addHandler(ch) # add handler to logger

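RotatingFileHandler

A plain FileHandler works too, but the file can grow large if you log a lot or for a long time.

The difference from FileHandler is that once the file reaches a threshold size, this handler rolls over and continues logging to a new file.

The options are filename, maxBytes, and backupCount.

maxBytes is the threshold size of a single log file, and backupCount is the number of rolled-over backup files (<filename>.1, <filename>.2, ...) kept in addition to the current file.

Total disk use is therefore bounded by roughly maxBytes * (backupCount + 1); beyond that, the oldest backup is deleted first. If either value is 0, rollover never happens.

Set maxBytes to the size you want per log file,

and set backupCount generously (as large as you can without causing problems).

It is a more advanced alternative to FileHandler.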
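
The rollover behaviour can be observed with a deliberately tiny maxBytes. This is only a sketch: the logger name, file name, and sizes are assumptions chosen so that rollover happens quickly, and the log is written to a temporary directory.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, 'app.log')  # hypothetical file name

# Tiny limits so that rollover happens after about a dozen messages
handler = RotatingFileHandler(log_path, maxBytes=200, backupCount=2)
handler.setFormatter(logging.Formatter('%(message)s'))

log = logging.getLogger('rotating-demo')  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(handler)

for i in range(100):
    log.debug('message number %d', i)

log.removeHandler(handler)
handler.close()

# Only the current file plus backupCount backups remain on disk
files = sorted(os.listdir(log_dir))
sizes = [os.path.getsize(os.path.join(log_dir, name)) for name in files]
print(files, sizes)
```

Older messages are gone at this point: with backupCount=2, only the current file and the two most recent backups remain.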

[python] logging process/order/cycle/how-to-work

Language/python 2018. 4. 11. 23:27

class logging.Logger

propagate

If this attribute evaluates to true, events logged to this logger will be passed to the handlers of higher level (ancestor) loggers, in addition to any handlers attached to this logger. Messages are passed directly to the ancestor loggers’ handlers - neither the level nor filters of the ancestor loggers in question are considered.

If this evaluates to false, logging messages are not passed to the handlers of ancestor loggers.

The constructor sets this attribute to True.

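The default value of Logger.propagate is True.

If propagate=True, the record is also passed to the handlers of higher-level (ancestor) loggers.

Levels and handlers are processed in the following order:

log = logging.getLogger('test')

log.debug('hello')

1. Processed by log (if log's effective level is higher than DEBUG, processing stops here, i.e. nothing is passed to any handler)

2. Processed by log's handlers (each handler decides according to its own level)

3. Processed by the handlers of log's ancestors (again according to each handler's level; the ancestor loggers themselves, i.e. their levels and filters, are skipped)

Loggers are the entrance and handlers are the exit.

---

If a logger's level is NOTSET, the logger does not decide by itself: it uses the level of its nearest ancestor that has an explicit level. The root logger defaults to WARNING, so by default DEBUG and INFO records are dropped.

If a handler's level is NOTSET, it handles every record it receives.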
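
The order above can be checked with a small experiment. ListHandler is a hypothetical helper written for this sketch (it is not part of the logging module), and the logger names are arbitrary.

```python
import logging

records = []


class ListHandler(logging.Handler):
    """Hypothetical handler that appends (tag, message) to a list."""

    def __init__(self, tag, level=logging.NOTSET):
        super().__init__(level)
        self.tag = tag

    def emit(self, record):
        records.append((self.tag, record.getMessage()))


parent = logging.getLogger('demo')
child = logging.getLogger('demo.child')

parent.setLevel(logging.ERROR)   # not consulted for records propagated from child
child.setLevel(logging.DEBUG)
parent.addHandler(ListHandler('parent-handler'))                     # NOTSET: accepts everything
child.addHandler(ListHandler('child-handler', level=logging.WARNING))

child.debug('hello')      # rejected by child-handler, accepted by parent-handler
child.warning('careful')  # accepted by both handlers, child's first

# A logger left at NOTSET inherits its effective level from an ancestor
grandchild = logging.getLogger('demo.child.grand')
inherited = grandchild.getEffectiveLevel()

print(records, inherited)
```

Note that the DEBUG record reaches parent-handler even though the parent logger itself is set to ERROR, which matches step 3.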

[python] got multiple values for keyword argument 'query' error

Language/python 2018. 4. 11. 22:42

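This error occurs in a case like the following:

def func(**kwargs): pass

func(param='temp1', **{'param': 'temp2'})

It is raised when the same keyword argument (here, param) reaches the function more than once, for example once explicitly and once through an unpacked dict. Writing the keyword twice literally, as in func(param='temp1', param='temp2'), is rejected earlier with "SyntaxError: keyword argument repeated".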
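
A minimal sketch that triggers the error and captures its message. The names func, param, and extra are placeholders; in the original situation the duplicated keyword was 'query'.

```python
def func(**kwargs):
    return kwargs


extra = {'param': 'temp2'}  # e.g. options collected elsewhere in the program
message = None

try:
    # 'param' arrives twice: once explicitly, once via the unpacked dict
    func(param='temp1', **extra)
except TypeError as exc:
    message = str(exc)

print(message)
```

A common fix is to drop the duplicate key from the dict (for example with extra.pop('param', None)) before unpacking it.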

[python] elasticsearch-dsl DocType.save()

Language/python 2018. 4. 11. 22:09

DocType.save()

return value

- True: the document was created

- False: the document was overwritten

if save fails -> raise exception

[python] u'text' - meaning of the "u" prefix

Language/python 2018. 4. 11. 12:51

It is the prefix for a unicode string literal, a Python 2 syntax (in Python 3 every str is already unicode, so the prefix is accepted but unnecessary).

https://stackoverflow.com/a/2464968
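
In Python 3 this is easy to confirm: the prefixed and unprefixed literals produce the same str object type and compare equal. A small sketch:

```python
plain = 'text'
prefixed = u'text'

same = (plain == prefixed)
type_name = type(prefixed).__name__

print(same, type_name)
```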

[python] python vs ipython, import statement

Language/python 2018. 4. 7. 18:50

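The difference is that in ipython, importing a package makes not only its modules but also its subpackages available (this does not work in the vanilla Python interpreter). A likely reason is that IPython has already imported those submodules itself during startup; "import urllib" alone does not load urllib.parse.

import urllib

urllib.parse # error in python, works in ipython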
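
Code that should behave the same in both interpreters can import the submodule explicitly instead of relying on it having been loaded elsewhere. A minimal sketch:

```python
import urllib.parse  # explicit submodule import; also binds the name urllib

# urllib.parse is now guaranteed to be available as an attribute of urllib
quoted = urllib.parse.quote('a b')
print(quoted)
```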

[elasticsearch] elasticsearch-dsl get mapping from DocType

Language/python 2018. 4. 7. 12:08

Suppose you have defined a DocType class (say it is named Post).

Post.init() populates the mapping in Elasticsearch.

To get that mapping as a dict (or a string), use:

Post._doc_type.mapping.to_dict()

---

If init() raises an error (a text-to-keyword conflict),

note that you have to reindex.

[python] OSError: raw write error (during scrapy crawling)

Language/python 2018. 4. 6. 23:21

http://enjoytools.net/xe/board_PZRP31/7961

OSError: raw write() returned invalid length 112 (should have been between 0 and 56)

Adding the lines below is said to fix it.

I am not 100% sure, but after adding them the error no longer occurred.

import win_unicode_console

win_unicode_console.enable()

My Python version is 3.5.4.

[python] SQLAlchemy

Language/python 2018. 4. 6. 19:26

Python has several SQL ORMs.

The ones I have seen most often are Django's ORM and SQLAlchemy (commonly used with Flask).

I am using SQLAlchemy in a simple way this time, so I want to write it down.

https://www.pythoncentral.io/overview-sqlalchemys-expression-language-orm-queries/

declarative_base(),

Column, String, Integer,

create_engine('sqlite:///')

At the most basic level, these are all you need.

'sqlite:///' (with no file path) is an in-memory db.

Two things I was curious about here:

String vs VARCHAR? -> If there is no length limit, use String (https://stackoverflow.com/a/41136521)

sqlite URL?

absolute path: sqlite:///C:\\Users\\...\\data\\db.db

relative path: sqlite:///data/db.db

If the db file does not exist, it is created.

----------

http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#common-filter-operators

1. Create the engine (the connection to the DB) with create_engine()

2. Create a Session class with sessionmaker(), configure it, and create a Session instance

3. session.query(<Table class>).first() -> fetches the first row

[python, Elasticsearch] python elasticsearch-dsl library code analysis

Language/python 2018. 4. 6. 18:36

In a recent project I wrote library-level code in order to use Elasticsearch,

so I wanted to compare my code with the library's code and started reading it.

https://github.com/elastic/elasticsearch-dsl-py/blob/master/elasticsearch_dsl/search.py

I started with the filter method of the Search class in search.py.

The filter method behaves like a wrapper around the query method.

And query is not actually a method but an object called ProxyDescriptor.

The actual query is set by the Q function in query.py (as can be seen in ProxyDescriptor.__set__()).

Q stands for Query.

- to be continued -

[python] pycharm interpreter console error

Language/python 2018. 4. 6. 14:49

https://stackoverflow.com/a/49632135

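The problem occurs because PyCharm does not support ipython 6.3.0.

Downgrading to 6.2.1 fixes it (double quotes keep the < from being treated as a redirect in cmd):

pip install "ipython<6.3"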

[python] pyreadline package

Language/python 2018. 4. 6. 14:20

pyreadline is a Python implementation of GNU readline (quoted from the link below).

http://pythonhosted.org/pyreadline/introduction.html#a-python-implementation-of-gnu-readline

(features of readline and example code)

https://ko.wikipedia.org/wiki/GNU_readline

Before compiling the example code, the readline library must be installed:

sudo apt-get install libreadline-dev

https://stackoverflow.com/questions/23085076/readline-readline-h-file-not-found

[python] ImportError: No module named 'win32api'

Language/python 2018. 4. 3. 16:33

Occurred while running scrapy on Windows.

Installing the package below fixes it.

pip install pywin32

(install pywin32, not win32api)

[python] PowerShell virtualenv activate error: how to fix

Language/python 2018. 4. 3. 16:30

In PowerShell (as opposed to cmd), you have to run activate.ps1 instead of activate.bat.

But it fails with an error.

<Output of running Scripts\activate.ps1>

.\activate.ps1 : File C:\Users\20170218\Desktop\temp\naver_stock\venv\Scripts\activate.ps1 cannot be loaded because running scripts is disabled on this system. For more information, see about_Execution_Policies at https://go.microsoft.com/fwlink/?LinkID=135170.

At line:1 char:1

+ .\activate.ps1

+ ~~~~~~~~~~~~~~

+ CategoryInfo : SecurityError: (:) [], PSSecurityException

+ FullyQualifiedErrorId : UnauthorizedAccess

The solution is as follows.

https://stackoverflow.com/a/18713789

Run PowerShell as administrator, then run the command below.

Set-ExecutionPolicy Unrestricted

The current setting can be checked with the command below.

Get-ExecutionPolicy

Running activate.ps1 again, you can see that the virtualenv activates correctly.

This solves the problem, but keep in mind that it effectively turns off PowerShell's script execution security policy.

----

Since the error is UnauthorizedAccess, I wonder whether authorizing the script would be enough, but I will look into that another time.

[python] pip install whl file

Language/python 2018. 4. 3. 15:52

pip install <whl file path>

[python] SQLAlchemy

Language/python 2018. 4. 3. 15:41

Python ORM(Object Relational Mapper) library

[python] yield, generator, coroutine

Language/python 2018. 3. 30. 18:03

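The content below may contain mistakes where I misunderstood something.

https://mingrammer.com/translation-iterators-vs-generators/

What is yield?

In Python, a function that contains a yield statement returns a generator when called.

A generator is a kind of iterator (quoted from the article linked above).

Such a function is called a generator function, and it can serve as a simple (generator-based) coroutine.

A coroutine is a function that runs only part of its body and is then suspended.

It is suspended at each yield statement.

In other words, the yield statement is what turns a function into a coroutine.

Why make a coroutine?

What are the advantages of a coroutine?

If you make a function a coroutine, the work to be done at the position of the yield statement can be left to code outside the function.

The difference from passing a callback parameter is that the function is suspended.

(In the end, it is precisely because it is suspended that arbitrary work can be done there.)

A coroutine has the advantage that a function can be executed one part at a time.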

def func():
    for i in range(10):
        print('func', i)
        yield i  # suspend here and hand i to the caller
 
def main():
    result = []
 
    for i in func():
        result.append(i)
 
    for i in result:
        print('result', i)
 
    
 
if __name__ == '__main__':
    main()
 
'''
# output
func 0
func 1
func 2
func 3
func 4
func 5
func 6
func 7
func 8
func 9
result 0
result 1
result 2
result 3
result 4
result 5
result 6
result 7
result 8
result 9
'''
 
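A function can also receive a value from outside, with a statement such as event = (yield); the caller supplies it through the generator's send() method.

For now, it is enough just to know this exists.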
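
For reference, a minimal sketch of receiving values through yield. The running-average example and its names (averager, total, count) are illustrative assumptions, not taken from the post.

```python
def averager():
    """Generator-based coroutine that keeps a running average."""
    total = 0
    count = 0
    average = None
    while True:
        # Suspend here; the value passed to send() becomes the result of yield
        value = yield average
        total += value
        count += 1
        average = total / count


gen = averager()
next(gen)               # "prime" the coroutine: run it up to the first yield
first = gen.send(10)    # resumes the function, which yields the new average
second = gen.send(20)
gen.close()

print(first, second)
```

The function keeps its local state (total, count) between calls, which is the practical difference from passing a callback.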

[python] elasticsearch-py max_tries

Language/python 2018. 3. 30. 01:10

Elasticsearch(max_retries=0)

The max_retries parameter (the spelling used by elasticsearch-py, rather than max_tries) is passed through kwargs to the Transport class.

[python] Useful scrapy commands

Language/python 2018. 3. 29. 20:27

scrapy startproject <project-name>

Creates a project.

scrapy crawl <spider-name>

Runs a spider.

scrapy shell <url>

Lets you work with scrapy objects (request/response/crawler/spider) in an interpreter.

You can also test xpath expressions there.

https://doc.scrapy.org/en/latest/topics/shell.html#configuring-the-shell

When using the shell command, ipython is launched if it is installed.

scrapy view <url>

Shows the HTML document as the crawler receives it, i.e. before any JS has run.

[python] scrapy concurrent_requests

Language/python 2018. 3. 27. 11:28

scrapy sends multiple requests concurrently.

The maximum can be adjusted with the CONCURRENT_REQUESTS value in settings.py.

The default is 16.

scrapy is built on Twisted.

Twisted can be described as follows:

An event-driven network programming framework written in Python, which is single-threaded

It is single-threaded but works asynchronously (like node.js), which is why it does not send only one request at a time.
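
As a sketch, the relevant part of a project's settings.py might look like the fragment below. The value 32 is an arbitrary example, and CONCURRENT_REQUESTS_PER_DOMAIN is a related Scrapy setting that the post does not mention; it is shown here only because it also limits concurrency.

```python
# settings.py (fragment)

# Global cap on requests in flight at once; Scrapy's default is 16
CONCURRENT_REQUESTS = 32

# Per-domain cap; 8 is Scrapy's default, so a single site is still
# limited to 8 concurrent requests even with the higher global cap
CONCURRENT_REQUESTS_PER_DOMAIN = 8
```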

[python] win32api ImportError

Language/python 2018. 3. 26. 16:28

https://stackoverflow.com/a/35948588

pip install pypiwin32

[python] windows pip Twisted install error

Language/python 2018. 3. 23. 22:28

<Before we begin>

Basically, to use pip on Windows outside a virtualenv,

cmd has to be run with administrator permission (when Python is installed in a system-wide location).

In my case, installing with pip fails with cl.exe and io.h errors; cl.exe is presumably a PATH problem, but I could not track down io.h.

http://hashcode.co.kr/questions/4649/%EC%9C%88%EB%8F%84%EC%9A%B0-twisted-%EC%84%A4%EC%B9%98-%EC%A4%91-c-%EB%B9%8C%EB%8D%94-%EC%98%A4%EB%A5%98

https://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted

A prebuilt twisted installer (whl) can reportedly be downloaded from the link above.

Go to the twisted entry in the index

and pick the whl file that matches your CPython version and bitness (note that this is the bitness of Python, not of the OS).

Run the command below in the directory containing the whl file.

pip install <twisted-whl>

The installation completes without problems.

~~Below is the trial-and-error process~~

~~https://stackoverflow.com/a/43098806~~

~~C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat~~

~~cl.exe solution~~

~~In my case, Visual Studio 2017 was installed, and I added the path below to the PATH environment variable~~

~~C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\Hostx64\x64~~

~~Then the io.h error occurred~~

~~io.h solution~~

~~I added the path below to the INCLUDE environment variable~~

~~C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include~~

[python] scrapy with form-data

Language/python 2018. 3. 23. 21:31

scrapy: Web Crawling Framework

Compared with beautifulsoup: beautifulsoup is a parsing library, whereas

scrapy is a framework that provides a higher level of abstraction.

scrapy.Request(url=URL, callback=self.parse, method=METHOD, body=BODY, meta=META)

Basic usage is as above, but to send form-data in the body

you should use FormRequest.

scrapy.FormRequest(..., formdata=FORM_DATA, ...)

The body (data) of a request can be of various types, such as form-data, json, or wav.

The request has to declare this type in the Content-Type header field.

scrapy's FormRequest class takes care of this for you.

A plain Request does not set the Content-Type header, so the server does not recognize the body data as form-data.
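
To make the "body plus Content-Type" pairing concrete, the sketch below builds a URL-encoded form body with the standard library only. It does not use scrapy; it merely illustrates the kind of body and header that a form request carries. The field names and values are made up.

```python
from urllib.parse import urlencode

# Hypothetical form fields
form = {'user': 'hojongs', 'page': '1'}

# A URL-encoded form body is the fields joined as key=value pairs
body = urlencode(form)

# The header that tells the server how to interpret that body
headers = {'Content-Type': 'application/x-www-form-urlencoded'}

print(body)
print(headers)
```

Sending the same body without that header leaves the server guessing, which is the situation described above for a plain Request.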

python scrapy Korean (Hangul) encoding

Language/python 2018. 3. 16. 22:34

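While crawling a site with python scrapy and saving the result as json, I ran into an encoding problem: Korean text was written as \uXXXX escape sequences.

The cause is the ensure_ascii=True default of the python json library.

I solved it by adding the line below to the project's settings.py file.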

FEED_EXPORT_ENCODING = 'utf-8'
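
The effect of ensure_ascii can be reproduced with the json module alone. The item below is a made-up example, not data from the crawl.

```python
import json

item = {'title': '한글'}  # hypothetical scraped item

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes
escaped = json.dumps(item)

# ensure_ascii=False keeps the characters as they are
readable = json.dumps(item, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same data with json.loads(); the difference is only in how readable the file is.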

---

To check the encoded result, the encodings of vim and the terminal (ssh client) also have to match.

cat >> ~/.vimrc

set enc=utf-8

Set the terminal's encoding to utf-8 as well.

[python] multiprocessing

Language/python 2018. 1. 15. 10:33

from multiprocessing import Process, Queue


def func(q, start, interval):
    # sum the integers in [start, start + interval) and send the result back
    result = 0
    for i in range(start, start + interval):
        result += i
    q.put(result)


# the guard is required on Windows, where child processes re-import this module
if __name__ == '__main__':
    total = 1000000000
    proc_cnt = 4
    q_list = []
    proc_list = []
    output = 0
    start = 0
    interval = total // proc_cnt

    # process create & start
    for i in range(proc_cnt):
        q = Queue()
        p = Process(target=func, args=(q, start, interval))
        p.start()
        start += interval
        q_list.append(q)
        proc_list.append(p)

    # receive output from processes
    for q in q_list:
        output += q.get()

    # wait for the worker processes to exit
    for p in proc_list:
        p.join()
    print(output)

    # vs single-processing
    result = 0
    for i in range(total):
        result += i
    print(result)
 