How to install Scrapy on Ubuntu or Debian

By webmaster • 3 years ago • Scrapy

Scrapy is a Python-based scraping and web crawling program and is generally available as a pip package. Some Linux distributions like Ubuntu and Debian however have Scrapy in its default package repository and can be installed via apt.

Ubuntu version of Scrapy is more tightly integrated with the operating system in a way that it installs to the default application path, and you don't need to install additional tools such as pip to have Scrapy installed.

However, the installed version is normally tied to the distribution version, so you won't get the latest version of Scrapy unless you also upgrade your Ubuntu or Debian version.

Scrapy can be installed on Ubuntu and Debian using apt at the terminal.

Steps to install Scrapy on Ubuntu or Debian:

Launch terminal application.

Update apt's package list from repository.

$ sudo apt update [sudo] password for user: Hit:1 http://jp.archive.ubuntu.com/ubuntu focal InRelease Get:2 http://jp.archive.ubuntu.com/ubuntu focal-updates InRelease [107 kB] Get:3 http://jp.archive.ubuntu.com/ubuntu focal-backports InRelease [98.3 kB] Get:4 http://jp.archive.ubuntu.com/ubuntu focal-security InRelease [107 kB] Fetched 312 kB in 7s (47.5 kB/s) Reading package lists... Done Building dependency tree Reading state information... Done All packages are up to date.

Install python3-scrapy package using apt.

$ sudo apt install --assume-yes python3-scrapy Reading package lists... Done Building dependency tree Reading state information... Done The following additional packages will be installed:   ipython3 libimagequant0 libjbig0 libjpeg-turbo8 libjpeg8 liblcms2-2   libmysqlclient21 libtiff5 libwebp6 libwebpdemux2 libwebpmux3 mysql-common   python3-backcall python3-boto python3-bs4 python3-cssselect   python3-decorator python3-html5lib python3-ipython python3-ipython-genutils   python3-jedi python3-libxml2 python3-lxml python3-mysqldb python3-olefile   python3-parsel python3-parso python3-pexpect python3-pickleshare python3-pil   python3-prompt-toolkit python3-ptyprocess python3-pydispatch   python3-pygments python3-queuelib python3-soupsieve python3-traitlets   python3-w3lib python3-wcwidth python3-webencodings Suggested packages:   liblcms2-utils python3-genshi python-ipython-doc python3-lxml-dbg   python-lxml-doc default-mysql-server | virtual-mysql-server   python3-mysqldb-dbg python-pexpect-doc python-pil-doc python3-pil-dbg   python-pydispatch-doc python-pygments-doc ttf-bitstream-vera   python-scrapy-doc The following NEW packages will be installed:   ipython3 libimagequant0 libjbig0 libjpeg-turbo8 libjpeg8 liblcms2-2   libmysqlclient21 libtiff5 libwebp6 libwebpdemux2 libwebpmux3 mysql-common   python3-backcall python3-boto python3-bs4 python3-cssselect   python3-decorator python3-html5lib python3-ipython python3-ipython-genutils   python3-jedi python3-libxml2 python3-lxml python3-mysqldb python3-olefile   python3-parsel python3-parso python3-pexpect python3-pickleshare python3-pil   python3-prompt-toolkit python3-ptyprocess python3-pydispatch   python3-pygments python3-queuelib python3-scrapy python3-soupsieve   python3-traitlets python3-w3lib python3-wcwidth python3-webencodings 0 upgraded, 41 newly installed, 0 to remove and 0 not upgraded. Need to get 7,080 kB of archives. After this operation, 37.6 MB of additional disk space will be used. ##### snipped

Start using Scrapy by running scrapy command at the terminal.

$ scrapy Scrapy 1.7.3 - no active project  Usage:   scrapy <command> [options] [args]  Available commands:   bench         Run quick benchmark test   fetch         Fetch a URL using the Scrapy downloader   genspider     Generate new spider using pre-defined templates   runspider     Run a self-contained spider (without creating a project)   settings      Get settings values   shell         Interactive scraping console   startproject  Create new project   version       Print Scrapy version   view          Open URL in browser, as seen by Scrapy    [ more ]      More commands available when run from project directory  Use "scrapy <command> -h" to see more info about a command

An Error Occurred

Services for this domain name have been disabled.

{e&&window.ttq.instance(this.identifier).track(e.event)}))}isLoaded(){return!!window.ttq}}const ADS_PARAM$1="?caf",MESSAGE_PREFIX="FSXDC,.aCS:",ALLOWED_ORIGINS=["https://www.google.com","https://www.adsensecustomsearchads.com","https://syndicatedsearch.goog","https://googleadservices.com"];class Pixels{static build(e){const t=unpackPHPArrayObject(e,"pixel_tracking_data");if(t)return t.useAltTikTokEventsForAdsPlatformUser=e.is_ads,new Pixels(t)}constructor(e){this.onPixelEvent=e=>{const{detail:{type:t}}=e;switch(t){case"visit":case"ctr":case"click":this.providers.forEach((e=>e.handlePixelEvent(t)))}},this.providers=[new Facebook(e.facebook),new Tiktok(e.tiktok,e.useAltTikTokEventsForAdsPlatformUser),new Taboola(e.taboola),new Revcontent(e.revcontent),new Outbrain(e.outbrain)]}listenForEvents(){document.addEventListener("pixel",(e=>{this.onPixelEvent(e)}));window.onmessage=e=>{const{origin:t,data:n}=e;ALLOWED_ORIGINS.includes(t)&&(null==n?void 0:n.startsWith(MESSAGE_PREFIX))&&window.location.search.startsWith(ADS_PARAM$1)&&document.dispatchEvent(new CustomEvent("pixel",{detail:{type:"click"}}))}}dispatchEvent(e){document.dispatchEvent(new CustomEvent("pixel",{detail:e}))}}var State;!function(e){e[e.Pending=0]="Pending",e[e.Loaded=1]="Loaded",e[e.Failure=2]="Failure",e[e.TimedOut=3]="TimedOut",e[e.Errored=4]="Errored"}(State||(State={}));const CAF_SCRIPT_SRC=`https://www.google.com/adsense/domains/caf.js?${GOOGLE_MV3_URL_PARAMS}`,TIMEOUT_SCRIPTS=Number(GOOGLE_CAF_TIMEOUT_SCRIPTS),TIMEOUT_CALLBACKS=Number(GOOGLE_CAF_TIMEOUT_CALLBACKS);class StateMachine{constructor(){this.state=State.Pending}transitionTo(e){this.state=e}transitionFromPendingTo(e){this.done||(this.state=e)}get loaded(){return this.state===State.Loaded}get timedOut(){return this.state===State.TimedOut}get done(){return this.state!==State.Pending}}class Ads{constructor(e,t){this.state={script:new StateMachine,blocks:new StateMachine},this.blocksLoaded=[],this.injectScriptTags=()=>__awaiter(this,void 0,void 0,(function*(){return new Promise((e=>{const t=document.createElement("script");t.type="text/javascript",t.src=CAF_SCRIPT_SRC,t.addEventListener("load",(()=>e(!0))),t.addEventListener("error",(()=>e(!1))),document.body.appendChild(t),TIMEOUT_SCRIPTS>0&&setTimeout((()=>e(!1)),TIMEOUT_SCRIPTS)}))})),this.onPageLoaded=(e,t)=>{if(this.pageLoaded={requestAccepted:e,status:t},this.state.script.done)return;const n=null==t?void 0:t.error_code;n?(this.state.script.transitionTo(State.Failure),this.failureReason=`caf_pageloaderror_${n}`):this.state.script.transitionTo(State.Loaded)},this.onBlockLoaded=(e,t,n,i)=>{this.blocksLoaded.push({containerName:e,adsLoaded:t,isExperimentVariant:n,callbackOptions:i}),this.state.blocks.done||(t?this.state.blocks.transitionTo(State.Loaded):this.blocksLoaded.length>=this.blocks.length&&(this.state.blocks.transitionTo(State.Failure),this.failureReason=`caf_adloadfail_${e}`))},this.onTimeout=()=>{this.state.script.transitionFromPendingTo(State.TimedOut),this.state.blocks.transitionFromPendingTo(State.TimedOut)},this.blocks=e,this.options=t}get loaded(){return this.state.script.loaded&&!this.blocksLoaded.map((e=>e.adsLoaded)).includes(!1)}waitForBlocks(){return __awaiter(this,void 0,void 0,(function*(){return new Promise((e=>{const t=()=>{const n=performance.now();if(this.state.blocks.done)return this.cafLoadTime=Math.round(n-this.cafStartTime),void e();const i=this.blocksLoaded.map((e=>e.adsLoaded));i.includes(!1)||i.length>=this.blocks.length?e():setTimeout(t,50)};t()}))}))}inject(){return __awaiter(this,void 0,void 0,(function*(){try{const e=yield this.injectScriptTags();return this.cafStartTime=performance.now(),e&&void 0!==window.google?(new window.google.ads.domains.Caf(Object.assign(Object.assign({},this.options),{pageLoadedCallback:this.onPageLoaded,adLoadedCallback:this.onBlockLoaded}),...this.blocks),TIMEOUT_CALLBACKS>0&&setTimeout(this.onTimeout,TIMEOUT_CALLBACKS),yield new Promise((e=>{const t=()=>{this.state.script.done?e():setTimeout(t,10)};t()}))):void this.state.script.transitionTo(State.Failure)}catch(e){return void(this.error=e.toString())}}))}toCallbacks(){return{adLoadedCallback:this.blocksLoaded.slice(-1)[0],pageLoadedCallback:this.pageLoaded,cafTimedOut:this.state.script.timedOut||this.state.blocks.timedOut,cafLoadedMs:this.cafLoadTime,googleAdsFailure:!!this.failureReason}}toContext(){const e={cafScriptWasLoaded:this.state.script.loaded,cafScriptLoadTime:this.cafLoadTime,callbacks:this.toCallbacks};return this.error&&(e.js_error={message:this.error}),this.state.script.loaded||(e.zeroclick={reason:"googleAdsFailure"}),e}mockFailedState(){this.state.blocks.transitionTo(State.Failure),this.state.script.transitionTo(State.Failure)}}class TagManager{constructor(e){this.injected=!1,this.identifier=e}inject(){if(this.injected)return;if(!this.identifier)return;if("TEST"===this.identifier)return;const e=document.createElement("script");e.setAttribute("src",`https://www.googletagmanager.com/gtag/js?id=${this.identifier}`),document.head.appendChild(e),this.track(),this.injected=!0}track(){this.push("js",new Date),this.push("config",this.identifier)}push(e,t){window.dataLayer||(window.dataLayer=[]),window.dataLayer.push(arguments)}}const ADS_PARAM="caf",ADS_TRACKING_URL="_tr",BLOCKS_TYPE="ads",BLOCKS_CONTAINER="rs",KNOWN_CAF_PARAMS=["caf","query","afdToken","pcsa","nb","nm","nx","ny","is","clkt"];class Google{static build({pageOptions:e,preferredLanguage:t,blocks:n,googleAnalytics:i},s,a,o){let r={};e&&(r=Object.assign({},e),r.hl||(r.hl=t));let d=null==e?void 0:e.resultsPageBaseUrl;d||(d=window.location.origin);return new Google(s.uuid,n,r,i,d,o)}constructor(e,t,n,i,s,a){this._blocks=t,this._pageOptions=n,this.uuid=e,this._baseURL=new URL(s),this._signature=a,this.ads=new Ads(this.blocks,this.pageOptions),this.tagManager=new TagManager(i)}injectTagManager(){this.tagManager.inject()}injectAds(){return __awaiter(this,void 0,void 0,(function*(){yield this.ads.inject()}))}waitForBlocks(){return __awaiter(this,void 0,void 0,(function*(){return this.ads.waitForBlocks()}))}get blocks(){return(this._blocks||[]).filter((e=>this.wantsToServeAds?e.type===BLOCKS_TYPE:e.container===BLOCKS_CONTAINER)).map((e=>{const t=this.baseURL;new URLSearchParams(window.location.search).forEach(((e,n)=>{t.searchParams.has(n)||t.searchParams.append(n,e)}));const n=Object.assign({},e);if(n.resultsPageBaseUrl=t.toString(),this.wantsToServeAds){const e=new URLSearchParams;e.append("click","true"),e.append("session",this.uuid);const t=Object.assign({},this._signature);delete t.ad_loaded_callback,delete t.caf_loaded_ms,delete t.caf_timed_out,delete t.flex_rule,delete t.frame,delete t.js_error,delete t.no_ads_redirect,delete t.page_headers,delete t.page_request,delete t.page_loaded_callback,delete t.popup,delete t.screen_resolution,delete t.user_has_ad_blocker,delete t.user_preference,delete t.user_supports_darkmode,delete t.user_using_darkmode,delete t.zeroclick,e.append("signature",encode(t)),n.clicktrackUrl=`${TRACKING_DOMAIN}${ADS_TRACKING_URL}?${e.toString()}`}return n}))}get baseURL(){const e=new URL(this._baseURL.origin);return e.searchParams.append(ADS_PARAM,"1"),this._baseURL.searchParams.forEach(((t,n)=>{e.searchParams.append(n,t)})),e}get pageOptions(){const e=Object.assign({},this._pageOptions);return Object.keys(this._pageOptions).forEach((t=>{t.startsWith("bodis")&&delete e[t]})),e}get cannotLoadAds(){return!this.ads.loaded}get wantsToServeAds(){return new URLSearchParams(window.location.search).has(ADS_PARAM)}get adsMode(){return this.ads.loaded&&this.wantsToServeAds}get adsReady(){return this.wantsToServeAds&&!this.cannotLoadAds}get noAdsRedirectUrl(){const e=new URLSearchParams(window.location.search);return KNOWN_CAF_PARAMS.forEach((t=>e.delete(t))),`${window.location.origin}?${e.toString()}`}get callbacks(){return this.ads.toCallbacks()}toContext(){return Object.assign({blocks:this.blocks,pageOptions:this.pageOptions},this.ads.toContext())}}class CookieConsentManager{constructor(){this.injectScriptTag=()=>__awaiter(this,void 0,void 0,(function*(){return new Promise((e=>{const t=document.createElement("script");t.setAttribute("src",COOKIE_CONSENT_JS_URL),t.addEventListener("load",(()=>this.awaitConsent(e))),t.addEventListener("error",(()=>e(!1))),document.head.appendChild(t)}))}))}inject(){return __awaiter(this,void 0,void 0,(function*(){this.injected||!COOKIE_CONSENT_JS_URL||isLocal()||(this.injected=yield this.injectScriptTag())}))}awaitConsent(e){let t=0;const n=setInterval((()=>{t+=1,20===t&&(clearInterval(n),e(!0)),void 0!==window.__tcfapi&&(window.addEventListener("ConsentActivity",(t=>{const{detail:{status:n}}=t;n&&e(!0)})),clearInterval(n))}),50)}}class App{main(){var e,t;return __awaiter(this,void 0,void 0,(function*(){if(this.parkResponse=decode(),this.findDomainResponse=yield getFindDomain(),!this.findDomainResponse)throw new Error("Domain failed to load.");this.pixels=Pixels.build(this.findDomainResponse),null===(e=this.pixels)||void 0===e||e.listenForEvents(),this.adblock=new Adblock,yield this.adblock.inject(),this.google=Google.build(this.findDomainResponse,this.parkResponse,this.adblock,buildSignature({context:this.context,callbacks:null===(t=this.google)||void 0===t?void 0:t.callbacks},"click")),this.google.injectTagManager();const n=Parking.build(this.findDomainResponse,this.google);Render.prerender(n),this.cookieConsentManager=new CookieConsentManager,yield this.cookieConsentManager.inject();let i=Failed.cannotPark(this.findDomainResponse);if(i)return void(yield this.transitionToFailed(i,n));yield this.google.injectAds();let s=Disabled.build(this.findDomainResponse,this.adblock.state);if(s)return void(yield this.transitionToDisabled(s,n));const a=this.adblock.hasAdblocker();a&&this.adblock.handleAdblocked();const o=Sales.build(this.findDomainResponse);if(o)return void(yield this.transitionToSales(o));this.eligibleForZeroClick&&(this.zeroClickResponse=yield getZeroClick(this.context));const r=Redirect.build(this.findDomainResponse,this.zeroClickResponse,this.google);if(r)yield this.transitionToRedirect(r);else{if(a)return s=Disabled.build(this.findDomainResponse,this.adblock.state),void(yield this.transitionToDisabled(s,n));i=Failed.noSponsors(this.google),i?yield this.transitionToFailed(i,n):yield this.transitionToParking(n)}}))}transitionToParking(e){return __awaiter(this,void 0,void 0,(function*(){this.state=e,Render.template(e),Render.revealPage(),yield this.google.waitForBlocks(),yield this.track()}))}transitionToRedirect(e){return __awaiter(this,void 0,void 0,(function*(){this.state=e;const t=this.track();Render.revealPage(),yield waiter(e.delay,(e=>Render.loading(e))),yield t,window.location.href=e.url,log(`➡ Redirecting [${e.url}]`)}))}transitionToFailed(e,t){return __awaiter(this,void 0,void 0,(function*(){this.state=e,Render.message(e.message),Render.injectJS(t.javascript),Render.revealPage(),yield this.track()}))}transitionToSales(e){return __awaiter(this,void 0,void 0,(function*(){this.state=e,e.init(this.context),yield this.track()}))}transitionToDisabled(e,t){return __awaiter(this,void 0,void 0,(function*(){this.state=e,Render.message(e.message),Render.injectJS(t.javascript),Render.revealPage(),yield this.track()}))}track(){var e;return __awaiter(this,void 0,void 0,(function*(){if(!this.state.track)return Promise.resolve();try{const t=this.state.trackingType;return null===(e=this.pixels)||void 0===e||e.dispatchEvent({type:t}),trackVisit({context:this.context,callbacks:this.google.callbacks},t)}catch(e){return}}))}get eligibleForZeroClick(){const{cannotPark:e,canZeroClick:t,zeroClick:n}=this.findDomainResponse,{cannotLoadAds:i,wantsToServeAds:s}=this.google;return this.adblock.state!==Blocking.BLOCKED&&(!!t&&(!!e||(!(!i||s)||!!(null==n?void 0:n.reason))))}get context(){var e,t,n,i;const s=this.findDomainResponse,a=this.parkResponse,o=null===(e=this.state)||void 0===e?void 0:e.toContext(),r=null===(t=this.adblock)||void 0===t?void 0:t.toContext(),d=null===(n=this.google)||void 0===n?void 0:n.toContext(),c=browserState(),l=Object.assign(Object.assign({},null===(i=this.findDomainResponse)||void 0===i?void 0:i.zeroClick),this.zeroClickResponse);return Object.assign(Object.assign(Object.assign(Object.assign(Object.assign(Object.assign(Object.assign({app_version:APP_VERSION},s),a),r),d),o),c),{zeroClick:l})}init(){return __awaiter(this,void 0,void 0,(function*(){try{window.__parkour=this,yield this.main()}catch(e){console.error("app",e);const t=Failed.fromError(e);this.state=t,Render.message(t.message),Render.revealPage()}}))}}(new App).init(),exports.App=App}));

OnlineWebTools

Steps to install Scrapy on Ubuntu or Debian:

Like this:

Related

How to save cookies from cURL request

Like this:

How to change the user agent for cURL

Like this:

How to ignore SSL certificate error in cURL

Like this:

How to send a POST request in cURL

Like this:

How to send data in HTTP request using cURL

Like this:

Ad block detected

An Error Occurred

An Error Occurred

Content blocked

Invalid URL

No sponsors

No Sponsors

Invalid URL

An Error Occurred

An Error Occurred

OnlineWebTools

How to install Scrapy on Ubuntu or Debian

Steps to install Scrapy on Ubuntu or Debian:

Share this:

Like this:

Related

Recent Posts

Featured Category: cURL

How to save cookies from cURL request

Share this:

Like this:

How to change the user agent for cURL

Share this:

Like this:

How to ignore SSL certificate error in cURL

Share this:

Like this:

How to send a POST request in cURL

Share this:

Like this:

How to send data in HTTP request using cURL

Share this:

Like this:

Top Posts

Categories

Ad block detected

An Error Occurred

An Error Occurred

Content blocked

Invalid URL

No sponsors

No Sponsors

Invalid URL

An Error Occurred

An Error Occurred