Scheduling a Scrapy crawl with crontab

bash scrapy web-crawler crontab

876 views

1 answer

45 author reputation

I am new to crontab. I have been reading articles on how to schedule a Scrapy crawl automatically every 5 minutes, and some of them suggest using crontab, but I don't know how to write the right script.

Here is my .sh file:

#!/bin/sh
cd /home/kautsar/Downloads/thehack
scrapy crawl thehack

I have already made runScrapy.sh executable with +x,

but when I try to use

crontab -e
*/5**** cd /home/kautsar && sh runScrapy.sh

then, when I press Enter, the result is "?". Does anybody know what that means and can explain it? Please show me the right way to run a web crawl periodically at an interval that I set. Thanks a lot.

Author: beboy Posted: 18.07.2016 03:15

Answers (1)


1 upvote

1292 author reputation

Solution

I suspect that your default editor is set to ed; I have no idea why. If this is the case, you can read up on how to use it with the man ed command, but it would likely be better to configure the system to use your favorite editor. Let's assume that is vim. If it is not, replace vim with the appropriate name.

export EDITOR=vim
crontab -e
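To check the suspicion about ed before editing anything, a small sketch like the following prints which editor the environment currently names. It assumes that crontab consults VISUAL first and then EDITOR, which is how the common cron implementations document it; the function name crontab_editor is made up for this example, and the fallback when neither variable is set differs between systems.

```shell
#!/bin/sh
# Hypothetical helper: print the editor named by the environment.
# Assumption: crontab -e prefers VISUAL and falls back to EDITOR.
crontab_editor() {
    printf '%s\n' "${VISUAL:-${EDITOR:-none}}"
}

crontab_editor
```

If this prints none or ed, export EDITOR as shown above and then run crontab -e.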

At this point, you should be in your favorite editor, editing the crontab file. I recommend you add a comment to the file similar to the one below. I always do this to remind myself which column each asterisk stands for. DoM is Day of Month (1-31), DoW is Day of Week (0-7, or Sun, Mon, Tue, etc.). After the five time fields, give your shell script as a fully qualified file name, so that you need neither the cd nor a separate script invocation.

# min hr DoM mon DoW cmd
  */5 *  *   *   *    sh /home/kautsar/runScrapy.sh

At this point, you should be able to save and exit as you would using your editor.
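The entry in the question ran the five time fields together as */5****, which cron cannot parse. As a rough sanity check, the sketch below tests whether a line has five whitespace-separated schedule fields followed by a command. The function name looks_like_cron_entry is hypothetical, and the pattern is deliberately simplified: it accepts only digits, *, /, comma and -, so month or day names and shortcuts such as @reboot are rejected even though cron may accept them.

```shell
#!/bin/sh
# Hypothetical helper: succeed if LINE has five schedule fields plus a command.
# Simplified check; names (Sun, Jan) and @shortcuts are not recognised.
looks_like_cron_entry() {
    printf '%s\n' "$1" | awk '{
        # Need five time fields and at least one word of command
        if (NF < 6) exit 1
        # Each time field may contain only digits, *, /, comma and -
        for (i = 1; i <= 5; i++)
            if ($i !~ "^[0-9*/,-]+$") exit 1
        exit 0
    }'
}

looks_like_cron_entry '*/5 * * * * sh /home/kautsar/runScrapy.sh' && echo "ok"
looks_like_cron_entry '*/5**** cd /home/kautsar && sh runScrapy.sh' || echo "malformed"
```

After saving, crontab -l lists the installed entries, so you can confirm that the line was stored as intended.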

There are other pitfalls awaiting you in a crontab file. Read man 5 crontab and pay particular attention to the definitions of PATH and HOME. cron runs in an abbreviated environment, not your login environment, so sometimes the PATH variable needs to be set. You may find it useful that $HOME is set to your home directory. For instance, your command could be $HOME/runScrapy.sh. Finally, you can set MAILTO to be the address to which cron sends any command output, which can be useful if outgoing email is not configured on your system.

Author: Greg Tarsa Posted: 18.07.2016 05:49
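One way to guard against the abbreviated environment is to make the script itself set what it needs. The following is a minimal sketch, not a definitive version of runScrapy.sh: it assumes scrapy is installed in /usr/local/bin (run command -v scrapy in your login shell to find the real location), the helper name run_in_dir is made up, and the log file name is only an example.

```shell
#!/bin/sh
# Sketch of a cron-friendly wrapper.
# Assumption: scrapy lives in /usr/local/bin; adjust to your system.
PATH=/usr/local/bin:$PATH
export PATH

# run_in_dir DIR LOG CMD [ARGS...]
# Change to DIR, run CMD there, and append its stdout and stderr to LOG.
run_in_dir() {
    dir=$1
    log=$2
    shift 2
    # Subshell keeps the cd local; give up early if DIR does not exist
    ( cd "$dir" || exit 1; "$@" ) >> "$log" 2>&1
}

# Intended use, with the project path and spider name from the question:
# run_in_dir /home/kautsar/Downloads/thehack "$HOME/thehack.log" scrapy crawl thehack
```

Because the output is appended to a log file, a failed crawl leaves a trace even if cron's mail never reaches you.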