Scheduling a Scrapy crawl with crontab
Author reputation: 45
I am new to crontab. I have been reading articles on how to schedule a Scrapy crawl to run automatically every 5 minutes, and some of them suggest using crontab, but I don't know how to write the right script.
Here is my .sh file:
#!/bin/sh
cd /home/kautsar/Downloads/thehack
scrapy crawl thehack
I have already made runScrapy.sh executable with +x,
but when I try to use
crontab -e
*/5**** cd /home/kautsar && sh runScrapy.sh
then when I press Enter, the result is "?". Does anybody know what that means? Can you explain it? Please show me the right way to run a web crawl periodically at an interval that I set. Thanks a lot.
Author: beboy. Posted: 18.07.2016 03:15
Author reputation: 1292
I suspect that your default editor is set to ed. No idea why. If this is the case, you can read up on how to use it with the man ed command, but it would likely be better to configure the system to use your favorite editor. Let's assume that is vim. If it is not, replace vim with the appropriate name.
export EDITOR=vim
crontab -e
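If you are unsure which editor crontab -e will start, a quick check like the sketch below may help. It assumes the common cron behavior of consulting VISUAL first and EDITOR second; the function name chosen_editor is made up for this example, and what happens when neither variable is set depends on your system.

```shell
#!/bin/sh
# Hypothetical helper: report which editor `crontab -e` is likely to use.
# Assumption: crontab checks VISUAL first, then EDITOR (common cron behavior).
chosen_editor() {
    printf '%s\n' "${VISUAL:-${EDITOR:-system default}}"
}

chosen_editor
```

To keep the setting across logins, the export line can go in your shell startup file.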
At this point, you should be in your favorite editor editing the crontab file. I recommend you add a comment to the file similar to the one below. I always do this to remind myself what all the columns are for the various asterisks.
In that comment, DoM is Day of Month (1-31) and DoW is Day of Week (0-7 or Sun/Mon/Tue, etc.). The command is then your shell script, given as a fully qualified file name (so you need not have the cd and the script invocation).
# min hr DoM mon DoW cmd
*/5 * * * * sh /home/kautsar/runScrapy.sh
At this point, you should be able to save and exit as you would using your editor.
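If you would rather not go through an editor at all, the entry can also be built in the shell and piped into crontab. The following is only a sketch: cron_entry is a made-up helper name, and the install line is left commented out because it assumes your crontab accepts a new table on standard input via "crontab -", which common implementations do but which is worth checking in man crontab first.

```shell
#!/bin/sh
# Hypothetical helper: build one crontab line (five time fields, then the command).
# $1 = interval in minutes, $2 = absolute path to the script
cron_entry() {
    printf '*/%s * * * * sh %s\n' "$1" "$2"
}

entry=$(cron_entry 5 /home/kautsar/runScrapy.sh)
# Quote the variable so the shell does not expand the asterisks.
echo "$entry"

# Assumption: `crontab -` replaces your table with what it reads on stdin.
# Uncomment to append the entry to your existing table:
# ( crontab -l 2>/dev/null; echo "$entry" ) | crontab -
```

Afterwards, crontab -l lists what is actually installed.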
There are other pitfalls awaiting you in a crontab file. Read man 5 crontab and pay particular attention to the definitions of PATH and HOME. cron runs in an abbreviated environment, not your login environment, so sometimes the PATH variable needs to be set. You may find it useful that $HOME is set to your home directory. For instance, your command could be $HOME/runScrapy.sh. Finally, you can set MAILTO to be the address to which cron sends any command output, which can be useful if outgoing email is not configured on your system.
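To make the PATH point concrete, here is one way runScrapy.sh could be written so that it does not depend on your login environment. Treat it as a sketch under assumptions: /usr/local/bin is only a guess at where scrapy is installed (check with command -v scrapy), and the log location and the names PROJECT_DIR, LOG_FILE and run_crawl are invented for this example.

```shell
#!/bin/sh
# Sketch of a cron-friendly runScrapy.sh.
# Assumption: scrapy is installed in /usr/local/bin; adjust to your system.
PATH=/usr/local/bin:$PATH
export PATH

PROJECT_DIR=${PROJECT_DIR:-/home/kautsar/Downloads/thehack}
LOG_FILE=${LOG_FILE:-/tmp/thehack-cron.log}   # hypothetical log location

run_crawl() {
    # Refuse to crawl if the project directory cannot be entered.
    cd "$PROJECT_DIR" || { echo "cannot cd to $PROJECT_DIR" >&2; return 1; }
    scrapy crawl thehack
}

# Append the crawl's stdout and stderr to the log file. Only a failure
# produces output on stderr here, and cron mails any output it sees.
if ! run_crawl >>"$LOG_FILE" 2>&1; then
    echo "crawl failed, see $LOG_FILE" >&2
fi
```

With this in place the script only produces output when something goes wrong, so any mail from cron means there is a problem to look at in the log.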