Scraping Yahoo Finance historical stock prices


792 views

1 answer

21 author reputation

I am attempting to parse Yahoo Finance's historical stock price tables for various stocks using BeautifulSoup with Python. Here is the code:

import requests
import pandas as pd
import urllib
from bs4 import BeautifulSoup

tickers = ['HSBA.L', 'RDSA.L', 'RIO.L', 'BP.L', 'GSK.L', 'DGE.L', 'AZN.L', 'VOD.L', 'GLEN.L', 'ULVR.L']
url = 'https://uk.finance.yahoo.com/quote/HSBA.L/history?period1=1478647619&period2=1510183619&interval=1d&filter=history&frequency=1d'

request = requests.get(url)

soup = BeautifulSoup(request.text, 'lxml')

table = soup.find_all('table')[0]

n_rows = 0
n_columns = 0
column_name = []

for row in table.find_all('tr'):

    data = row.find_all('td')
    if len(data) > 0:
        n_rows += 1
        if n_columns == 0:
            n_columns = len(data)

    headers = row.find_all('th')
    if len(headers) > 0 and len(column_name) == 0:
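        for header_names in headers:
            # Collect the header text once so it can be used as the DataFrame column names
            column_name.append(header_names.get_text())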

new_table = pd.DataFrame(columns = column_name, index = range(0,n_rows))

row_index = 0
for row in table.find_all('tr'):
    column_index = 0
    columns = row.find_all('td')

    for column in columns:
        new_table.iat[row_index, column_index] = column.get_text()
        column_index += 1

    if len(columns) > 0:
        row_index += 1    

The first time I ran the code, I had the date range set to exactly two years from 7 November 2015, with weekly prices. The resulting data frame is 101 rows long, but I know for a fact it should have more (106). I then changed the settings to the defaults the page opens with (daily prices), but I still got the same 101 rows, even though the actual data set is much larger. Is there anything wrong with the code, or is it something Yahoo Finance is doing?

Any help is appreciated; I'm really stuck here.
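One way to separate the two possibilities (a bug in the loop versus fewer rows in the served page) is to count the data rows in the raw HTML independently of the DataFrame-building code. The following is only a minimal sketch under stated assumptions: it swaps BeautifulSoup for the standard library's `html.parser` so that it has no third-party dependencies, the names `DataRowCounter` and `count_data_rows` are hypothetical, and the sample HTML and its values are made up for illustration rather than taken from Yahoo.

```python
from html.parser import HTMLParser


class DataRowCounter(HTMLParser):
    """Count <tr> elements containing at least one <td> in the first <table>."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0      # > 0 while inside the first table
        self.table_done = False   # True once the first table has closed
        self.in_row = False       # True while inside a <tr>
        self.row_has_td = False   # True if the current row holds a <td>
        self.data_rows = 0

    def handle_starttag(self, tag, attrs):
        if self.table_done:
            return
        if tag == "table":
            self.table_depth += 1
        elif self.table_depth > 0:
            if tag == "tr":
                self.in_row = True
                self.row_has_td = False
            elif tag == "td" and self.in_row:
                self.row_has_td = True

    def handle_endtag(self, tag):
        if self.table_done or self.table_depth == 0:
            return
        if tag == "tr" and self.in_row:
            # Header-only rows (<th> cells) are not counted as data rows
            if self.row_has_td:
                self.data_rows += 1
            self.in_row = False
        elif tag == "table":
            self.table_depth -= 1
            if self.table_depth == 0:
                self.table_done = True


def count_data_rows(html):
    """Return the number of data rows in the first table of an HTML string."""
    parser = DataRowCounter()
    parser.feed(html)
    return parser.data_rows


# Made-up sample: one header row and three data rows, plus a second table
# that should be ignored (mirroring soup.find_all('table')[0] above).
sample_html = """
<table>
  <tr><th>Date</th><th>Close</th></tr>
  <tr><td>2017-11-06</td><td>1.0</td></tr>
  <tr><td>2017-11-07</td><td>2.0</td></tr>
  <tr><td>2017-11-08</td><td>3.0</td></tr>
</table>
<table><tr><td>ignored</td></tr></table>
"""

print(count_data_rows(sample_html))
```

Calling `count_data_rows(request.text)` on the response from the code above and comparing the result with `n_rows` would show whether the loop is dropping rows. If the two numbers agree, the rows are presumably missing from the HTML the server returned rather than being lost during parsing.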

Author: hhkk Posted: 08.11.2017 11:42

Answers (1)

0 upvotes

4908 author reputation

AFAIK, the API was shut down in May 2017. Can you use Google Finance? If you can accept Excel as a solution, here is a link to a file that you can use to download all kinds of historical time series data.


Author: ryguy72 Posted: 25.11.2017 09:18