Parsing HTML


Hi, I had a script that pulled the top 100 Baseball America prospects. They have redone their site, and I can't get a new script to work. I'm not sure what I'm doing wrong. This is the URL:

https://www.baseballamerica.com/rankings/2023-top-100-prospects/

I'm looking for this:

<a class="rankings-table__tdata-details--item-link" href="https://www.baseballamerica.com/players/8268-elly-de-la-cruz/">1. Elly De La Cruz</a>

<a class="rankings-table__tdata-details--item-link" href="https://www.baseballamerica.com/players/9038-jackson-holliday/">2. Jackson Holliday</a>

Down to:

<a class="rankings-table__tdata-details--item-link" href="https://www.baseballamerica.com/players/8653-brady-house/">100. Brady House</a>

 

The following is the page HTML for the first 3 players:

<div class="rankings-table__tbody"><div id="0" class="rankings-table__trow"><div class="rankings-table__trow-inner"><div class="rankings-table__tdata-details"><div class="rankings-table__tdata-details--item"><a href="https://www.baseballamerica.com/players/8268-elly-de-la-cruz/" class="rankings-table__tdata-details--item-image-wrapper"><img class="rankings-table__tdata-details--item-image" width="48" height="48" src="https://img.mlbstatic.com/mlb-photos/image/upload/w_48,h_48,g_auto,c_fill/v1/people/682829/headshot/67/current" alt="Headshot of Elly De La Cruz"></a></div><div class="rankings-table__tdata-details--item"><a class="rankings-table__tdata-details--item-link" href="https://www.baseballamerica.com/players/8268-elly-de-la-cruz/">1. Elly De La Cruz</a></div><div class="rankings-table__tdata-details--item">Cincinnati Reds</div><div class="rankings-table__tdata-details--item">SS</div></div><div class="rankings-table__tdata-action"><div class="rankings-table__tdata-action--item"><button class="rankings-table__expand-button" aria-label="Expand"><svg viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" width="24" height="24" aria-hidden="true" focusable="false"><path d="M17.5 11.6L12 16l-5.5-4.4.9-1.2L12 14l4.5-3.6 1 1.2z"></path></svg></button></div></div></div></div><div id="1" class="rankings-table__trow"><div class="rankings-table__trow-inner"><div class="rankings-table__tdata-details"><div class="rankings-table__tdata-details--item"><a href="https://www.baseballamerica.com/players/9038-jackson-holliday/" class="rankings-table__tdata-details--item-image-wrapper"><img class="rankings-table__tdata-details--item-image" width="48" height="48" src="https://img.mlbstatic.com/mlb-photos/image/upload/w_48,h_48,g_auto,c_fill/v1/people/702616/headshot/67/current" alt="Headshot of Jackson Holliday"></a></div><div class="rankings-table__tdata-details--item"><a class="rankings-table__tdata-details--item-link" 
href="https://www.baseballamerica.com/players/9038-jackson-holliday/">2. Jackson Holliday</a></div><div class="rankings-table__tdata-details--item">Baltimore Orioles</div><div class="rankings-table__tdata-details--item">SS</div></div><div class="rankings-table__tdata-action"><div class="rankings-table__tdata-action--item"><button class="rankings-table__expand-button" aria-label="Expand"><svg viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" width="24" height="24" aria-hidden="true" focusable="false"><path d="M17.5 11.6L12 16l-5.5-4.4.9-1.2L12 14l4.5-3.6 1 1.2z"></path></svg></button></div></div></div></div><div id="2" class="rankings-table__trow"><div class="rankings-table__trow-inner"><div class="rankings-table__tdata-details"><div class="rankings-table__tdata-details--item"><a href="https://www.baseballamerica.com/players/8976-jackson-chourio/" class="rankings-table__tdata-details--item-image-wrapper"><img class="rankings-table__tdata-details--item-image" width="48" height="48" src="https://img.mlbstatic.com/mlb-photos/image/upload/w_48,h_48,g_auto,c_fill/v1/people/694192/headshot/milb/current" alt="Headshot of Jackson Chourio"></a></div><div class="rankings-table__tdata-details--item"><a class="rankings-table__tdata-details--item-link" href="https://www.baseballamerica.com/players/8976-jackson-chourio/">3. Jackson Chourio</a></div><div class="rankings-table__tdata-details--item">Milwaukee Brewers</div><div class="rankings-table__tdata-details--item">OF</div></div><div class="rankings-table__tdata-action"><div class="rankings-table__tdata-action--item"><button class="rankings-table__expand-button" aria-label="Expand"><svg viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" width="24" height="24" aria-hidden="true" focusable="false"><path d="M17.5 11.6L12 16l-5.5-4.4.9-1.2L12 14l4.5-3.6 1 1.2z"></path></svg></button></div></div></div></div>

 

I hope this makes sense. When I run my code in PyCharm, I get a list with a length of zero.

My code is:

import requests
from bs4 import BeautifulSoup

# Fetch the rankings page and parse the returned HTML.
html_text = requests.get('https://www.baseballamerica.com/rankings/2023-top-100-prospects/').text
soup = BeautifulSoup(html_text, 'lxml')

# Find every player link by its CSS class.
my_list = soup.find_all('a', class_="rankings-table__tdata-details--item-link")
print(type(my_list))
print(len(my_list))  # this is where I get 0
print(my_list)

Hi,
 
It's possible that they updated their website to deter scraping. `requests.get()` does not execute JavaScript; it returns only the HTML that the server sends in its initial response. Therefore, if the website uses JavaScript to load the rankings data afterwards, that data will not be in the response, and you won't be able to scrape it with `requests` alone.
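
To illustrate the point, here is a minimal sketch that runs the same class-based lookup against two static strings. It uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra packages. `RENDERED_HTML` is trimmed from the markup in the question. `SHELL_HTML`, `ProspectLinkParser` and `extract_prospects` are hypothetical names and data made up for this example; I have not inspected what the real site actually returns.

```python
from html.parser import HTMLParser

# Class name copied from the <a> elements shown in the question.
TARGET_CLASS = "rankings-table__tdata-details--item-link"


class ProspectLinkParser(HTMLParser):
    """Collect the text of every <a> whose class list contains TARGET_CLASS."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._inside_target = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and TARGET_CLASS in classes:
            self._inside_target = True
            self.names.append("")

    def handle_data(self, data):
        if self._inside_target:
            self.names[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside_target = False


def extract_prospects(html_text):
    """Return the link text of all matching <a> elements in html_text."""
    parser = ProspectLinkParser()
    parser.feed(html_text)
    parser.close()
    return [name.strip() for name in parser.names]


# Markup as the browser shows it after scripts have run (from the question).
RENDERED_HTML = (
    '<div class="rankings-table__tdata-details--item">'
    '<a class="rankings-table__tdata-details--item-link" '
    'href="https://www.baseballamerica.com/players/8268-elly-de-la-cruz/">'
    '1. Elly De La Cruz</a></div>'
)

# Hypothetical initial response: an empty container that a script fills later.
SHELL_HTML = '<div id="app"></div><script src="/hypothetical-bundle.js"></script>'

print(extract_prospects(RENDERED_HTML))  # ['1. Elly De La Cruz']
print(extract_prospects(SHELL_HTML))     # []
```

If the lookup works on markup copied from the browser but returns an empty list on the text from `requests.get()`, the selector is probably fine and the data is simply not in the initial response. A quick check in the original script would be `print("rankings-table__tdata-details--item-link" in html_text)`.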
 
While I'm happy to provide this general advice, this question is related to software development more than to the PyCharm IDE itself. For more hands-on help with Python coding, you might consider posting your question on a platform like https://stackoverflow.com/
