How to Scrape Instagram (2024)

In this Python web scraping tutorial we'll explore Instagram - one of the biggest social media websites out there. We'll take a look at how to scrape Instagram's search and explore endpoints to find user profile data and post information.

We'll also cover some tips and tricks for reaching these endpoints efficiently, avoiding web scraper blocking, and accessing all of this information without having to log in to Instagram. So, let's dive in!

Setup

In this Instagram web scraping tutorial, we'll be using Python with the HTTP client library httpx, which will power all of our interactions with Instagram's servers. We can install it with pip:

$ pip install httpx

That's all we need for this tutorial. We'll mostly be working with JSON objects which we can parse in native Python without any extra packages.
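As a quick illustration of parsing JSON with only the standard library, here is a minimal sketch. The payload below is made up for this example; it only mimics the nested shape of the responses used later in this tutorial.

```python
import json

# A made-up payload shaped like the nested responses used later in this tutorial.
raw = '{"data": {"user": {"username": "example", "edge_followed_by": {"count": 42}}}}'

data = json.loads(raw)
user = data["data"]["user"]

# .get() avoids a KeyError when an optional field is missing
followers = user.get("edge_followed_by", {}).get("count")
bio = user.get("biography")  # not present in this sample, so this is None

print(user["username"], followers, bio)
```

Chaining .get() calls with a default of {} is a common way to read optional nested fields without wrapping every access in try/except.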

Finding Posts and Users

By Hashtag

To find users we can start from one of Instagram's exploration pages. The most common approach is to use the /explore/tags endpoint to find posts by hashtag. Instead of scraping the HTML endpoint, we can use Instagram's GraphQL service:

import json
from typing import Optional
from urllib.parse import quote

import httpx


def scrape_hashtag(hashtag: str, session: httpx.Client, page_size=12, page_limit: Optional[int] = None):
    """scrape posts tagged with the given hashtag"""
    base_url = "https://www.instagram.com/graphql/query/?query_hash=174a5243287c5f3a7de741089750ab3b&variables="
    variables = {
        "tag_name": hashtag,
        "first": page_size,
        "after": None,
    }
    page = 1
    while True:
        result = session.get(base_url + quote(json.dumps(variables)))
        posts = json.loads(result.content)["data"]["hashtag"]["edge_hashtag_to_media"]
        for post in posts["edges"]:
            yield post["node"]
        page_info = posts["page_info"]
        if not page_info["has_next_page"]:
            break
        # the end cursor of this page is the starting point of the next one
        variables["after"] = page_info["end_cursor"]
        page += 1
        # page_limit=None means no limit
        if page_limit and page > page_limit:
            break

Example usage and output:

if __name__ == "__main__":
    with httpx.Client(timeout=httpx.Timeout(20.0)) as session:
        for post in scrape_hashtag("cats", session):
            print(post)
[ { "comments_disabled": false, "__typename": "GraphImage", "id": "2891447792099336443", "edge_media_to_caption": { "edges": [ { "node": { "text": "🥰\n.\sofinstagram #cats #beautyfullcat #beautifulcatsoftheworld #mycat #prettycat #cats #catsofinstagram #beautifulcatsofinstagram #catoftheday #catstagram #catlife #catlovers #bestmeow #katzen #ilovemycats #ilovemycat #katzenliebe #katzenleben #katzenaufinstagram #katzenfotografie #instacat # katze #katzenwelt #catlove #catfluencer#rescuecat #adoptedcat #adoptedcatsofinstagram #adoptedcatsarethebest" } } ] }, "shortcode": "CggfHKGqyD7", "edge_media_to_comment": { "count": 0 }, "taken_at_timestamp": 1658907458, "dimensions": { "height": 1350, "width": 1080 }, "display_url": "https://scontent-vie1-1.cdninstagram.com/v/t51.2885-15/295609100_475025094450455_8311596005796267513_n.webp?stp=dst-jpg_e35_p1080x1080&_nc_ht=scontent-vie1-1.cdninstagram.com&_nc_cat=111&_nc_ohc=Y-hZeZUhkzYAX_mIOop&edm=AA0rjkIBAAAA&ccb=7-5&oh=00_AT-EeW536WMUxlQ3iG6S-LzW2HoLtmSI0Ss_VIxzZJ4Y-A&oe=62E87315&_nc_sid=d997c6", "edge_liked_by": {"count": 0}, "edge_media_preview_like": {"count": 0}, "owner": {"id": "51742215330"}, "thumbnail_src": "https://scontent-vie1-1.cdninstagram.com/v/t51.2885-15/295609100_475025094450455_8311596005796267513_n.webp?stp=c0.180.1440.1440a_dst-jpg_e35_s640x640_sh0.08&_nc_ht=scontent-vie1-1.cdninstagram.com&_nc_cat=111&_nc_ohc=Y-hZeZUhkzYAX_mIOop&edm=AA0rjkIBAAAA&ccb=7-5&oh=00_AT8rtjj_08vk70Qk4AOEgatMsuAVOOJuk8-FFyKHH0uEKQ&oe=62E87315&_nc_sid=d997c6", "thumbnail_resources": [ { "src": "https://scontent-vie1-1.cdninstagram.com/v/t51.2885-15/295609100_475025094450455_8311596005796267513_n.webp?stp=c0.180.1440.1440a_dst-jpg_e35_s640x640_sh0.08&_nc_ht=scontent-vie1-1.cdninstagram.com&_nc_cat=111&_nc_ohc=Y-hZeZUhkzYAX_mIOop&edm=AA0rjkIBAAAA&ccb=7-5&oh=00_AT8rtjj_08vk70Qk4AOEgatMsuAVOOJuk8-FFyKHH0uEKQ&oe=62E87315&_nc_sid=d997c6", "config_width": 640, "config_height": 640 }, "..." 
], "is_video": false, "accessibility_caption": null },]

Above, we are using the GraphQL endpoint, which takes a few variables: tag name, page size and page cursor. With these parameters we can paginate through posts marked with a hashtag and find users (see the owner.id field) or just collect the posts themselves!
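To make the request format more concrete, here is a minimal sketch of how the query URL is assembled from the variables. The helper names build_graphql_url and hashtag_variables are hypothetical and only illustrate the encoding step used in the scraper above.

```python
import json
from typing import Optional
from urllib.parse import quote, unquote

GRAPHQL_URL = "https://www.instagram.com/graphql/query/"


def build_graphql_url(query_hash: str, variables: dict) -> str:
    """Return a query URL with the variables JSON-encoded and URL-quoted."""
    encoded = quote(json.dumps(variables, separators=(",", ":")))
    return f"{GRAPHQL_URL}?query_hash={query_hash}&variables={encoded}"


def hashtag_variables(tag: str, page_size: int = 12, cursor: Optional[str] = None) -> dict:
    """Variables for the hashtag query; cursor=None requests the first page."""
    return {"tag_name": tag, "first": page_size, "after": cursor}


url = build_graphql_url("174a5243287c5f3a7de741089750ab3b", hashtag_variables("cats"))
print(url)
# decode the variables again to check the round trip
print(unquote(url.split("variables=")[1]))
```

For the next page, the same helper would be called with cursor set to the end_cursor value from the previous response.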

By Location

Alternatively, we can also find posts by location by using /explore/locations REST endpoint. For example, we could find all posts tagged with London location by scraping explore/locations/213385402/london-united-kingdom/?__a=1

Though, for this, we need to know the location's numeric ID. For London, we can see it's 213385402, but how do we find it for any other location?

For this, we need another endpoint, /web/search/topsearch/, which lets us search the top results for a given query. To find the ID of London we'd use the URL web/search/topsearch/?query=london, which returns the top user, hashtag and location results matching this query:

"places": [
  {
    "place": {
      "location": {
        "pk": "213385402",
        "short_name": "London",
        "facebook_places_id": 106078429431815,
        "external_source": "facebook_places",
        "name": "London, United Kingdom",
        "address": "",
        "city": "",
        "has_viewer_saved": false,
        "lng": -0.1094,
        "lat": 51.5141
      },
      "title": "London, United Kingdom",
      "subtitle": "",
      "media_bundles": [],
      "slug": "london-united-kingdom"
    },
    "position": 51
  }
],

We can see the location ID is under pk or facebook_places_id fields (which are interchangeable in this scenario).
Let's put this together in Python:

import httpx


def find_location_id(query: str, session: httpx.Client):
    """finds most likely location ID from given location name"""
    # passing the query through params URL-encodes spaces and commas
    resp = session.get("https://www.instagram.com/web/search/topsearch/", params={"query": query})
    data = resp.json()
    try:
        first_result = sorted(data["places"], key=lambda place: place["position"])[0]
        return first_result["place"]["location"]["pk"]
    except IndexError:
        print(f'no locations matching query "{query}" were found')
        return None


def scrape_users_by_location(location_id: str, session: httpx.Client, page_limit=None):
    """yields usernames of recent posts tagged with the given location"""
    url = f"https://www.instagram.com/explore/locations/{location_id}/?__a=1"
    page = 1
    next_id = ""
    while True:
        resp = session.get(url + (f"&max_id={next_id}" if next_id else ""))
        data = resp.json()["native_location_data"]
        print(f"scraped location {location_id} page {page}")
        for section in data["recent"]["sections"]:
            for media in section["layout_content"]["medias"]:
                yield media["media"]["user"]["username"]
        next_id = data["recent"]["next_max_id"]
        if not next_id:
            print(f"no more results after page {page}")
            break
        # stop once the requested number of pages has been scraped
        if page_limit and page >= page_limit:
            print(f"reached page limit {page}")
            break
        page += 1

Example usage and output:

if __name__ == "__main__":
    with httpx.Client(timeout=httpx.Timeout(20.0)) as session:
        location_name = "London"
        location_id = find_location_id(location_name, session=session)
        print(f"resolved location id from {location_name} to {location_id}")
        for username in scrape_users_by_location(location_id, session=session):
            print(username)
[ "username1", "username2", "username3", "..."]

In the example above, we created two functions that implement the logic described earlier: one to resolve a location ID from a location name and another to retrieve the usernames of all recent posts tagged with this location.

Note: the recent post data contains a lot more than just usernames. We kept the example brief, but post images, captions and even comment information can be found there.

Scraping User Data

To retrieve an Instagram user's profile page data we can use an internal API endpoint:

import json

import httpx


def scrape_user(username: str, session: httpx.Client):
    """scrape user's data"""
    result = session.get(
        f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
        # app ID header sent along with the request
        headers={"x-ig-app-id": "936619743392459"},
    )
    data = json.loads(result.content)
    return data["data"]["user"]

Example usage:

if __name__ == "__main__":
    with httpx.Client(timeout=httpx.Timeout(20.0)) as session:
        user = scrape_user("google", session)
        print(json.dumps(user, indent=2, ensure_ascii=False))

This approach will return Instagram user data such as bio description, follower counts, profile pictures etc:

{ "biography": "Google unfiltered—sometimes with filters.", "external_url": "https://linkin.bio/google", "external_url_linkshimmed": "https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=ATOaH1Vrx_TkkMUhpCCh1_PM-C1k5t35gAtJ0eBjTPE84RItj-cCFdqRoRHwlbiCSrB5G_v6MgjePl1SQN4vTw&s=1", "edge_followed_by": { "count": 13015078 }, "fbid": "17841401778116675", "edge_follow": { "count": 33 }, "full_name": "Google", "highlight_reel_count": 5, "id": "1067259270", "is_business_account": true, "is_professional_account": true, "is_supervision_enabled": false, "is_guardian_of_viewer": false, "is_supervised_by_viewer": false, "is_embeds_disabled": false, "is_joined_recently": false, "guardian_id": null, "is_verified": true, "profile_pic_url": "https://instagram.furt1-1.fna.fbcdn.net/v/t51.2885-19/126151620_3420222801423283_6498777152086077438_n.jpg?stp=dst-jpg_s150x150&_nc_ht=instagram.furt1-1.fna.fbcdn.net&_nc_cat=1&_nc_ohc=bmDCZ2Q8wTkAX-Ilbqq&edm=ABfd0MgBAAAA&ccb=7-4&oh=00_AT9pRKzLtnysPjhclN6TprCd9FBWo2ABbn9cRICPhbQZcA&oe=62882D44&_nc_sid=7bff83", "username": "google", ...}

This is a great, easy method to scrape Instagram profiles - it even includes the details of the first 12 posts including photos and videos!

That being said, to retrieve the rest of the post details and post comments we need to take a look at another endpoint that allows access to the whole post history.
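As a quick illustration of working with the profile object returned above, the following sketch picks out a few fields. The helper summarize_profile is hypothetical, and the sample dictionary is a trimmed copy of the example output shown earlier.

```python
def summarize_profile(user: dict) -> dict:
    """Pick a few commonly used fields out of a profile object."""
    return {
        "id": user.get("id"),
        "username": user.get("username"),
        "full_name": user.get("full_name"),
        "followers": user.get("edge_followed_by", {}).get("count"),
        "following": user.get("edge_follow", {}).get("count"),
        "verified": user.get("is_verified", False),
    }


# trimmed copy of the example profile output shown above
sample = {
    "edge_followed_by": {"count": 13015078},
    "edge_follow": {"count": 33},
    "full_name": "Google",
    "id": "1067259270",
    "is_verified": True,
    "username": "google",
}

summary = summarize_profile(sample)
print(summary)
```

Using .get() throughout means a profile with missing fields produces None values instead of raising, which is convenient when the response format changes.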

Scraping User Posts

To retrieve the user's posts and post comments, we'll be using yet another GraphQL endpoint, which requires three variables: the user's ID (which we got from scraping the user's profile previously), the page size and the page offset cursor:

{ "id": "NUMERIC USER ID", "first": 12, "after": "CURSOR ID FOR PAGING"}

For example, if we would like to retrieve Instagram posts created by Google, we first have to retrieve this user's ID and then compile our GraphQL request.

In Google's example, the GraphQL URL would be:

https://www.instagram.com/graphql/query/?query_hash=e769aa130647d2354c40ea6a439bfc08&variables={"id":"1067259270","first":12}

We can try it in our browser, and we should see JSON returned with the data of the first 12 posts, which includes details like:

  • Post photos and videos
  • The first page of post's comments
  • Post metadata such as view and comment counts

However, to retrieve all posts we need to implement pagination logic, as the data is spread across multiple pages.

import json
from typing import Optional
from urllib.parse import quote

import httpx


def scrape_user_posts(user_id: str, session: httpx.Client, page_size=12, page_limit: Optional[int] = None):
    """scrape user's post data"""
    base_url = "https://www.instagram.com/graphql/query/?query_hash=e769aa130647d2354c40ea6a439bfc08&variables="
    variables = {
        "id": user_id,
        "first": page_size,
        "after": None,
    }
    page = 1
    while True:
        resp = session.get(base_url + quote(json.dumps(variables)))
        posts = resp.json()["data"]["user"]["edge_owner_to_timeline_media"]
        for post in posts["edges"]:
            yield post["node"]
        page_info = posts["page_info"]
        if not page_info["has_next_page"]:
            break
        # the end cursor of this page is the starting point of the next one
        variables["after"] = page_info["end_cursor"]
        page += 1
        # page_limit=None means no limit
        if page_limit and page > page_limit:
            break

Example usage and output:

if __name__ == "__main__":
    with httpx.Client(timeout=httpx.Timeout(20.0)) as session:
        posts = list(scrape_user_posts("1067259270", session, page_limit=3))
        print(json.dumps(posts, indent=2, ensure_ascii=False))
[ { "__typename": "GraphImage", "id": "2890253001563912589", "dimensions": { "height": 1080, "width": 1080 }, "display_url": "https://scontent-atl3-2.cdninstagram.com/v/t51.2885-15/295343605_719605135806241_7849792612912420873_n.webp?stp=dst-jpg_e35&_nc_ht=scontent-atl3-2.cdninstagram.com&_nc_cat=101&_nc_ohc=cbVYU-YGD04AX9-DGya&edm=APU89FABAAAA&ccb=7-5&oh=00_AT-C93CjLzMapgPHOinoltBXypU_wi7s6zzLj1th-s9p-Q&oe=62E80627&_nc_sid=86f79a", "display_resources": [ { "src": "https://scontent-atl3-2.cdninstagram.com/v/t51.2885-15/295343605_719605135806241_7849792612912420873_n.webp?stp=dst-jpg_e35_s640x640_sh0.08&_nc_ht=scontent-atl3-2.cdninstagram.com&_nc_cat=101&_nc_ohc=cbVYU-YGD04AX9-DGya&edm=APU89FABAAAA&ccb=7-5&oh=00_AT8aF_4X2Ix9neTg1obSzOBgZW83oMFSNb-i5uqZqRqLLg&oe=62E80627&_nc_sid=86f79a", "config_width": 640, "config_height": 640 }, "..." ], "is_video": false, "tracking_token": "eyJ2ZXJzaW9uIjo1LCJwYXlsb2FkIjp7ImlzX2FuYWx5dGljc190cmFja2VkIjp0cnVlLCJ1dWlkIjoiOWJiNzUyMjljMjU2NDExMTliOGI4NzM5MTE2Mjk4MTYyODkwMjUzMDAxNTYzOTEyNTg5In0sInNpZ25hdHVyZSI6IiJ9", "edge_media_to_tagged_user": { "edges": [ { "node": { "user": { "full_name": "Jahmar Gale | Data Analyst", "id": "51661809026", "is_verified": false, "profile_pic_url": "https://scontent-atl3-2.cdninstagram.com/v/t51.2885-19/284007837_5070066053047326_6283083692098566083_n.jpg?stp=dst-jpg_s150x150&_nc_ht=scontent-atl3-2.cdninstagram.com&_nc_cat=106&_nc_ohc=KXI8oOdZRb4AX8w28nr&edm=APU89FABAAAA&ccb=7-5&oh=00_AT-4iYsawdTCHI5a2zD_PF9F-WCyKnTIPuvYwVAQo82l_w&oe=62E7609B&_nc_sid=86f79a", "username": "datajayintech" }, "x": 0.68611115, "y": 0.32222223 } }, "..." ] }, "accessibility_caption": "A screenshot of a tweet from @DataJayInTech, which says: \"A recruiter just called me and said The Google Data Analytics Certificate is a good look. 
This post is to encourage YOU to finish the course.\" The background of the image is red with white, yellow, and blue geometric shapes.", "edge_media_to_caption": { "edges": [ { "node": { "text": "Ring, ring — opportunity is calling📱\nStart your Google Career Certificate journey at the link in bio. #GrowWithGoogle" } }, "..." ] }, "shortcode": "CgcPcqtOTmN", "edge_media_to_comment": { "count": 139, "page_info": { "has_next_page": true, "end_cursor": "QVFCaU1FNGZiNktBOWFiTERJdU80dDVwMlNjTE5DWTkwZ0E5NENLU2xLZnFLemw3eTJtcU54ZkVVS2dzYTBKVEppeVpZbkd4dWhQdktubW1QVzJrZXNHbg==" }, "edges": [ { "node": { "id": "18209382946080093", "text": "@google your company is garbage for meddling with supposedly fair elections...you have been exposed", "created_at": 1658867672, "did_report_as_spam": false, "owner": { "id": "39246725285", "is_verified": false, "profile_pic_url": "https://scontent-atl3-2.cdninstagram.com/v/t51.2885-19/115823005_750712482350308_4191423925707982372_n.jpg?stp=dst-jpg_s150x150&_nc_ht=scontent-atl3-2.cdninstagram.com&_nc_cat=104&_nc_ohc=4iOCWDHJLFAAX-JFPh7&edm=APU89FABAAAA&ccb=7-5&oh=00_AT9sH7npBTmHN01BndUhYVreHOk63OqZ5ISJlzNou3QD8A&oe=62E87360&_nc_sid=86f79a", "username": "bud_mcgrowin" }, "viewer_has_liked": false } }, "..." 
] }, "edge_media_to_sponsor_user": { "edges": [] }, "comments_disabled": false, "taken_at_timestamp": 1658765028, "edge_media_preview_like": { "count": 9251, "edges": [] }, "gating_info": null, "fact_check_overall_rating": null, "fact_check_information": null, "media_preview": "ACoqbj8KkijDnBOfpU1tAkis8mcL2H0zU8EMEqh1Dc56H0/KublclpoejKoo3WtylMgQ4HeohW0LKJ+u7PueaX+z4v8Aa/OmoNJJ6kqtG3UxT0pta9xZRxxswzkDjJrIoatuawkpq6NXTvuN9f6VdDFeAMAdsf8A16oWDKFYMQMnuR6e9Xd8f94fmtax2OGqnzsk3n/I/wDsqN7f5H/2VR74/wC8PzWlEkY7g/iv+NVcys+wy5JML59P89zWDW3dSx+UwGMnjjH9KxKynud1BWi79wpQM+g+tJRUHQO2+4pCuO4pKKAFFHP+RSUUgP/Z", "owner": { "id": "1067259270", "username": "google" }, "location": null, "viewer_has_liked": false, "viewer_has_saved": false, "viewer_has_saved_to_collection": false, "viewer_in_photo_of_you": false, "viewer_can_reshare": true, "thumbnail_src": "https://scontent-atl3-2.cdninstagram.com/v/t51.2885-15/295343605_719605135806241_7849792612912420873_n.webp?stp=dst-jpg_e35_s640x640_sh0.08&_nc_ht=scontent-atl3-2.cdninstagram.com&_nc_cat=101&_nc_ohc=cbVYU-YGD04AX9-DGya&edm=APU89FABAAAA&ccb=7-5&oh=00_AT8aF_4X2Ix9neTg1obSzOBgZW83oMFSNb-i5uqZqRqLLg&oe=62E80627&_nc_sid=86f79a", "thumbnail_resources": [ { "src": "https://scontent-atl3-2.cdninstagram.com/v/t51.2885-15/295343605_719605135806241_7849792612912420873_n.webp?stp=dst-jpg_e35_s150x150&_nc_ht=scontent-atl3-2.cdninstagram.com&_nc_cat=101&_nc_ohc=cbVYU-YGD04AX9-DGya&edm=APU89FABAAAA&ccb=7-5&oh=00_AT9nmASHsbmNWUQnwOdkGE4PvE8b27MqK-gbj5z0YLu8qg&oe=62E80627&_nc_sid=86f79a", "config_width": 150, "config_height": 150 }, "..." ]},...]
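The cursor-based pagination used for user posts can also be tried offline. The sketch below is not Instagram-specific: paginate and FAKE_PAGES are hypothetical names, and the fake pages only imitate the edges and page_info layout of the real responses.

```python
from typing import Callable, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], dict],
             page_limit: Optional[int] = None) -> Iterator[dict]:
    """Yield nodes page by page, following end_cursor while has_next_page is true."""
    cursor = None
    page = 1
    while True:
        data = fetch_page(cursor)
        for edge in data["edges"]:
            yield edge["node"]
        info = data["page_info"]
        if not info["has_next_page"]:
            break
        if page_limit is not None and page >= page_limit:
            break
        cursor = info["end_cursor"]
        page += 1


# Fake three-page dataset standing in for the GraphQL responses
FAKE_PAGES = {
    None: {"edges": [{"node": {"id": 1}}, {"node": {"id": 2}}],
           "page_info": {"has_next_page": True, "end_cursor": "a"}},
    "a": {"edges": [{"node": {"id": 3}}],
          "page_info": {"has_next_page": True, "end_cursor": "b"}},
    "b": {"edges": [{"node": {"id": 4}}],
          "page_info": {"has_next_page": False, "end_cursor": None}},
}

all_ids = [node["id"] for node in paginate(lambda cursor: FAKE_PAGES[cursor])]
first_two_pages = [node["id"] for node in paginate(lambda cursor: FAKE_PAGES[cursor], page_limit=2)]
print(all_ids, first_two_pages)
```

Separating the page-fetching function from the loop makes the stopping conditions (no next page, or page limit reached) easy to test without sending any requests.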

Building a Profile - Hashtag Mentions

Now that we can scrape all user posts, we can do a common analytics exercise: scrape all posts and extract hashtag mentions.

For this, let's scrape all posts, extract mentioned hashtags from the post description and count everything up:

import re
from collections import Counter
from typing import Optional

import httpx


def scrape_hashtag_mentions(user_id, session: httpx.Client, page_limit: Optional[int] = None):
    """find all hashtags user mentioned in their posts"""
    hashtags = Counter()
    hashtag_pattern = re.compile(r"#(\w+)")
    for post in scrape_user_posts(user_id, session=session, page_limit=page_limit):
        caption_edges = post["edge_media_to_caption"]["edges"]
        if not caption_edges:
            continue  # post has no caption
        desc = caption_edges[0]["node"]["text"]
        for tag in hashtag_pattern.findall(desc):
            hashtags[tag] += 1
    return hashtags

Example usage and output:

import json

import httpx

if __name__ == "__main__":
    with httpx.Client(timeout=httpx.Timeout(20.0)) as session:
        # if we only know the username but not user id we can scrape
        # the user profile to find the id:
        user_id = scrape_user("google", session)["id"]  # will result in: 1067259270
        # then we can scrape the hashtag profile
        hashtags = scrape_hashtag_mentions(user_id, session, page_limit=5)
        # order results and print them as JSON:
        print(json.dumps(dict(hashtags.most_common()), indent=2, ensure_ascii=False))
{ "MadeByGoogle": 10, "TeamPixel": 5, "GrowWithGoogle": 4, "Pixel7": 3, "LifeAtGoogle": 3, "SaferWithGoogle": 3, "Pixel6a": 3, "DoodleForGoogle": 2, "MySuperG": 2, "ShotOnPixel": 1, "DayInTheLife": 1, "DITL": 1, "GoogleAustin": 1, "Austin": 1, "NestWifi": 1, "NestDoorbell": 1, "GoogleATAPAmbientExperiments": 1, "GoogleATAPxKOCHE": 1, "SoliATAP": 1, "GooglePixelWatch": 1, "Chromecast": 1, "DooglersAroundTheWorld": 1, "GoogleSearch": 1, "GoogleSingapore": 1, "InternationalDogDay": 1, "Doogler": 1, "BlackBusinessMonth": 1, "PixelBuds": 1, "HowTo": 1, "Privacy": 1, "Settings": 1, "GoogleDoodle": 1, "NationalInternDay": 1, "GoogleInterns": 1, "Sushi": 1, "StopMotion": 1, "LetsInternetBetter": 1}

With this simple analytics script, we've collected profile hashtags that we can use to determine the interests of any public Instagram account.

With this last piece of code, we're able to find users through location or hashtag usage and scrape their profile data as well as all of their posts. To scale this scraper up, let's take a look at how to avoid being blocked with ScrapFly next.
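Before moving on, the hashtag counting step can be tried on its own without any requests. This is a minimal sketch using the same regular expression; the function name count_hashtags and the captions are made up for illustration.

```python
import re
from collections import Counter

HASHTAG_PATTERN = re.compile(r"#(\w+)")


def count_hashtags(captions):
    """Count hashtag occurrences across an iterable of caption strings."""
    counts = Counter()
    for caption in captions:
        counts.update(HASHTAG_PATTERN.findall(caption))
    return counts


# made-up captions for illustration
captions = [
    "Start your journey at the link in bio. #GrowWithGoogle",
    "New phone day #TeamPixel #MadeByGoogle",
    "Behind the scenes #MadeByGoogle",
    "",  # posts without a caption contribute nothing
]

counts = count_hashtags(captions)
print(counts.most_common())
```

Note that the match is case-sensitive, so #cats and #Cats would be counted separately; lowercasing each tag before counting is one way to merge them if that is preferred.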

Blocking / Login Requirement

Scraping Instagram seems easy, but unfortunately Instagram has started restricting access to its public data, often allowing only a few requests per hour and requiring a login for anything more.

To get around this, let's take advantage of the ScrapFly API, which can avoid all of these blocks for us! It offers several powerful features that'll help us get around Instagram's blocking:

  • Anti Scraping Protection Bypass
  • JavaScript Rendering
  • 190M Pool of Residential or Mobile Proxies

For this, we'll be using the scrapfly-sdk Python package and ScrapFly's anti scraping protection bypass feature. First, let's install scrapfly-sdk using pip:

$ pip install scrapfly-sdk

To take advantage of ScrapFly's API in our Instagram web scraper, all we need to do is replace httpx requests with scrapfly-sdk requests. Let's take a look at the full scraper code with ScrapFly integration.

Full Scraper Code

Here's the final Instagram scraper code we covered in this tutorial. Our scraper covers how to extract data from Instagram profiles and posts, as well as how to find Instagram users and posts:

Full Scraper Code with ScrapFly
import json
import re
from collections import Counter
from typing import Optional
from urllib.parse import quote

from scrapfly import ScrapeConfig, ScrapflyClient


def find_location_id(query: str, session: ScrapflyClient):
    """finds most likely location ID from given location name"""
    result = session.scrape(
        ScrapeConfig(
            # quote() URL-encodes spaces and commas in the query
            f"https://www.instagram.com/web/search/topsearch/?query={quote(query)}",
            asp=True,
            proxy_pool="public_residential_pool",
            country="US",
        )
    )
    data = json.loads(result.content)
    try:
        first_result = sorted(data["places"], key=lambda place: place["position"])[0]
        return first_result["place"]["location"]["pk"]
    except IndexError:
        print(f'no locations matching query "{query}" were found')
        return None


def scrape_users_by_location(location_id: str, session: ScrapflyClient, page_limit: Optional[int] = None):
    """yields usernames of recent posts tagged with the given location"""
    url = f"https://www.instagram.com/explore/locations/{location_id}/?__a=1"
    page = 1
    next_id = ""
    while True:
        resp = session.scrape(
            ScrapeConfig(url + (f"&max_id={next_id}" if next_id else ""), asp=True)
        ).upstream_result_into_response()
        data = resp.json()["native_location_data"]
        print(f"scraped location {location_id} page {page}")
        for section in data["recent"]["sections"]:
            for media in section["layout_content"]["medias"]:
                yield media["media"]["user"]["username"]
        next_id = data["recent"]["next_max_id"]
        if not next_id:
            print(f"no more results after page {page}")
            break
        # stop once the requested number of pages has been scraped
        if page_limit and page >= page_limit:
            print(f"reached page limit {page}")
            break
        page += 1


def scrape_user(username: str, session: ScrapflyClient):
    """scrape user's data"""
    result = session.scrape(
        ScrapeConfig(
            url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
            headers={"x-ig-app-id": "936619743392459"},
            asp=True,
        )
    )
    data = json.loads(result.content)
    return data["data"]["user"]


def scrape_user_posts(user_id: str, session: ScrapflyClient, page_size=12, page_limit: Optional[int] = None):
    """scrape user's post data"""
    base_url = "https://www.instagram.com/graphql/query/?query_hash=e769aa130647d2354c40ea6a439bfc08&variables="
    variables = {
        "id": user_id,
        "first": page_size,
        "after": None,
    }
    page = 1
    while True:
        result = session.scrape(ScrapeConfig(base_url + quote(json.dumps(variables)), asp=True))
        posts = json.loads(result.content)["data"]["user"]["edge_owner_to_timeline_media"]
        for post in posts["edges"]:
            yield post["node"]
        page_info = posts["page_info"]
        if not page_info["has_next_page"]:
            break
        variables["after"] = page_info["end_cursor"]
        page += 1
        # page_limit=None means no limit
        if page_limit and page > page_limit:
            break


def scrape_hashtag_mentions(user_id, session: ScrapflyClient, page_limit: Optional[int] = None):
    """find all hashtags user mentioned in their posts"""
    hashtags = Counter()
    hashtag_pattern = re.compile(r"#(\w+)")
    for post in scrape_user_posts(user_id, session=session, page_limit=page_limit):
        caption_edges = post["edge_media_to_caption"]["edges"]
        if not caption_edges:
            continue  # post has no caption
        desc = caption_edges[0]["node"]["text"]
        for tag in hashtag_pattern.findall(desc):
            hashtags[tag] += 1
    return hashtags


def scrape_hashtag(hashtag: str, session: ScrapflyClient, page_size=12, page_limit: Optional[int] = None):
    """scrape posts tagged with the given hashtag"""
    base_url = "https://www.instagram.com/graphql/query/?query_hash=174a5243287c5f3a7de741089750ab3b&variables="
    variables = {
        "tag_name": hashtag,
        "first": page_size,
        "after": None,
    }
    page = 1
    while True:
        result = session.scrape(ScrapeConfig(base_url + quote(json.dumps(variables)), asp=True))
        posts = json.loads(result.content)["data"]["hashtag"]["edge_hashtag_to_media"]
        for post in posts["edges"]:
            yield post["node"]
        page_info = posts["page_info"]
        if not page_info["has_next_page"]:
            break
        variables["after"] = page_info["end_cursor"]
        page += 1
        # page_limit=None means no limit
        if page_limit and page > page_limit:
            break


if __name__ == "__main__":
    with ScrapflyClient(key="YOUR_SCRAPFLY_KEY", max_concurrency=2) as session:
        result_location = find_location_id("London, United Kingdom", session)
        result_location_users = list(scrape_users_by_location(result_location, session, page_limit=3))
        result_hashtag_users = list(scrape_hashtag("webscraping", session, page_limit=3))
        result_user = scrape_user("google", session)
        result_user_posts = list(scrape_user_posts(result_user["id"], session, page_limit=3))
        print("done")

In the example above we're using ScrapFly's Anti Bot Protection Bypass feature to get around Instagram's login requirement. To enable this all we had to do is replace a few lines of code and every Instagram page could be accessed without logging in!

FAQ

To wrap this guide up let's take a look at some frequently asked questions about web scraping instagram.com:

Is web scraping instagram.com legal?

Yes. Instagram's data is publicly available so scraping instagram.com at slow, respectful rates would fall under the ethical scraping definition. However, when working with personal data we need to be aware of local copyright and user data laws like GDPR in the EU. For more see our Is Web Scraping Legal? article.

How to get Instagram user ID from username?

To get the private user ID from the public username we can scrape the user profile using our scrape_user function; the private ID is located in the id field:

with httpx.Client(timeout=httpx.Timeout(20.0)) as session:
    user_id = scrape_user("google", session)["id"]
    print(user_id)

How to get Instagram username from user ID?

To get the public username from Instagram's private user ID we can take advantage of public iPhone API https://i.instagram.com/api/v1/users/<USER_ID>/info/:

import httpx

iphone_api = "https://i.instagram.com/api/v1/users/{}/info/"
iphone_user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Mobile/14G60 Instagram 12.0.0.16.90 (iPhone9,4; iOS 10_3_3; en_US; en-US; scale=2.61; gamut=wide; 1080x1920"
resp = httpx.get(iphone_api.format("1067259270"), headers={"User-Agent": iphone_user_agent})
print(resp.json()["user"]["username"])

Magic parameter __a=1 is no longer working?

Instagram has been rolling out changes and slowly retiring this feature. However, in this article we've covered two alternatives to ?__a=1: the /v1/ API endpoints and the GraphQL endpoints, which perform even better!

Summary

In this Instagram scraping tutorial, we've taken a look at how to find Instagram posts and users using hashtag or location lookup, and how to scrape users' profile and post data. For this, we used multiple public API and GraphQL endpoints that return even more data than we can see on the page itself!

Finally, to start scaling the scraper we took a look at how to scrape Instagram without login by taking advantage of ScrapFly's smart scraper blocking bypass systems. For more on ScrapFly see our documentation and try it out for free!

FAQs

Is it possible to scrape Instagram? ›

Scraping publicly available data is legal, but you need to be careful not to extract content that is protected by copyright or contains personal information. So, after scraping Instagram, double-check your output for data that would go against GDPR, CCPA, or could be considered intellectual property.

How do you scrape on Instagram without being banned? ›

10 Tips For Web Scraping Without Getting Blocked/Blacklisted
  1. IP Rotation. ...
  2. Set a Real User Agent. ...
  3. Set Other Request Headers. ...
  4. Set Random Intervals In Between Your Requests. ...
  5. Set a Referrer. ...
  6. Use a Headless Browser. ...
  7. Avoid Honeypot Traps. ...
  8. Detect Website Changes.
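The tip about random intervals between requests can be sketched as a small generator wrapper. This is only an illustration under stated assumptions: the name throttled, the delay bounds and the usernames are made up, and a recording function stands in for time.sleep so the example runs instantly.

```python
import random
import time
from typing import Callable, Iterable, Iterator


def throttled(items: Iterable, min_delay: float = 1.0, max_delay: float = 3.0,
              sleep: Callable[[float], None] = time.sleep) -> Iterator:
    """Yield items, pausing a random interval before every item except the first."""
    for index, item in enumerate(items):
        if index:
            sleep(random.uniform(min_delay, max_delay))
        yield item


# record the pauses instead of actually sleeping
waits = []
usernames = ["username1", "username2", "username3"]
for name in throttled(usernames, 0.5, 1.5, sleep=waits.append):
    print("would request profile for", name)
print(len(waits), "pauses")
```

In a real scraper the default sleep=time.sleep would be left in place, and the loop body would send the actual request.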

What is the best Instagram data extractor? ›

Our first top recommendation is Smartproxy. With their Web Scraping API, you can easily automate web scraping Instagram data, From profiles, usernames, posts, photos' URLs, or hashtags! This way, you can collect any data you want, including your target influencers.

How do I extract data from Instagram? ›

You'll need to download your data to review it.
  1. Tap your profile picture in the bottom right to go to your profile.
  2. Tap at the top.
  3. Tap Privacy and security.
  4. Scroll down to Data download and tap Request download.
  5. Enter the email address where you'd like to receive a link to your data.

How to fetch Instagram data using Python? ›

To set up the Instagram data fetching tool, you need to import the Instaloader Python library and create an instance of the Instaloader class. After that, you need to provide the Instagram handle of the profile from which you want to extract the data.

How do you extract text from Instagram? ›

Android Device
  1. Once you have it on your phone, go to your Instagram.
  2. Scroll through your feed until you locate the post. Take a screenshot.
  3. Click the Google Lens to launch the app and select the image you've just captured.
  4. Select the text you want to copy and tap Copy text.
Aug 8, 2022

Can web scraping be detected? ›

Web pages detect web crawlers and web scraping tools by checking their IP addresses, user agents, browser parameters, and general behavior. If the website finds it suspicious, you receive CAPTCHAs and then eventually your requests get blocked since your crawler is detected.

How do you not get blocked while scraping? ›

5 ways of web scraping without getting blocked

proxies. Switch user agents. Solving captcha services or feature. Slow down the scrape.

Does Instagram ban NSFW? ›

Post photos and videos that are appropriate for a diverse audience. We know that there are times when people might want to share nude images that are artistic or creative in nature, but for a variety of reasons, we don't allow nudity on Instagram.

Is there a free Instagram analyzer? ›

If you have a business account on Instagram, you automatically have access to their free analytics tool, Instagram Insights. The Insights tool can show you when your audience is on Instagram, which of your posts are most popular, and your account's impressions and reach.

How can I beat Instagram algorithm? ›

6 Ways to "Beat" the Instagram Algorithm in 2023
  1. Consistently Share Instagram Reels.
  2. Encourage Interactions with Instagram Stories Stickers.
  3. Drive Conversations with Engaging Captions.
  4. Add Hashtags and Keywords to Your Posts.
  5. Cross-promote Your Instagram Content.
  6. Use Instagram Analytics to See What's Working.
Dec 22, 2022

How do you scrape Instagram for free? ›

How to scrape Instagram posts
  1. Create a free Apify account.
  2. Open Instagram Post Scraper.
  3. Add one or more Instagram usernames to scrape.
  4. Click "Save & Start" and wait for the datasets to be extracted.
  5. Download your data in JSON, XML, CSV, Excel, or HTML.

How to scrape Instagram using Selenium? ›

Web Scraping Instagram with Selenium
  1. Log in to our personal Instagram account.
  2. Handle the pop-up messages by clicking on “not now”
  3. Search for a keyword “#cat”
  4. Scroll down and select all the above thumbnails.
  5. Create a new directory on your computer.
  6. Save all the images inside the new directory.
Nov 15, 2020

Does Instagram have an API? ›

The Instagram Graph API allows you to connect your app to Instagram's features and functionalities. Instagram Businesses and Creators can use this API to fully manage their presence on Instagram, including finding mentions, getting basic data on other Businesses, and finding hashtagged photos.

What is the usage of Instaloader? ›

The instaloader module can be used to download everything of profile/Instagram user, you need to interrupt by CONTROL+C to kill the process. However, for downloading files of a private account, you must log in, there is no compulsion for a public account. Also, comments are in json file which is zipped as a folder.

What is Instapy? ›

Tooling that automates your social media interactions to “farm” Likes, Comments, and Followers on Instagram Implemented in Python using the Selenium module.

How do I get fetch API for Instagram? ›

How to Connect to the Instagram API
  1. Go to Instagram developer page.
  2. Click on Register Your Application.
  3. Click on Register a New Client.
  4. Fill the form and click on Register.
  5. Go to Clients manager.
  6. Click on Manage in your application block.
  7. Copy and save your credentials: Client ID and Client Secret.
Apr 16, 2021

How to use Instagramy Python? ›

Sample Usage
  1. Log in to Instagram in your default web browser.
  2. Open the browser's developer tools.
  3. Copy the sessionid cookie. In Firefox, go to Storage, then Cookies. In Chrome, go to Application, then Storage, then Cookies.
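
To show what the copied value is used for, here is a minimal sketch that attaches it to a request as a cookie. It uses only the standard library and does not reproduce Instagramy's own API. The function name `build_authenticated_request` and the placeholder value are assumptions for illustration, and no request is actually sent:

```python
import urllib.request


def build_authenticated_request(url: str, sessionid: str) -> urllib.request.Request:
    """Return a request carrying the copied sessionid cookie (nothing is sent)."""
    # Reject values that would break the Cookie header, e.g. a bad copy-paste.
    if not sessionid or any(ch in sessionid for ch in "; \r\n"):
        raise ValueError("sessionid looks malformed")
    return urllib.request.Request(url, headers={"Cookie": f"sessionid={sessionid}"})


# Placeholder value: replace it with the sessionid copied from the browser.
request = build_authenticated_request("https://www.instagram.com/", "PASTE_SESSIONID_HERE")
print(request.get_header("Cookie"))
```

The sessionid grants access to the logged-in account, so it is best treated like a password and kept out of source control.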

Can you download Instagram texts? ›

You can download your data on Instagram by accessing the app's security settings, and requesting the data from Instagram. Downloading your Instagram data allows you to export photos, videos, archived stories, comments, messages and more. You can download your data from either the Instagram app or the website.

How do you get Instagram chat transcripts? ›

How to retrieve and download your Instagram conversation history?
  1. Step 1 - Going to the download page of your Instagram data. Log in to your Instagram account. ...
  2. Step 2 - Choosing Instagram download options. ...
  3. Step 3 - Instagram data processing. ...
  4. Step 4 (optional) - Keeping a paper backup of your Instagram data.

Is web scraping a crime? ›

Web scraping can be legal, but companies generally do not want to be scraped. If a platform can show that being scraped by a bot damages its infrastructure or operations, a court may find that activity illegal.

Should I use a VPN when web scraping? ›

Where proxies provide a layer of protection by masking the IP address of your web scraper, a VPN also masks the data that flows between your scraper and the target site through an encrypted tunnel. This will make the content that you are scraping invisible to ISPs and anyone else with access to your network.

Is web scraping frowned upon? ›

The practice of web scraping is often frowned upon because it is sometimes misused.

How do you detect scraping? ›

Using fingerprinting to detect web scraping

Application Security Manager (ASM) can identify web scraping attacks on web sites that ASM protects by using information gathered about clients through fingerprinting or persistent identification.

Can you be banned from scraping? ›

An Introduction to Scraping Bans

Yes, websites can block or ban scrapers. Common mechanisms behind scraping blocks include CAPTCHAs and other 'humanity' tests, WebRTC and canvas fingerprinting, TCP/IP fingerprinting, geofencing, and IP blocking.

Is scraping forbidden? ›

Web scraping is not an illegal activity, but that doesn't mean you can scrape any site you want. Some sites explicitly forbid automated data extraction, either in their robots.txt file or on their Terms of Service page.
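
A robots.txt check can be sketched with Python's built-in `urllib.robotparser`. The rules, the user agent name `my-scraper`, and the example.com URLs below are hypothetical. A real scraper would download the site's actual robots.txt first:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this example.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(EXAMPLE_ROBOTS_TXT, "my-scraper", "https://example.com/public/page"))
print(is_allowed(EXAMPLE_ROBOTS_TXT, "my-scraper", "https://example.com/private/data"))
```

This covers only robots.txt. The Terms of Service still have to be read by a person.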

How do I get 18 content on Instagram? ›

Tap the Settings menu in the upper right corner. Tap Account. Tap Sensitive Content Control. Here you can decide whether to keep the setting at its default state (“Standard”) or to see more (“More”) or less of some types of sensitive content (“Less”).

Is it legal to copy Instagram posts? ›

Specifically, Instagram says: Your content, assuming a basic level of creativity in its making, is yours and protected under copyright law, even if it only exists on the Instagram platform. The same goes for anyone else's content.

Is it illegal to copy Instagram photos? ›

This will likely be a copyright infringement, unless you have the owner's permission. The question, therefore, is whether or not the owner of the image posted on Instagram has given permission for the work to be reproduced.

Can you scrape Instagram followers? ›

Yes. Tools such as the Instagram Followers Count Scraper collect the number of followers and followings from any Instagram profile. Just add one or more Instagram usernames to get the data.

Is it legal to take screenshot on Instagram? ›

Yes. There are no restrictions on the Instagram app for taking screenshots of anything you can see in your feed.

Can someone sue you for an Instagram post? ›

The simple answer is yes, you can potentially sue someone for libel on Instagram. But lawsuits are frequently expensive and rarely simple, so it is best to consider alternatives to litigation before diving into a lawsuit.

Can you get sued for posting on Instagram? ›

An Instagram comment that attacks or defames the reputation or character of another person or business may be grounds for a defamation of character lawsuit.

How can I see stalkers on Instagram for free? ›

Methods to discover Instagram stalkers
  1. Check your Instagram profile interactions. The first and easiest thing to do is simply to check your profile interactions. ...
  2. Check who views your Instagram stories. ...
  3. Check your Instagram followers. ...
  4. Use Instagram Insights.

Is getting free followers on Instagram illegal? ›

It is not a crime, but it is against Instagram's rules. Instagram's Community Guidelines ask users not to artificially collect likes, followers, or shares, and Instagram may remove inauthentic followers or act against accounts that use such services.

Is there anything illegal on Instagram? ›

Instagram is not a place to support or praise terrorism, organized crime, or hate groups. Offering sexual services, buying or selling firearms, alcohol, and tobacco products between private individuals, and buying or selling non-medical or pharmaceutical drugs are also not allowed.

Who is Instagram owned by? ›

On April 9, 2012, Facebook, Inc. (now Meta Platforms) bought Instagram for $1 billion in cash and stock, with a plan to keep the company independently managed.

Is a photo still copyright if you edit it? ›

If you edit an image that you didn't create, copyright law still applies. The only way to avoid copyright infringement with images is to create unique works, purchase a license to use an image or find a free-to-use image.

Does removing fake followers help? ›

Yes. Removing followers who are fake or inactive can increase the engagement and visibility of your profile across the platform, and there is no downside to working through your follower list manually. Your content will perform better, you'll receive more authentic engagement, and you'll have the algorithm on your side.

Is buying followers on Instagram Bannable? ›

It can be. Buying followers goes against Instagram's Community Guidelines, which ask users not to artificially collect likes, followers, or shares. Many users still purchase followers, but Instagram may remove the purchased followers or act against accounts that break its rules.

Article information

Author: Van Hayes
