Python script for crawling API stops for some reason - make suggestions for improvement

  • Status: Closed
  • Prize: $20
  • Entries Received: 3
  • Winner: RedLayers

Contest Brief

Dear all,

We're using the script below to make requests through the crawling provider proxycrawl.com (the documentation can be accessed here after creating a free account: https://proxycrawl.com/dashboard/docs).

The script works well in general, with one remaining problem: it simply stops working from time to time, sometimes after successfully crawling a couple of hundred URLs, sometimes only after a couple of thousand. We can't get it to run stably across a few tens of thousands of URLs.

Please make suggestions right in the code - including a comment that describes why you made the change. We'll then test it and award the amount if the change brings the desired result.

Looking forward to your contributions!
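
The script itself is not reproduced on this page, so the cause of the stalls is unknown. A common reason for a crawler to "simply stop" is a request without a timeout that hangs on a dead connection, or an unhandled network exception that ends the loop. The following is only a minimal sketch of that kind of fix, not the contest's winning entry: every request gets an explicit timeout, failures are retried a bounded number of times with exponential backoff, and URLs that still fail are collected instead of aborting the run. The endpoint, the `token` and `url` query parameters, and all function names here are assumptions for illustration; check them against the provider's documentation.

```python
import http.client
import logging
import time
import urllib.parse
import urllib.request

# Assumptions for illustration only: verify the endpoint and parameter
# names against the provider's documentation before use.
API_ENDPOINT = "https://api.proxycrawl.com/"
TOKEN = "YOUR_TOKEN"  # placeholder, not a real token


def fetch_once(url, timeout=30):
    """Fetch one URL through the API.

    The explicit timeout keeps a hung socket from blocking the run forever.
    """
    query = urllib.parse.urlencode({"token": TOKEN, "url": url})
    with urllib.request.urlopen(f"{API_ENDPOINT}?{query}", timeout=timeout) as resp:
        return resp.read()


def fetch_with_retries(url, fetch=fetch_once, retries=3, backoff=1.0, sleep=time.sleep):
    """Call fetch(url) up to `retries` times with exponential backoff.

    Returns the response body, or None if every attempt failed.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except (OSError, http.client.HTTPException) as exc:
            # OSError covers URLError, HTTPError, timeouts and connection resets.
            logging.warning("attempt %d/%d failed for %s: %r",
                            attempt + 1, retries, url, exc)
            if attempt < retries - 1:
                sleep(backoff * 2 ** attempt)
    return None


def crawl(urls, fetch=fetch_once, **retry_options):
    """Crawl all URLs; one bad URL is recorded in `failed`, never fatal."""
    results, failed = {}, []
    for url in urls:
        body = fetch_with_retries(url, fetch=fetch, **retry_options)
        if body is None:
            failed.append(url)
        else:
            results[url] = body
    return results, failed
```

With this shape, `crawl(list_of_urls)` returns both the successful bodies and the list of URLs that failed after all retries, so a long run can be resumed from the failures rather than restarted from the beginning.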

Employer Feedback

“Mario is a great guy and a pleasure to work with!”

thomasjohn6, Germany.

Comment Board

  • imo581
    • 4 years ago

    I tried your scripts with some links. The API responds with status code 403 Forbidden. I tried to use the API using a browser and it gives me this message "Token is invalid or account is temporarily blocked! please login to your dashboard for more details". Is something wrong with your subscription?

    1. thomasjohn6
      Contest Holder
      • 4 years ago

      Hello Islam, Thanks for your interest in the contest! I guess for somewhat obvious reasons, before posting the script in public, I removed the real token from the script :-)

  • busygayan
    • 4 years ago

    Literally makes no sense for you to pay a third party service which costs you money, and their prices are pretty expensive.

    Why don't you create your own tiny system which can get this done?
    It's nothing complicated.

    1. busygayan
      • 4 years ago

      So 40 bucks, plus you need a server which can handle 50K plain requests per hour? So to answer the question:

      ProxyCrawl cost - 2,500 USD (basic, not JavaScript)
      Custom approach cost - less than 400 USD (with a 64 GB / 16 vCPUs server)

      JavaScript-based crawl on ProxyCrawl - $5,054.90
      Custom approach cost - less than 1,000 USD (192 GB of RAM, 32 vCPUs server)

      Besides all that, the code is custom, it's transparent, and debugging is way easier.
      Your data is private.

    2. busygayan
      • 4 years ago

      I have a bot which crawls Facebook daily with over 1,000 concurrent accounts. It is custom coded using Selenium with Python, and I make over 100 requests each second (each request has its own unique IP / proxy). Still, I spend only around 2,000 on a monthly basis.

      This makes no sense and the customer is technically being ripped off, paying almost 5x the amount. The customer is still stuck having to debug his own code; I'm not even going to go into why the code fails. You could pay a couple of engineers a salary and have your own servers maintained with zero issues for the amount that you spend on this company. Even if you're doing this on a small scale, it makes no sense.

      High cohesion is not bad at all; that's basically my point.
      Good luck

  • thomasjohn6
    Contest Holder
    • 4 years ago

    Thanks for your comment! However, for now we would like to use the convenience of such a provider. Maybe later we'll do it on our own. So do you have any idea what the problem in the script could be? Thanks in advance!
