Scrape website sitemaps and test all the internal links inside scraped pages for status 200 or status 302 response
$10-30 USD
Closed
Posted almost 9 years ago
$10-30 USD
Paid on delivery
Hi,
We are developing a deployment strategy, and part of it involves testing all the links inside our pre-production website for 404 errors and other failure responses.
We would like to run a scraper which would help us test the following sequence:
1. Given a website, e.g. test.mywebsite.com:
- Does it have a sitemap?
- Does it include multiple sitemaps?
- Open each sitemap file and scrape every link
2. For all the links and images from the sitemaps:
- Run an HTTP test on every link inside each page to check its return status.
E.g.
$ python [login to view URL] [login to view URL]
Starting run for: [login to view URL]
Sitemap: [login to view URL]
3 Sitemaps Found:
- [login to view URL]
- [login to view URL]
- [login to view URL]
Testing:
/ -> OK
/contact-us -> OK
/our-team -> OK
/logon -> OK
/newpage -> ERRORS 404
STATUS 404: IMG : [login to view URL]
STATUS 404: LINK: [login to view URL]
/otherpage -> OK
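The sequence above could be sketched roughly as follows in Python. This is a minimal sketch using only the standard library (the BeautifulSoup/Requests or lxml stacks mentioned in the bids would work equally well); the helper names (`sitemap_urls`, `LinkCollector`, `check`) are assumptions for illustration, not part of the brief:

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Sitemap files use this XML namespace for <urlset>/<sitemapindex> and <loc>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Return all <loc> URLs from a sitemap or sitemap index document.

    Works for both cases: a sitemap index yields the child sitemap URLs
    (step 1), a regular sitemap yields the page URLs (step 2)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

class LinkCollector(HTMLParser):
    """Collect ("LINK", href) and ("IMG", src) pairs from a page's HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(("LINK", attrs["href"]))
        elif tag == "img" and "src" in attrs:
            self.links.append(("IMG", attrs["src"]))

def check(url):
    """Return the HTTP status code for url.

    Note: urlopen follows redirects, so a 302 that ends in a 200 reads
    as 200 here, which matches the brief's idea of 302 being OK."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code
```

A driver would then fetch each page listed by `sitemap_urls`, feed its HTML to `LinkCollector`, call `check` on every collected target, and print `OK` for 200/302 and `STATUS <code>` otherwise, as in the sample run.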
Can you please provide estimates?
Hi sir,
I am a scraping expert and have done many similar projects; please check my feedback and you will see.
Can you tell me more details? Then I will provide demo data for you.
Thanks,
Kimi
I have a bachelor's degree in Computer Science from the American University in Cairo and a minor in Mathematics, with 10+ years of hands-on programming experience. I have worked for the past year in Microsoft's Advanced Technology Lab in Cairo (ATLC). I have 2+ years of experience in web scraping with Python using BeautifulSoup, Requests and Selenium Webdriver. Check my previous projects for past feedback.
Dear Sir/ Madam,
Kindly check my bid and project completion ratio before awarding.
I'm really interested in working on this project. I can start the work now and can provide the best service from my end.
Please come to chat to discuss the project further.
Thanks & Regards
Prog2U
OK. I can implement this using the lxml + requests libraries. It would also be more optimal to read multiple sitemaps from a file and log the results to a file. Contact me to start work.
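The file-driven variant this bid suggests (sitemap URLs read from a file, results appended to a log) could be sketched as below. The file names and the injected `fetch`/`check` callables are hypothetical; they stand in for the sitemap-parsing and HTTP-checking pieces, whatever library (lxml + requests or otherwise) implements them:

```python
def run_from_file(list_path, log_path, fetch, check):
    """Read one sitemap URL per line from list_path, test every page URL
    each sitemap lists, and append "status url" lines to log_path.

    fetch: maps a sitemap URL to the page URLs it contains.
    check: maps a page URL to its HTTP status code.
    """
    with open(list_path) as f:
        sitemaps = [line.strip() for line in f if line.strip()]
    with open(log_path, "a") as log:
        for sitemap in sitemaps:
            for url in fetch(sitemap):
                log.write(f"{check(url)} {url}\n")
```

Keeping the fetching and checking behind plain callables makes the runner trivial to test with stubs, independent of the scraping library chosen.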
Thu, 09 Jul 2015 16:38:10 +0000
Hello,
I can do a quick Perl script. It may need a few modules installed, though.
It will follow all links starting from the main page, detect duplicate links, and avoid third-party links.
I can adapt it further if a sample is available for testing.