Saturday, 28 September 2013

Visual Web Ripper: Using External Input Data Sources

Sometimes it is necessary to use external data sources to provide parameters for the scraping process. For example, you have a database with a bunch of ASINs and you need to scrape all product information for each one of them. As far as Visual Web Ripper is concerned, an input data source can be used to provide a list of input values to a data extraction project. A data extraction project will be run once for each row of input values.

An input data source is normally used in one of these scenarios:

    To provide a list of input values for a web form
    To provide a list of start URLs
    To provide input values for Fixed Value elements
    To provide input values for scripts

Visual Web Ripper supports the following input data sources:

    SQL Server Database
    MySQL Database
    OleDB Database
    CSV File
    Script (A script can be used to provide data from almost any data source)

To see it in action you can download a sample project that uses an input CSV file with Amazon ASIN codes to generate Amazon start URLs and extract some product data. Place both the project file and the input CSV file in the default Visual Web Ripper project folder (My Documents\Visual Web Ripper\Projects).

For further information please look at the manual topic, explaining how to use an input data source to generate start URLs.


Source: http://extract-web-data.com/visual-web-ripper-using-external-input-data-sources/

Wednesday, 25 September 2013

A simple way to turn a website into JSON

Recently, while surfing the web I stumbled upon an simple web scraping service named Web Scrape Master. It is a kind of RESTful web service that extracts data from a specified web site and returns it to you in JSON format.
How it works

Though I don’t know what this service may be useful for, I still like its simplicity: all you need to do is to make an HTTP GET request, passing all necessary parameters in the query string:
http://webscrapemaster.com/api/?url={url}&xpath={xpath}&attr={attr}&callback={callback}

    url  - the URL of the website you want to scrape
    xpath – xpath determining the data you need to extract
    attr - attribute the name you need to get the value of (optional)
    callback - JSON callback function (optional)

For example, for the following request to our testing ground:

http://webscrapemaster.com/api/?url=http://testing-ground.extract-web-data.com/blocks&xpath=//div[@id=case1]/div[1]/span[1]/div

You will get the following response:

[{"text":"<div class='name'>Dell Latitude D610-1.73 Laptop Wireless Computer</div>","attrs":{"class":"name"}}]
Visual Web Scraper

Also, this service offers a special visual tool for building such requests. All you need to do is to enter the URL of the website and click to the element you need to scrape:
Visual Web Scraper
Conclusion

Though I understand that the developer of this service is attempting to create a simple web scraping service, it is still hard to imagine where it can be useful. The task that the service does can be easily accomplished by means of any language.

Probably if you already have software receiving JSON from the web, and you want to feed it with data from some website, then you may find this service useful. The other possible application is to hide your IP when you do web scraping. If you have other ideas, it would be great if you shared them with us.



Source: http://extract-web-data.com/a-simple-way-to-turn-a-website-into-json/

Tuesday, 24 September 2013

Selenium IDE and Web Scraping

Selenium is a browser automation framework that includes IDE, Remote Control server and bindings of various flavors including Java, .Net, Ruby, Python and other. In this post we touch on the basic structure of the framework and its application to  Web Scraping.
What is Selenium IDE


Selenium IDE is an integrated development environment for Selenium scripts. It is implemented as a Firefox plugin, and it allows recording browsers’ interactions in order to edit them. This works well for software tests, composing and debugging. The Selenium Remote Control is a server specific for a particular environment; it causes custom scripts to be implemented for controlled browsers. Selenium deploys on Windows, Linux, and iOS. How various Selenium components are supported with major browsers read here.
What does Selenium do and Web Scraping

Basically Selenium automates browsers. This ability is no doubt to be applied to web scraping. Since browsers (and Selenium) support JavaScript, jQuery and other methods working with dynamic content why not use this mix for benefit in web scraping, rather than to try to catch Ajax events with plain code? The second reason for this kind of scrape automation is browser-fasion data access (though today this is emulated with most libraries).

Yes, Selenium works to automate browsers, but how to control Selenium from a custom script to automate a browser for web scraping? There are Selenium PHP and other language libraries (bindings) providing for scripts to call and use Selenium. It is possible to write Selenium clients (using the libraries) in almost any language we prefer, for example Perl, Python, Java, PHP etc. Those libraries (API), along with a server, the Java written server that invokes browsers for actions, constitute the Selenum RC (Remote Control). Remote Control automatically loads the Selenium Core into the browser to control it. For more details in Selenium components refer to here.



A tough scrape task for programmer

“…cURL is good, but it is very basic.  I need to handle everything manually; I am creating HTTP requests by hand.
This gets difficult – I need to do a lot of work to make sure that the requests that I send are exactly the same as the requests that a browser would
send, both for my sake and for the website’s sake. (For my sake
because I want to get the right data, and for the website’s sake
because I don’t want to cause error messages or other problems on their site because I sent a bad request that messed with their web application).  And if there is any important javascript, I need to imitate it with PHP.
It would be a great benefit to me to be able to control a browser like Firefox with my code. It would solve all my problems regarding the emulation of a real browser…
it seems that Selenium will allow me to do this…” -Ryan S

Yes, that’s what we will consider below.
Scrape with Selenium

In order to create scripts that interact with the Selenium Server (Selenium RC, Selenium Remote Webdriver) or create local Selenium WebDriver script, there is the need to make use of language-specific client drivers (also called Formatters, they are included in the selenium-ide-1.10.0.xpi package). The Selenium servers, drivers and bindings are available at Selenium download page.
The basic recipe for scrape with Selenium:

    Use Chrome or Firefox browsers
    Get Firebug or Chrome Dev Tools (Cntl+Shift+I) in action.
    Install requirements (Remote control or WebDriver, libraries and other)
    Selenium IDE : Record a ‘test’ run thru a site, adding some assertions.
    Export as a Python (other language) script.
    Edit it (loops, data extraction, db input/output)
    Run script for the Remote Control

The short intro Slides for the scraping of tough websites with Python & Selenium are here (as Google Docs slides) and here (Slide Share).
Selenium components for Firefox installation guide

For how to install the Selenium IDE to Firefox see  here starting at slide 21. The Selenium Core and Remote Control installation instructions are there too.
Extracting for dynamic content using jQuery/JavaScript with Selenium

One programmer is doing a similar thing …

1. launch a selenium RC (remote control) server
2. load a page
3. inject the jQuery script
4. select the interested contents using jQuery/JavaScript
5. send back to the PHP client using JSON.

He particularly finds it quite easy and convenient to use jQuery for
screen scraping, rather than using PHP/XPath.
Conclusion

The Selenium IDE is the popular tool for browser automation, mostly for its software testing application, yet also in that Web Scraping techniques for tough dynamic websites may be implemented with IDE along with the Selenium Remote Control server. These are the basic steps for it:

    Record the ‘test‘ browser behavior in IDE and export it as the custom programming language script
    Formatted language script runs on the Remote Control server that forces browser to send HTTP requests and then script catches the Ajax powered responses to extract content.

Selenium based Web Scraping is an easy task for small scale projects, but it consumes a lot of memory resources, since for each request it will launch a new browser instance.



Source: http://extract-web-data.com/selenium-ide-and-web-scraping/

Monday, 23 September 2013

The Increasing Significance of Data Entry Services

The instantaneous business environment has become extremely competitive in the new era of globalization. Huge business behemoths that had been benefited from monopolistic luxuries are now being challenged by newer participant in the marketplace, forcing recognized players to reorganize their plans and strategies. These are some of the major reasons that seemed to have forced businesses to opt for outsourcing services such as data entry services that allow them to focus on their core business processes. This in turn makes it simple for them to attain and maintain business competencies, a prerequisite for effectively overcoming the rising competitive challenges.

So, how exactly is data entry helping businesses in achieving their targeted goals and objectives? Well, to be able to know actually that, we will first have to delve deeper into the field of data entry and allied activities. To start with, it would be worth mentioning that every business, big and small, generates voluminous amounts of data and information that is important from a business point of view. This is exactly where the problems start to surface because accessing, analyzing and processing such voluminous amounts of data is too time consuming and obviously a task that can easily be classified as non-productive. And these are exactly the reasons for outsourcing such non-core work processes to third party outsourcing firms.

There is many data entry outsourcing firms and most of them are located in developing countries such as India. There are many reasons for such regional clustering, but the most prominent reason it seems is that India has a vast talent pool, comprising of educated, English-speaking professionals. The best part is that it is relatively less expensive to hire the services of these professionals. The same level of expertise will have been a lot more expensive to hire if it had been in a developed country. Subsequently, more and more businesses worldwide are outsourcing their non-core work processes.

As Globalization intensifies even more in the coming years, businesses will face even greater amounts of competitive pressures and it will just not be possible for them to even think about managing everything on their own, let alone actually going ahead and doing it. However, that should not be a problem, especially for businesses that opt for outsourcing services such as data entry and data conversion. By hiring such high-end and cost-effective services, these businesses will be able to realize the associated benefits that will come mostly as significant cost reductions, optimum accuracy, and increased efficiencies.

So for business executives that think outsourcing data entry related processes can help to achieve your targeted business goals and objectives, it's time you contacted an offshore outsourcing provider and request them precisely how they can ease your business. However just make sure that you opt for the most excellent available data entry services provider, perceptibly because it will be like sharing a part of your business.

Data Entry Services - PDF Conversion - Data Conversion. Data Entry Services for the most accurate handling and storage of critical data.




Source: http://ezinearticles.com/?The-Increasing-Significance-of-Data-Entry-Services&id=1125870

Friday, 20 September 2013

Data Entry Services Make Sense For Your Business Prospects

A business venture is the realization of an entrepreneur's dream and a lot of sincere effort and resources go into making it a success. Many different sections of an organization need to work in tandem to arrive at a single goal which will take the business forward. Besides marketing, human resource, finance, accounts and administration there is also the division handling data entry that contributes towards the profits which a business makes. Now, a business has multiple transactions every day and a record needs to be made of every transaction in an accurate manner in order to have a fair idea of the organization's day to day working and the future prospects. Data entry is an essential aspect of any and every business but it is also a time consuming task that requires the necessary expertise, skills and patience to maintain the records accurately. The data entry services provided by third party vendors are therefore, a highly beneficial service for the organizations across the globe.

The data entry services include the task of documentation, processing, conversion and filing of the regular data of any organization. A dedicated staff is assigned by the vendor providing data entry service to a client in order to maintain each and every data entry of the organization. The specific entries enable the professionals to maintain the financial status of a company through accurate records. The decision makers of the company can have easy and instant access to any such data as and when they require so that they can formulate the business plans and strategy after analyzing the current market position of their business.

Data entry services offered by professionals also come in handy when the company is ready to file its taxes or is facing a company audit. When the records are in appropriate order and easily accessible, it forms a favorable impression of the company in the minds of the auditors, creditors, buyers and the general public. A company that fairly declares its financial standing gains reliability and trust and maintaining data accurately is crucial to this exercise. Due to the multiple benefits of this kind of service of maintaining your company data by a third party, more and more vendors are entering this industry to provide such convenience to company's across the globe.

Any data of a company is crucial to its financial standing and hence is highly confidential. Any financial information, if leaked out to the competitors, could cause irrevocable damage to an organization. Hence, the security and confidentiality provided by the vendor offering data entry services is of prime importance to the organization. You must; therefore, be careful in your selection of the vendor providing such services to your company. The yellow pages or the internet are a good source to locate a reliable and reputable vendor and you could also go by the reference of other companies who have opted for the service of a particular vendor. Once you have such a vendor providing easy, accurate, confidential and economic services for data entry, you can indeed make a positive difference to your organization as a whole.




Source: http://ezinearticles.com/?Data-Entry-Services-Make-Sense-For-Your-Business-Prospects&id=1116082

Thursday, 19 September 2013

Outsource Data Mining Services to Offshore Data Entry Company

Companies in India offer complete solution services for all type of data mining services.

Data Mining Services and Web research services offered, help businesses get critical information for their analysis and marketing campaigns. As this process requires professionals with good knowledge in internet research or online research, customers can take advantage of outsourcing their Data Mining, Data extraction and Data Collection services to utilize resources at a very competitive price.

In the time of recession every company is very careful about cost. So companies are now trying to find ways to cut down cost and outsourcing is good option for reducing cost. It is essential for each size of business from small size to large size organization. Data entry is most famous work among all outsourcing work. To meet high quality and precise data entry demands most corporate firms prefer to outsource data entry services to offshore countries like India.

In India there are number of companies which offer high quality data entry work at cheapest rate. Outsourcing data mining work is the crucial requirement of all rapidly growing Companies who want to focus on their core areas and want to control their cost.

Why outsource your data entry requirements?

Easy and fast communication: Flexibility in communication method is provided where they will be ready to talk with you at your convenient time, as per demand of work dedicated resource or whole team will be assigned to drive the project.

Quality with high level of Accuracy: Experienced companies handling a variety of data-entry projects develop whole new type of quality process for maintaining best quality at work.

Turn Around Time: Capability to deliver fast turnaround time as per project requirements to meet up your project deadline, dedicated staff(s) can work 24/7 with high level of accuracy.

Affordable Rate: Services provided at affordable rates in the industry. For minimizing cost, customization of each and every aspect of the system is undertaken for efficiently handling work.

Outsourcing Service Providers are outsourcing companies providing business process outsourcing services specializing in data mining services and data entry services. Team of highly skilled and efficient people, with a singular focus on data processing, data mining and data entry outsourcing services catering to data entry projects of a varied nature and type.

Why outsource data mining services?

360 degree Data Processing Operations
Free Pilots Before You Hire
Years of Data Entry and Processing Experience
Domain Expertise in Multiple Industries
Best Outsourcing Prices in Industry
Highly Scalable Business Infrastructure
24X7 Round The Clock Services

The expertise management and teams have delivered millions of processed data and records to customers from USA, Canada, UK and other European Countries and Australia.

Outsourcing companies specialize in data entry operations and guarantee highest quality & on time delivery at the least expensive prices.



Source: http://ezinearticles.com/?Outsource-Data-Mining-Services-to-Offshore-Data-Entry-Company&id=4027029

Monday, 16 September 2013

Why Web Scraping Software Won't Help

How to get continuous stream of data from these websites without getting stopped? Scraping logic depends upon the HTML sent out by the web server on page requests, if anything changes in the output, its most likely going to break your scraper setup.

If you are running a website which depends upon getting continuous updated data from some websites, it can be dangerous to reply on just a software.

Some of the challenges you should think:

1. Web masters keep changing their websites to be more user friendly and look better, in turn it breaks the delicate scraper data extraction logic.

2. IP address block: If you continuously keep scraping from a website from your office, your IP is going to get blocked by the "security guards" one day.

3. Websites are increasingly using better ways to send data, Ajax, client side web service calls etc. Making it increasingly harder to scrap data off from these websites. Unless you are an expert in programing, you will not be able to get the data out.

4. Think of a situation, where your newly setup website has started flourishing and suddenly the dream data feed that you used to get stops. In today's society of abundant resources, your users will switch to a service which is still serving them fresh data.

Getting over these challenges

Let experts help you, people who have been in this business for a long time and have been serving clients day in and out. They run their own servers which are there just to do one job, extract data. IP blocking is no issue for them as they can switch servers in minutes and get the scraping exercise back on track. Try this service and you will see what I mean here.


Source: http://ezinearticles.com/?Why-Web-Scraping-Software-Wont-Help&id=4550594