Raw data set

12/7/2023

1| Common Crawl Corpus

Common Crawl is a corpus of web crawl data composed of over 25 billion web pages. For all crawls since 2013, the data has been stored in the WARC file format, and it also contains metadata (WAT) and text data (WET) extracts. The dataset can be used in natural language processing (NLP) projects.

2| Google Books Ngrams

Google Books Ngrams is a dataset containing Google Books n-gram corpora. In this dataset, the items are words extracted from the Google Books corpus.

3| Hourly Weather Surface – Brazil (Southeast region)

The Hourly Weather Surface – Brazil (Southeast region) dataset covers hourly weather data from 122 weather stations in the southeast region of Brazil. The dataset is 2 GB in size and records 17 climate parameters (continuous values) from those 122 stations. Its contents include instant air temperature, relative humidity of the air, instant dew point and solar radiation, among others.

4| Hotel Booking Demand

The Hotel Booking Demand dataset contains booking information for a city hotel and a resort hotel. It includes information such as booking time, length of stay, number of adults, children and babies, and number of available parking spaces, among other things. This dataset is ideal for anyone looking to practise exploratory data analysis (EDA) or get started with building predictive models.

5| Iris Species

The Iris Species dataset is the Iris Plant Database, which contains three classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two, and the latter are not linearly separable from each other. The columns of this dataset include Id, SepalLength, PetalLength, etc.

6| New York City Airbnb Open Data

The New York City Airbnb Open Data is a public dataset and a part of Airbnb. It includes the information needed to find out more about hosts and geographical availability, as well as the metrics needed to make predictions and draw conclusions. This dataset describes the listing activity and metrics in NYC, NY, for 2019.

7| Slogan Dataset

The Slogan dataset can be used to analyse slogans of various organisations. It includes a list of slogans in the form company_name, company_slogan. The data contains more than 1,000 pairs of “company, slogan” spread across 10+ categories.

8| Taxi Trajectory

The Taxi Trajectory dataset provides a complete year of trajectories for all 442 taxis running in the city of Porto, Portugal. Each ride is categorised into one of three sub-categories: taxi central based, stand-based and non-taxi central based. Each data sample corresponds to one completed trip and contains a total of nine features.

9| Temperature Readings: IoT Devices

The Temperature Readings: IoT Devices dataset contains temperature readings from IoT devices installed outside and inside an anonymous room. The data is 7 MB in size and has 5 columns and 97,605 rows. The dataset can be used for time-series analysis projects.

10| Trending YouTube Video Statistics

The Trending YouTube Video Statistics dataset is a daily record of statistics for trending YouTube videos, collected using the YouTube API.
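For readers new to the term used in the Google Books Ngrams entry, an n-gram is a contiguous run of n items, here words. The short Python sketch below shows one way to extract and count word n-grams. The function name `word_ngrams` and the sample sentence are made up for illustration, and the simple whitespace tokenisation is an assumption; it does not reflect how the Google Books corpus files are actually built or formatted.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return every contiguous run of n words in `text` as a tuple."""
    # Assumption: lowercase and split on whitespace; real corpora use
    # more careful tokenisation.
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Made-up sample sentence, not taken from any dataset.
sample = "the quick brown fox jumps over the quick dog"

# Count how often each bigram (n = 2) occurs in the sample.
bigram_counts = Counter(word_ngrams(sample, 2))
print(bigram_counts.most_common(1))
```

The same function works for unigrams (n = 1) or trigrams (n = 3), and it returns an empty list when the text has fewer than n words.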
