DV Paper


COVID-19 Open Research Dataset Challenge (CORD-19)

An AI challenge with AI2, CZI, MSR, Georgetown, NIH & The White House


Dataset Description

In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided so that the global research community can apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. These approaches are increasingly urgent because new coronavirus literature is appearing so quickly that the medical research community struggles to keep up.

Call to Action

We are issuing a call to action to the world’s artificial intelligence experts to develop text and data mining tools that can help the medical community develop answers to high-priority scientific questions. The CORD-19 dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date. This gives the worldwide AI research community the opportunity to apply text and data mining approaches to find answers to questions within, and connect insights across, this content in support of the ongoing COVID-19 response efforts worldwide.

A list of our initial key questions can be found under the Tasks section of this dataset. These key scientific questions are drawn from the NASEM’s SCIED (National Academies of Sciences, Engineering, and Medicine’s Standing Committee on Emerging Infectious Diseases and 21st Century Health Threats) research topics and the World Health Organization’s R&D Blueprint for COVID-19.

Many of these questions are suitable for text mining, and we encourage researchers to develop text mining tools to provide insights on these questions.

In this project, you will follow your own interests to create a portfolio-worthy single-frame viz or multi-frame data story that you will share in your presentation. You will use the skills taught in this course to complete the project step by step, with guidance from your instructors along the way. First, you will write a project proposal that identifies your goals, including the question you wish to answer or explore with data. Next, you will find data that provides the information you are seeking, import it into Tableau, and prepare it for analysis. You will then build a dashboard that lets you explore the data in depth and identify meaningful insights. After that, you will give structure to your data story by writing its story arc in narrative form. Finally, you will consult your design checklist to craft the final viz or data story in Tableau. This is your opportunity to show the world what you are capable of, so think big and have confidence in your skills!
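The data preparation step can be done inside Tableau, but a short script is sometimes simpler. The following is a minimal sketch that assumes rows from CORD-19's metadata.csv carry a `publish_time` value (the column name is an assumption; check the file you download). It reshapes article counts into a long, one-row-per-year table that Tableau can import directly. The function names `articles_per_year` and `write_tidy_csv` are hypothetical.

```python
import csv
import io
from collections import Counter


def articles_per_year(metadata_rows):
    """Count articles by publication year.

    Assumes each row is a dict with a 'publish_time' value such as
    '2020-03-15' or '2020' (column name assumed from CORD-19's metadata.csv).
    Rows whose year is missing or malformed are skipped.
    """
    counts = Counter()
    for row in metadata_rows:
        year = (row.get("publish_time") or "")[:4]
        if year.isdigit():
            counts[int(year)] += 1
    return dict(sorted(counts.items()))


def write_tidy_csv(counts, handle):
    """Write one row per year (a long layout that Tableau imports easily)."""
    writer = csv.writer(handle)
    writer.writerow(["year", "article_count"])
    for year, n in counts.items():
        writer.writerow([year, n])


if __name__ == "__main__":
    # Hypothetical sample rows standing in for metadata.csv records.
    sample = [
        {"cord_uid": "a1", "publish_time": "2019-12-30"},
        {"cord_uid": "b2", "publish_time": "2020-03-15"},
        {"cord_uid": "c3", "publish_time": "2020"},
        {"cord_uid": "d4", "publish_time": ""},
    ]
    counts = articles_per_year(sample)
    print(counts)  # {2019: 1, 2020: 2}
    buf = io.StringIO()
    write_tidy_csv(counts, buf)
    print(buf.getvalue())
```

With the real file, the rows could come from `csv.DictReader` over metadata.csv opened with `newline=""`. A long layout (one measure per row) is used because Tableau generally handles it more flexibly than a wide, one-column-per-year table.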

Kaggle Website:


Assignment Length: 10-15 pages.

References: At least 10 peer-reviewed, scholarly journal references.

Specific Task:


Create summary tables that address relevant factors related to COVID-19

COVID-19 Open Research Dataset Challenge (CORD-19) Round #2

Paul Mooney


Task Details


Specifically, we want to know what the literature reports about:

  • Effectiveness of case isolation/isolation of exposed individuals (i.e. quarantine)
  • Effectiveness of community contact reduction
  • Effectiveness of inter- and intra-regional travel restrictions
  • Effectiveness of school distancing
  • Effectiveness of workplace distancing
  • Effectiveness of a multifactorial strategy to prevent secondary transmission
  • Seasonality of transmission
  • How do temperature and humidity affect the transmission of 2019-nCoV?
  • Significant changes in transmissibility in changing seasons?
  • Effectiveness of personal protective equipment (PPE)
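Many of these questions are suited to text mining, so a first pass might simply flag which factors an abstract mentions. The following is a naive keyword matcher. The factor labels and keyword lists are hypothetical illustrations, not an official mapping from the task. A real tool would need much richer vocabularies or a trained model.

```python
import re

# Hypothetical keyword lists: illustrative only, not an official mapping
# from the CORD-19 task.
FACTOR_KEYWORDS = {
    "Case isolation / quarantine": ["quarantine", "isolation"],
    "Community contact reduction": ["social distancing", "contact reduction", "lockdown"],
    "Travel restrictions": ["travel restriction", "travel ban", "border closure"],
    "School distancing": ["school closure", "school distancing"],
    "Workplace distancing": ["workplace", "work from home", "teleworking"],
    "Seasonality / climate": ["seasonal", "temperature", "humidity"],
    "Personal protective equipment": ["personal protective equipment", "ppe", "face mask"],
}


def tag_factors(text):
    """Return the factors whose keywords appear in text.

    Matching is case-insensitive and on whole words/phrases (regex word
    boundaries), so 'ppe' does not match inside 'upper'. Factors are
    returned in the order they are defined in FACTOR_KEYWORDS.
    """
    lowered = text.lower()
    found = []
    for factor, keywords in FACTOR_KEYWORDS.items():
        for kw in keywords:
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered):
                found.append(factor)
                break  # one hit is enough to tag this factor
    return found


if __name__ == "__main__":
    abstract = ("We model the effect of temperature and humidity on "
                "transmission, and of school closure.")
    print(tag_factors(abstract))  # ['School distancing', 'Seasonality / climate']
```

Keyword matching is transparent and easy to audit, which matters when every tagged row must later be checked by hand. Its obvious weakness is that it cannot tell whether an article reports an intervention as effective or not.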

And we also want to know what the literature reports about the following questions that we added only very recently:

  • This is where you will find new questions, if applicable

Expected Submission

Article summary tables. Specifically, the expected submission is one or more .CSV files saved to the output folder of a Kaggle Notebook. The .CSV files should contain summary tables that follow the table formatting described and demonstrated in the target_tables folder. For this task, article summary tables should follow the format of Group 2 - Relevant Factors, whose columns are:

Date, Study, Study Link, Journal, Study Type, Factors, Influential, Excerpt, Measure of Evidence, Added on, DOI, CORD_UID
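The following is a minimal sketch of writing rows in this column order with Python's standard `csv` module. The helper name `write_summary_table` is hypothetical. On Kaggle, the handle would typically be a file opened in the notebook's output folder (commonly /kaggle/working). Here an in-memory buffer stands in for it.

```python
import csv
import io

# Column order as listed for the "Group 2 - Relevant Factors" target tables.
GROUP2_COLUMNS = [
    "Date", "Study", "Study Link", "Journal", "Study Type", "Factors",
    "Influential", "Excerpt", "Measure of Evidence", "Added on", "DOI", "CORD_UID",
]


def write_summary_table(rows, handle):
    """Write dict rows as CSV under the Group 2 header.

    Missing columns are written as empty strings (restval=""). A key that is
    not a known column makes DictWriter raise ValueError, which catches
    misspelled column names early.
    """
    writer = csv.DictWriter(handle, fieldnames=GROUP2_COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    # Placeholder values only; not a real study.
    rows = [{"Study": "Example study title", "Factors": "Quarantine", "CORD_UID": "example01"}]
    buf = io.StringIO()  # with a real file: open(path, "w", newline="", encoding="utf-8")
    write_summary_table(rows, buf)
    print(buf.getvalue())
```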

There should be one .CSV file per target table (and one or more .CSV files per notebook), and each .CSV file should have the same title as its target table. Extracting larger excerpts rather than specific values may be advantageous. So may prefixing each value with a human-readable indication of where it appears in the full-text document, enclosed in square brackets [] to facilitate error-checking.
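Small helpers like the ones below could handle these naming and excerpt conventions. All three functions are hypothetical. The sentence splitter is deliberately naive; for example, it will break on abbreviations such as "i.e.". The filename rule (replacing characters such as / that file systems reject) is an assumption that should be checked against the titles actually used in target_tables.

```python
import re


def extract_excerpt(text, keyword, window=1):
    """Return the first sentence that mentions keyword, plus up to `window`
    neighbouring sentences on each side (a larger excerpt rather than a
    single value). Sentence splitting is naive: it breaks after . ! or ?
    followed by whitespace. Returns "" if the keyword is not found."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for i, sentence in enumerate(sentences):
        if keyword.lower() in sentence.lower():
            start = max(0, i - window)
            return " ".join(sentences[start:i + window + 1])
    return ""


def prefixed_excerpt(location, excerpt):
    """Prefix an excerpt with its location in square brackets, e.g.
    '[Results, paragraph 2] ...', so it can be checked against the full text."""
    return f"[{location}] {excerpt.strip()}"


def table_filename(table_title):
    """Turn a target-table title into a .csv filename.

    Assumption: characters most file systems reject (such as '/') are
    replaced with '_'; verify against the names used in target_tables.
    """
    return re.sub(r'[\\/:*?"<>|]', "_", table_title).strip() + ".csv"


if __name__ == "__main__":
    text = "Cases rose early on. Quarantine reduced onward transmission. Testing also expanded."
    excerpt = extract_excerpt(text, "quarantine", window=0)
    print(prefixed_excerpt("Results, paragraph 2", excerpt))
    # [Results, paragraph 2] Quarantine reduced onward transmission.
    print(table_filename("Effectiveness of case isolation/isolation of exposed individuals (i.e. quarantine)"))
```

The location label is supplied by the caller (for example, a section heading and paragraph index taken from the parsed full text). This sketch makes no assumption about how that structure is obtained.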

To be valid, a submission must be contained in one or more Kaggle notebooks made public on or before the submission deadline. Participants may use datasets in addition to the official Kaggle dataset, but those datasets must also be publicly available for the submission to be valid. Previous versions of the article summary tables can be found in the target_tables folder.

An ideal submission will be able to: (1) recreate the target tables; (2) create new summary tables; and/or (3) append new rows to the old tables in order to add: (A) newly published articles; or (B) previously overlooked articles. Notebook authors should attempt to minimize the number of missing values within each new row while also attempting to avoid any errors or inaccuracies. See the Kaggle COVID-19 Contributions page for an example of what an article summary table might look like.
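Appending rows to an existing table raises two practical checks: skipping articles that are already present, and tracking how many fields each new row leaves blank. The following is a minimal sketch. It assumes rows are dicts keyed by the column names listed above and that CORD_UID uniquely identifies an article. Both function names are hypothetical.

```python
def append_new_rows(old_rows, new_rows, key="CORD_UID"):
    """Append rows from new_rows whose key is not already in old_rows.

    Rows with an empty key cannot be checked for duplicates and are
    skipped (a conservative choice; falling back to DOI is an alternative).
    Returns (combined_rows, number_added).
    """
    seen = {r.get(key) for r in old_rows if r.get(key)}
    combined = list(old_rows)
    added = 0
    for row in new_rows:
        k = row.get(key)
        if k and k not in seen:
            combined.append(row)
            seen.add(k)  # also guards against duplicates within new_rows
            added += 1
    return combined, added


def missing_value_count(row, columns):
    """Count columns that are absent, None, or blank in a row."""
    count = 0
    for c in columns:
        value = row.get(c)
        if value is None or str(value).strip() == "":
            count += 1
    return count


if __name__ == "__main__":
    old = [{"CORD_UID": "a"}]
    new = [{"CORD_UID": "a"}, {"CORD_UID": "b"}, {"CORD_UID": ""}, {"CORD_UID": "b"}]
    combined, added = append_new_rows(old, new)
    print(added)  # 1
    print(missing_value_count({"Date": "2020-04-01", "Study": " "}, ["Date", "Study", "DOI"]))  # 2
```

Counting blanks per row makes it easy to report, or sort by, how complete each new row is before it is written out.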

Dataset:

