RNA-seq is gradually becoming the dominating technique employed
to access the global gene expression in biological samples,
allowing more flexible protocols and robust analysis. However,
the nature of RNA-seq results im-pose new data-handling challenges
when it comes to computational analysis. With the increasing
employment of machine learning techniques in biomedical sciences,
databases that could provide curated datasets treated with
state-of-the-art approaches already adapted to machine learning
protocols become essential for testing new algorithms.
BARRA:CuRDa is composed of 17 handpicked RNA-seq datasets for Homo sapiens gathered from the Gene Expression Omnibus (GEO), using rigorous filtering criteria. All datasets were individually submitted to sample quality analysis, removal of low-quality bases, artifacts from the experimental process, removal of ribosomal RNA, and transcript level abundance. Moreover, like its sister database, all datasets were tested using analyses destined to provide a base knowledge of each dataset's characteristics, with the addition of new metrics.