RNA-seq is gradually becoming the dominant technique for
assessing global gene expression in biological samples,
allowing more flexible protocols and more robust analysis. However,
the nature of RNA-seq results imposes new data-handling challenges
for computational analysis. With the increasing
use of machine learning techniques in the biomedical sciences,
databases that provide curated datasets, processed with
state-of-the-art approaches and already adapted to machine learning
protocols, have become essential for testing new algorithms.
BARRA:CuRDa is composed of 17 handpicked RNA-seq datasets for Homo
sapiens gathered from the Gene Expression Omnibus (GEO) using
rigorous filtering criteria. All datasets were individually
submitted to sample quality analysis, removal of low-quality
bases and artifacts from the experimental process, removal of ribosomal
RNA, and estimation of transcript-level abundance. Moreover, like its sister
database, all datasets were tested using analyses designed to
provide baseline knowledge of each dataset's characteristics, with
the addition of new metrics.

BARRA:CuRDa - a Curated RNA-seq Database for Cancer Research
How to Cite
If you use BARRA:CuRDa in a scientific publication, we would appreciate citations to the following paper:
- Feltes, B.C.; Poloni, J.F.; Dorn, M. Benchmarking and Testing Machine Learning Approaches
  with BARRA:CuRDa, a Curated RNA-Seq Database for Cancer Research.
  Journal of Computational Biology. 2021 Sep; 28(9), 931-944.

BibTeX

@article{feltes:2021,
  author  = {Feltes, Bruno Cesar and Poloni, Joice De Faria and Dorn, Marcio},
  doi     = {10.1089/cmb.2020.0463},
  journal = {Journal of Computational Biology},
  number  = {9},
  pages   = {931--944},
  title   = {{Benchmarking and Testing Machine Learning Approaches with BARRA:CuRDa, a Curated RNA-Seq Database for Cancer Research}},
  url     = {https://doi.org/10.1089/cmb.2020.0463},
  volume  = {28},
  year    = {2021}
}
Workflow
Dataset

GSE48850
- TOP 10 Genes
ENSG00000259803 - SLC22A31
ENSG00000223914 - LINC02471
ENSG00000187122 - SLIT1
ENSG00000162873 - KLHDC8A
ENSG00000163898 - LIPH
ENSG00000147256 - ARHGAP36
ENSG00000034971 - MYOC
ENSG00000145864 - GABRB2
ENSG00000162366 - PDZK1IP1
ENSG00000260943 - LINC02555

GSE63511
- TOP 10 Genes
ENSG00000174460 - ZCCHC12
ENSG00000125931 - CITED1
ENSG00000163898 - LIPH
ENSG00000259803 - SLC22A31
ENSG00000147256 - ARHGAP36
ENSG00000229119 - AC026403.1
ENSG00000149948 - HMGA2
ENSG00000260943 - LINC02555
ENSG00000223914 - LINC02471
ENSG00000187122 - SLIT1

GSE68799
- TOP 10 Genes
ENSG00000127074 - RGS13
ENSG00000164500 - SPATA48
ENSG00000213231 - TCL1B
ENSG00000224187 - LINC01991
ENSG00000167483 - NIBAN3
ENSG00000248302 - BNIP3P41
ENSG00000111732 - AICDA
ENSG00000257275 - AL139020.1
ENSG00000260303 - AC108206.2
ENSG00000166736 - HTR3A

GSE71651
- TOP 10 Genes
ENSG00000105695 - MAG
ENSG00000141668 - CBLN2
ENSG00000254732 - AP001931.1
ENSG00000170373 - CST1
ENSG00000122584 - NXPH1
ENSG00000248710 - AC079594.2
ENSG00000263244 - RPPH1
ENSG00000067048 - AC087190.3
ENSG00000280893 - AC009133.6
ENSG00000227617 - CERS6-AS1
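
Several genes appear in more than one top-10 list. As a minimal sketch of how such lists could be compared, the Python snippet below takes the GSE48850 and GSE63511 entries exactly as listed above and reports the genes they share, along with a Jaccard similarity. The helper names `shared_genes` and `jaccard` are hypothetical and not part of BARRA:CuRDa, and this page does not state how the top-10 rankings were produced, so the overlap is only descriptive.

```python
# Top-10 lists copied from the GSE48850 and GSE63511 entries on this page
# (Ensembl gene ID -> gene symbol). Nothing here is downloaded or computed
# from the database itself.
top10 = {
    "GSE48850": {
        "ENSG00000259803": "SLC22A31",
        "ENSG00000223914": "LINC02471",
        "ENSG00000187122": "SLIT1",
        "ENSG00000162873": "KLHDC8A",
        "ENSG00000163898": "LIPH",
        "ENSG00000147256": "ARHGAP36",
        "ENSG00000034971": "MYOC",
        "ENSG00000145864": "GABRB2",
        "ENSG00000162366": "PDZK1IP1",
        "ENSG00000260943": "LINC02555",
    },
    "GSE63511": {
        "ENSG00000174460": "ZCCHC12",
        "ENSG00000125931": "CITED1",
        "ENSG00000163898": "LIPH",
        "ENSG00000259803": "SLC22A31",
        "ENSG00000147256": "ARHGAP36",
        "ENSG00000229119": "AC026403.1",
        "ENSG00000149948": "HMGA2",
        "ENSG00000260943": "LINC02555",
        "ENSG00000223914": "LINC02471",
        "ENSG00000187122": "SLIT1",
    },
}


def shared_genes(a, b):
    """Return symbols of genes whose Ensembl ID is in both lists, in the order of `a`."""
    return [symbol for gene_id, symbol in a.items() if gene_id in b]


def jaccard(a, b):
    """Jaccard similarity of two gene lists, compared by Ensembl ID."""
    ids_a, ids_b = set(a), set(b)
    return len(ids_a & ids_b) / len(ids_a | ids_b)


common = shared_genes(top10["GSE48850"], top10["GSE63511"])
similarity = jaccard(top10["GSE48850"], top10["GSE63511"])

print("Shared genes:", ", ".join(common))
print(f"Jaccard similarity: {similarity:.3f}")
```

Matching on Ensembl IDs rather than symbols avoids false mismatches when a symbol has been renamed between annotation releases.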