Background
The AlphaFoldDB Structure Extractor (https://project.iith.ac.in/sharmaglab/alphafoldextractor/) is an open-access web server and API toolkit for bulk download of predicted protein structures from the AlphaFold Database using well-known accession formats. To address the current limitations on extracting structures beyond a restricted list of model organisms and a threshold number of entries, the tool accepts diverse sequence and structure identifiers as input: NCBI Taxonomy IDs, RefSeq accessions, locus tags (old and new), and UniProt or AlphaFold accessions.
Results
Users can download structure files in PDB, mmCIF, bCIF, and/or PAE JSON formats using any of the above-mentioned identifier types as input. The tool also generates an accompanying ID mapping file that traces input identifiers back to standard accession numbers, and it reports unmapped IDs separately. Users who do not require the structure coordinate files can run the ID mapping alone. An API is also provided for programmatic access, enabling integration into bioinformatics pipelines. We tested the tool with randomly selected accessions of each type (single inputs and batches of up to 5000 accessions) drawn from the NCBI RefSeq and Taxonomy databases, the UniProt database, and the AlphaFold Database.
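The mapping-then-download workflow described above can be illustrated with a minimal Python sketch. It is not the tool's implementation or its API: the function names (`split_mapped`, `structure_urls`), the input identifier `EXAMPLE_0001`, the file-name suffixes, the default model version, and the base URL are all assumptions made for illustration, and the URL pattern should be checked against the AlphaFold Database before use. The sketch only separates mapped from unmapped identifiers and builds candidate file URLs; it performs no network requests.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed file-name suffixes per output format (illustrative, not taken from the tool).
SUFFIXES = {
    "pdb": "model_v{version}.pdb",
    "cif": "model_v{version}.cif",
    "bcif": "model_v{version}.bcif",
    "pae": "predicted_aligned_error_v{version}.json",
}


def split_mapped(
    input_ids: Iterable[str], id_to_uniprot: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Separate inputs that map to a UniProt accession from those that do not."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for identifier in input_ids:
        accession = id_to_uniprot.get(identifier)
        if accession:
            mapped[identifier] = accession
        else:
            unmapped.append(identifier)
    return mapped, unmapped


def structure_urls(
    accession: str,
    formats: Iterable[str],
    version: int = 4,  # assumed model version; adjust to the current release
    base: str = "https://alphafold.ebi.ac.uk/files",  # assumed base URL
) -> List[str]:
    """Build candidate download URLs for one accession (first fragment only)."""
    return [
        f"{base}/AF-{accession}-F1-{SUFFIXES[fmt].format(version=version)}"
        for fmt in formats
    ]


if __name__ == "__main__":
    # Hypothetical lookup table: a placeholder locus tag mapped to a UniProt accession.
    lookup = {"EXAMPLE_0001": "P12345"}
    mapped, unmapped = split_mapped(["EXAMPLE_0001", "NOT_A_REAL_ID"], lookup)
    for source_id, accession in mapped.items():
        print(source_id, accession, structure_urls(accession, ["pdb", "pae"]))
    print("unmapped:", unmapped)
```

Keeping the mapping step separate from URL construction mirrors the option of running the ID mapping alone, without fetching coordinate files.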
Conclusions
Overall, AlphaFoldDB Structure Extractor streamlines structure retrieval from the AlphaFold Database and makes bulk retrieval accessible to structural and functional genomics researchers with minimal computational expertise.