{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting FASTA from GFF" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Tool: bedtools getfasta (aka fastaFromBed)\n", "Version: v2.30.0\n", "Summary: Extract DNA sequences from a fasta file based on feature coordinates.\n", "\n", "Usage: bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>\n", "\n", "Options: \n", "\t-fi\t\tInput FASTA file\n", "\t-fo\t\tOutput file (opt., default is STDOUT\n", "\t-bed\t\tBED/GFF/VCF file of ranges to extract from -fi\n", "\t-name\t\tUse the name field and coordinates for the FASTA header\n", "\t-name+\t\t(deprecated) Use the name field and coordinates for the FASTA header\n", "\t-nameOnly\tUse the name field for the FASTA header\n", "\t-split\t\tGiven BED12 fmt., extract and concatenate the sequences\n", "\t\t\tfrom the BED \"blocks\" (e.g., exons)\n", "\t-tab\t\tWrite output in TAB delimited format.\n", "\t-bedOut\t\tReport extract sequences in a tab-delimited BED format instead of in FASTA format.\n", "\t\t\t- Default is FASTA format.\n", "\t-s\t\tForce strandedness. If the feature occupies the antisense,\n", "\t\t\tstrand, the sequence will be reverse complemented.\n", "\t\t\t- By default, strand information is ignored.\n", "\t-fullHeader\tUse full fasta header.\n", "\t\t\t- By default, only the word before the first space or tab \n", "\t\t\tis used.\n", "\t-rna\tThe FASTA is RNA not DNA. Reverse complementation handled accordingly.\n", "\n" ] } ], "source": [ "!/Applications/bioinfo/bedtools2/bin/fastaFromBed" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ">scaffold22 dna_sm:supercontig supercontig:oyster_v9:scaffold22:1:1964558:1 REF\n", "TATTTGCAATTAGCCAGTTGAAATATATTATACATTAATCTCCATTATTCTTATCTATAA\n", "ATCATATTCGTTCGTAAAATGTAAGTATCAAACGATTCATTTTTGAAATCATAAGTGATT\n", "TTATGACTGTTAATCTTCCTTATATTGTTAATTTACGTAGTTTCCACTGCTTTATTAttt\n", "ttttttATATCGTCTAGTATAACATTGTAATTTCAGTTTAACATATCATATCAGTGTACC\n", "GAAAATTTTATCCCTAGCAAGTCAAAGTTTGCATTGTTCCAATCCAGTTTCCATCTGGAG\n", "TGATGAAAAGTTTCTTTTGAGAATTTTTACTTCCTATCCACACCATCTTGGTCTCATTGT\n", "TACAAACCACGGATTTGAGAATCGATGTGAATCACAAGTGGGTCACCGATGATGTCTTGG\n", "CCTTAGAAAAGTTAATATTTAGTCCTGATATATCAGCAAAAAAATCTATTTCCCTAAGAA\n", "TACCGTCTTAAGTTTCTGGTGGTCCATTTCTAAATAAAGATGTGTCATCTGCATATTTGT\n" ] } ], "source": [ "!head data/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "!/Applications/bioinfo/bedtools2/bin/fastaFromBed \\\n", "-fi data/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa \\\n", "-bed data/oyster-EE2-DMR.gff \\\n", "-fullHeader \\\n", "-fo analyses/ee2-old.fasta" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ">scaffold1017:117843-118366\n", 
"CCATTGAGCTATTGTGACCGAATATATTGACCAACAGTCGCACACCTGTTAACCCATTGACAAAGTTTAAACGATAGTCTGCTTGATGAATGGTTAAGGGTTGGTGGGTGTTATATCCACCGCAATGATATTAAAATAGTCGTAGATCTTATAAAATGGCTTAGATATCCTTTCTTGTTCTGTTAAGAGTCATCAAATCATCGAGTGGTCTTCTGGACAACCACTAAGATCATTCAATCTTCAATAGCCAAACCAATCTTTACAATAATGCAAAGTGTAGAGAGAGCTGACAATGTCCTATTGAGTCATAATCTGTGACGTCTGAATTAACTTTAAAATTTGTATGAAGATAAATTTATCAACAAATCGCATTTGTTAAAAAGTCAGCAGTAACTTTGTTTAACAAGAGGCCCACAGGCCTTGACAGTCACCTGAGTTACAAGAGTCGGAGGAATCTCAGATTATTTAAGATATATTGAACTTGCCATCTTAATGGTTGAATGAAAGTTCATTCTGAATTCAT\n", ">scaffold1017:120082-120710\n", "TGTCTCAAAATGCTACTATTATACATACTGTATGTGTAAGGGAAATGTTGAATTTTTTTAAAAATTGTGTACCGGTATATGAAACACGGATGCCCCATGGAAACCTACTGCATATGACTTTAGATATAAGGCGTGTTTTCCAAAAACAAAACATACGAACCCAAGTTAACAAAATATTGATCAAGTTATTATTGCGACTTTGTTCAAATTTTAATCATCCACTGACTTTAAATGAAGTATAGACAGAATATAATTACAGATAATCAGGCAGGCAATATAAAAGTTGCAAGGTTTACGCTGAAAATTGAGTACTTAATTGCACTATACATGCAGATTATTGTGTGAAGCCTAAAAGGCAATTCAATGTGTTTACATACAAACAATAGGATAATCTTAACTAAGACCTATGCTTTCAGTTTGCAAAACTCATGTGCAAAACTCTGTGTATTCATTTGCATTTTTGCTGAAAATGCGCTTTCATCTTACTCTTAAAATAGGAATGATTCAAATTAATTTGCATTTGAATAATAATGCATTGATAACTGGTGCACTTTTAAGCTTAGCCAAAACATGATACCTCAATATAAGTTAAACTATAGGCAATATGTTTTGTGTCAAGCAGTAAAAT\n", ">scaffold1409:145388-145656\n", "AATAGAATTGTGGTGTGAGGAAATTTTGAGAAAACGTTATGCAATATTTACCATAAAAAGTAGATGAAACCCGAAATTCCCCAAAAAGCCCCTTATCTGTaaaaaaaaaTTGTCGGCCGCTGCTTAATTTTGAAAACGAGTCAGAAGCTTTACATTATTTATAACGGGTATTTCTACGGCCTGACTTTATTTACGATGAAAACCTTAAAACAGTTAATGATATATTGTTTGTTTTAATTTTTTCTGGTACGAAAATAAAAGTTTTATA\n", ">scaffold146:686525-687786\n", 
"ATAAATCTAGTAAGTTACTTATCAAAAGTTTCTAAATCTAATAAACAAATCATTTTCCAAATAAATCTTCCCTCCATTTATTAttttttttAATGAAATCCCATCCAGTTTACCTGGAGCAAGGACACCCTTTCTTCCTCCAGTTTTGATTTTTCCTGGAAGATGAATCTGTCCAGCAGAGTTTCCTCGATGTCCGCGCGGGAGCGCTGGAAGTAGATGACGGAGGTGTACGCCGCAAGATGTCGTGGTACAATGTGTGGCTCCACCGTGGTGTGCAGGAACAGGCGGAAACGTGGATCACACTCCACTTCATGATCTTCCACCTAAAACAGAATAAATAATGGAAAAATTCAATCATTCTCATTAATATTTTACTTTGGCCAAAAATTACTCCATAttttttttttaaatctttttttttttCGGAGTCCATTATATTTTTAAAATTAAAAACTTTGACTAAAAGGTAGGATTTGTTCATGTTAGATAAGGATAATGACAAAAAACCCAAAAATAATTAAAACGCATTTTATAGTTTAACATGGATGAGCAGATTTCCTCCATAATTATGTACATGTTGTGAATATAATACTTGTCTGATGGACTTACAGTGATCTTGAACTTGAGTTTGCCGTTGATGAACTGTTTACAGCTCTGGATGGCGTCGCGGAACCGCTTGTCCTTGGCCAGCTGTTTGGTGTCGCAGTCGGTCACCAGTAGGGGGCAGCCCTCCGCTAGACAGTTCTCAAACTGAGACCGAATCTCCTACAAACACAGAGTAACAACAGATAATGACCCCTTAAACTGAGATGGAATCTACTACATGTACAATCACAAGGCTCCTCTATGTTACATCAAGGTCTTAAGAGTTACAGATCATGAATCTAAAATTTCAACAAATATTGTGACTTTTTAATCAATAAAGCAAATCAATATTTGAATATCAGGAATATAGGCAGCTCAATAAATTCTTCCCTTTTGGCCAGAGAAACACTGATCACTTTTCATACTTACCAGTACAAACTTCTAGAACTAATTACTACATTTACATGTATAACTGTACACTTATTTCAACTTTTCTGGATCTAGATCCCCTTGTGTACTTACAGAGTACTTGACCTCCACAAGTCCGGTGTTTCCGTAGAAGTTCCGAAGCCAGTCCAGGACGCGACTGGTGGGGTCACAGATGAGGGGCCAAGCCACCATGCTGTCCCGCTGCATCAGGAAGCAGGCGTTCTCCAGCATCAGCTTAGTGATGGAC\n", ">scaffold1532:587964-588458\n", "TTAATGGCATAATTTTTTGGGACTTATGAAATTTtatatatatatataattatatatatCCAGAAAATAACATATTCATCGATCTTACTAAGGAGGGAAATTATGTTCCTGCTAGCATCAGATATAATTTCCCAGACTATCAGGGGAGACTGGTGCACTAAATATAGCAGCCCTCTCTCAATTAGAAGTCTCTCCATTTTGGGGCTATGGCGTCCAACTTCTCTTGAATCCCCAACCTCTACTTGCAAAATGGCTCGTGTGTTGGTGTCCATAAATACTCCATTTGATCGGTTGGCAGAAAACCCTTTAAAATAAAGTCATAAAATGTCCATCAATATCTTACATTATTACTTTTGCAAATATCTTTCTGTACTTGATTCAATCTACCTGAAAATAAATTTTCCTTTGTTTATTACAGTAATTTTATCAGAAGTGACTTAATTTGCATAAGTACATGTATATATACACTTATCATGCATATAAAATGTGTATTA\n" ] } ], "source": [ "!head analyses/ee2-old.fasta" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": 
"Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }